Single measurement => Composite classifier

We start with the Single measurement classifier interface: how to visualize the resulting classifications, adjust the color settings, and combine them with the transparency options shown earlier. See the video at 2:15.

CD31 - Nucleus: 900

CD8a - Nucleus: 750

PD-1 - Nucleus: 1500

Ki67 - Nucleus: 900

Once the individual “per channel” classifiers are created, combine them into a Composite classifier, as shown at 5:29. The output of this classifier will either be no class (if a cell is not positive for any marker) or some combination of the selected markers. The combined class name will always list markers in the order of the selected single measurement classifiers, so you cannot get both CD31+: CD8a+ and CD8a+: CD31+ cells in the same run of the classifier. You CAN end up reversing them in different runs of the classifier, though, which may cause issues with downstream processing in R/Excel or similar software.
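If you prefer to script this step, a saved composite classifier can be applied by name and the resulting classes tallied. A minimal Groovy sketch for QuPath’s script editor; the classifier name "CD31-CD8a-PD-1-Ki67" is a hypothetical placeholder for whatever name you saved the composite under.

```groovy
// Apply a previously saved composite classifier and summarize the results.
// "CD31-CD8a-PD-1-Ki67" is a placeholder - use the name you saved in the dialog.
resetDetectionClassifications()
runObjectClassifier("CD31-CD8a-PD-1-Ki67")

// Count cells per (possibly combined) class, e.g. "CD31: CD8a"
def counts = getDetectionObjects().countBy { it.getPathClass()?.toString() ?: "No class" }
counts.each { cls, n -> println "${cls}: ${n}" }
```

Printing the full class strings also makes it easy to spot reversed-order combined names across runs before exporting to R/Excel.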

Does class X:Y include single positives?

Short answer - No. Classifications are unique to the objects that have that classification, and are not inclusive of "sub" classes. That means when you look at your detection summary for an annotation, the X+, Y+ and X+:Y+ (double positive) populations are all unique cells, with no overlap at all.
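If you do want counts that include the multi-positive cells, you can tally them by script. A hedged Groovy sketch, assuming cells carry composite classifications with marker names joined by ": ":

```groovy
// Count all cells positive for CD8a, whether singly or as part of a combined class.
// Splitting on ":" and spaces avoids substring traps (e.g. "CD4" matching "CD45").
def cd8aPositive = getDetectionObjects().findAll { d ->
    def cls = d.getPathClass()?.toString()
    cls != null && cls.tokenize(": ").contains("CD8a")
}
println "Total CD8a+ cells (including multi-positives): ${cd8aPositive.size()}"
```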

Machine learning classifiers

Next we dive into using QuPath’s built-in machine learning tools, which are built on OpenCV libraries for classification.

The above video demonstrates the use of the Train object classifier, which requires annotated training data. One of the most important points I feel the need to emphasize here is that the classifier does not “see” what your cells look like. It is purely based on the measurements associated with the objects in question. Two important sub-points stem from this:

1. If your measurements change, the classifier breaks. That means the same classifier cannot be applied to cells generated through StarDist, CellPose, and QuPath’s built-in Cell Detection, as they generate differently named measurements. You also need to add all necessary measurements to every other image you apply the classifier to - a single missing measurement can break the classifier.
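A quick way to see exactly which measurement names a classifier will depend on is to print them for one detection. A sketch for QuPath’s script editor; note that the exact MeasurementList accessor has shifted between QuPath releases, so check the autocomplete in your version:

```groovy
// Print the measurement names attached to one detection.
// If any of these differ between images, a trained classifier will not transfer.
def detection = getDetectionObjects().find()
if (detection != null)
    detection.getMeasurementList().getMeasurementNames().each { println it }
else
    println "No detections found - run cell detection first."
```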

2. Whatever difference between cell types you are trying to use to separate them into classes must be represented in the data. Certain cells being more dendritic or branched versus rounded will be meaningless if there is no measurement within the cell’s list of data that represents that type of information.

Warning

In certain versions of QuPath, missing measurements or renamed measurements will not stop the classifier from running - instead they will result in all cells being positive for a given class. If you are seeing 100% of your cells show up as one class, there is a 95% chance a measurement is missing, or has been renamed (for example, a channel was renamed, or not renamed). The other, smaller possibility is that you do not have sufficient training data and are using a Neural Network as your classifier type.

The two main ways of creating training data for the object classifier are using “area annotations”, like the Brush tool, or Points annotations, for more specific cell typing. During training, the classifier uses the class of the annotation to determine what the class of each detection it covers “should” be. For Points annotations, the point must be within the outline of the cell/detection. For area annotations, the cell centroid is the only point that matters - if the centroid of a cell is within the area annotation, it counts as positive for that class. If the centroid is not covered by the area, the cell is not counted.
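The centroid rule is easy to verify by script: a ROI can test whether it contains a point, and each detection exposes its centroid. A minimal sketch, assuming an area annotation is currently selected:

```groovy
// Count detections whose centroid falls inside the selected area annotation -
// these are the cells such an annotation would contribute as training data.
def annotation = getSelectedObject()
def inside = getDetectionObjects().findAll { d ->
    annotation.getROI().contains(d.getROI().getCentroidX(), d.getROI().getCentroidY())
}
println "${inside.size()} detections have their centroid inside the selected annotation"
```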

Generally, if you are training area-based classifications, such as the follicle classifier shown in the video, you would want to use a Brush or Wand tool. Alternatively, if you want to pick out T cells, B cells, or something more specific like a dividing cell, you will want to use Points annotations. These annotation types are not exclusive: you could have a very general Tumor and Stroma defined using Brush tool annotations, then pick out individual cells of interest using Points annotations of a third class. Creating area-based classifications is obviously much quicker - just make sure you do not include any of your Points-identified cell types within those area annotations!

You can potentially have an enormous number of measurements for your cells using all of the tools QuPath has to offer. The more measurements you have, the more training data you need to create a robust classifier. In order to save yourself time generating sufficient training data to handle your full list of measurements, it may be worth limiting the number of measurements you use when training the classifier!

Top: Points annotations for specific cell type labeling. Each group of spots (one line in the associated table) would have a different class.

Bottom: Area annotations where each annotation has a specific class for training

These are two simple methods of classification. Some projects may require more stringent rules, manually curated decision trees, or even the combination of machine learning and single measurement classifiers - the results of a Train object classifier can be used as an input to a composite classifier, as shown in the video at 4:00! Read more about classification options here.

Finally, Pixel classifiers can be used to perform classifications. They follow the same rules as area annotation training data, which is why I am generally nervous about using them and will not go into any detail on this site. Specifically, that means that the pixel classification of the exact pixel that covers the centroid of the cell is the only pixel that matters. 90% of a cell could be classified as stroma, but if the very centroid is “pixel classified” as tumor, the cell will be tumor.

Object classification (detections)

Object types

Built-in object classifiers in QuPath (as of 0.5.0) are limited to detection-type objects, including cells. These methods will not classify Annotation objects or TMA cores. Such objects can still be classified, but doing so requires scripting or manually changing the classification.
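Classifying annotations by script only takes a line or two: setPathClass() works on any PathObject, including annotations and TMA cores. A sketch, where "Follicle" is a hypothetical class name for this example:

```groovy
// Assign a class to every currently selected annotation.
// "Follicle" is a placeholder class name.
getSelectedObjects().findAll { it.isAnnotation() }.each {
    it.setPathClass(getPathClass("Follicle"))
}
fireHierarchyUpdate()  // refresh the hierarchy/display after changing classes
```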

The first things to know when starting classification:

  1. As with pixel classifiers, keep a separate project for your ground truth and testing.

  2. Your classifier is only as good as your segmentation. If you want to analyze a cell membrane marker and your cell borders do not trace that marker, you are going to get some very poor results (standard cell detection vs CellPose).

  3. Your classifier and the measurements it is based on are specific to your entire workflow. Don't expect an ML classifier to work on images acquired or processed with:

    1. A different instrument

    2. Same instrument with different settings

    3. After a maintenance cycle

    4. Using a different antibody batch

    5. Or a different fixative (batch)

    6. Different cell segmentation method (or even different settings)

    7. And more

      Here is a topic on attempting to control for some of these issues.

  4. Your classifier is only as useful as your staining is consistent. That means if the fixation, staining, and imaging conditions are all consistent in genetically identical samples (mice), you could be good to go with a single classifier across the project. With human samples, this is unlikely to work and may require separate classifiers per sample or sample group.

  5. Know your sample and your biology. Background can look very different for various antibodies for the same target protein, so you need to have someone who knows what normal expression looks like on a normal sample, or you may find all of your staining for a particular marker to be useless because the wrong conditions were used… too late. If you are not the biologist, communication is key.

  6. Normalization is often used as a catch-all to handle these situations, but it has its own set of problems if you actually look at what is going on. Not all distributions are Gaussian, especially in biology, and depending on how you apply the normalization you can get very different results even though they are all "normalized".

Some useful scripts for the analysis

Script to activate the correct channels quickly.

This is no longer necessary as of QuPath 0.5.0, as you can now save channel settings within the Brightness/Contrast dialog, see the Visualization video at ~2 minutes.

// Show only the channels whose (zero-based) indices are listed in channelsOn
viewer = getCurrentViewer()
c = viewer.getImageDisplay().availableChannels()
channelsOn = [1, 2, 8, 14, 15]
c.eachWithIndex{ channelInfo, x ->
    viewer.getImageDisplay().setChannelSelected(channelInfo, channelsOn.contains(x))
}
viewer.repaintEntireImage()

Generate some cells.

createFullImageAnnotation(true)

runPlugin('qupath.imagej.detect.cells.WatershedCellDetection', '{"detectionImage":"Nuclear","requestedPixelSizeMicrons":0.4,"backgroundRadiusMicrons":0.0,"backgroundByReconstruction":true,"medianRadiusMicrons":0.6,"sigmaMicrons":0.6,"minAreaMicrons":10.0,"maxAreaMicrons":400.0,"threshold":3000.0,"watershedPostProcess":true,"cellExpansionMicrons":5.0,"includeNuclei":true,"smoothBoundaries":true,"makeMeasurements":true}')

Set the channel names to something shorter, as classification names are taken from the channel names.

setChannelNames(
    'PCNA', 'Nuclear', 'CD31', 'CD45', 'CD68', 'CD4',
    'FOXP3', 'CD45RO', 'CD8a', 'CD20', 'PD-L1', 'CD3d',
    'CD163', 'E-Cadherin', 'PD-1', 'Ki67', 'Pan-CK', 'AF')

Know your sample

Sample: Orion2.ome.tif

Tissue type: Tonsil - this is an immune cell dense environment

Channels:

  1. PCNA - cell division marker (S phase)

  2. Sytox Green - Nuclear marker

  3. CD31 - Blood vessels

  4. CD45 - Leukocyte marker: Neutrophils, Eosinophils, Basophils, Lymphocytes, Monocytes

  5. CD68 - Macrophages

  6. CD4 - CD4 T cells

  7. FOXP3 - Regulatory CD4 T cells

  8. CD45RO - activated Leukocytes (not naive)

  9. CD8a - CD8 T cell

  10. CD20 - B cells

  11. PD-L1 - Inactivates T cells expressing PD-1

  12. CD3d - T cells, generic

  13. CD163 - monocytes, M2 macrophages

  14. E-Cadherin - Cell-cell adhesion, epithelial cells

  15. PD-1 - Activated T cells

  16. Ki67 - Cell proliferation

  17. Pan-CK - Cytokeratin, tumor marker, usually

  18. Autofluorescence - detect tissue vs no tissue, red blood cells tend to have high intensity here

For the single measurement/composite classifier video, we will be using a smaller subset of all of these for brevity. In addition to the Sytox Green for cell detection, we will look at:

  1. CD31

  2. CD8a

  3. PD-1

  4. Ki67

As long as you are visually choosing thresholds, or making any kind of visual determination or justification of accuracy, start by making sure all of your channels have proper min and max brightness/contrast settings. “Proper” is tricky, however, and the existence of any good set of brightness/contrast settings is dependent on good sample preparation and proper imaging conditions. The staining patterns should be validated by a biologist/pathologist familiar with the distribution of a given marker, or possibly validated through online resources when necessary.

Types of classifiers

Composite classifiers:

  1. Every marker/channel will be represented; each cell will be classified as positive or negative for each of the N markers.

  2. Issues with classification can be more easily and granularly inspected, for example the existence of "impossible" sets of positive classifications.

  3. Simple to implement

  4. Time consuming to implement for dozens of channels

  5. Does not require training data annotations

Machine learning classifiers:

  1. All cells will have a class (though that class may be “Negative” or similar)

  2. Unfortunately, all cells will have one of the classes you created, even if there is a cell in your tissue that is not one of those classes. This can work well for concepts like Tumor vs Stroma, but less well if you have T cells, B cells, tumor cells, and endothelial cells. What would the macrophages end up being assigned to in such a classifier?

  3. Usually this will provide a simpler/smaller set of classifications vs a composite classifier, with a much easier to read confusion table when validating. Validation metrics will also be easier to read and understand due to the non-overlapping classifications.

  4. Usually requires more intensive validation, as you cannot so easily pick apart individual channel contributions.

  5. Requires manual annotation of training data

Overall plan

For the first classifier, we will create a set of single measurement classifiers and then combine them into a composite classifier, much as is shown in the main documentation here: https://qupath.readthedocs.io/en/stable/docs/tutorials/multiplex_analysis.html

Then, a second video will show how to create a machine learning classifier for your cells, using training data rather than thresholds.

As with all of these videos, you may need to increase your resolution within the video to be able to read the text.