Object Classification
Add labels to your detections to differentiate them!
Single measurement => Composite classifier
We start with the Single measurement classifier interface: how to visualize the resulting changes, adjust the color settings, and bring in the transparency settings covered earlier. See the video at 2:15.
CD31 - Nucleus: 900
CD8a - Nucleus: 750
PD-1 - Nucleus: 1500
Ki67 - Nucleus: 900
Once the individual “per channel” classifiers are created, combine them into a Composite classifier, as shown at 5:29. The outputs of this classifier will either have no class (if the cell is not positive for any marker) or some combination of the selected markers. The final name will always follow the order of the selected single measurement classifiers, so you cannot get both CD31+: CD8a+ and CD8a+: CD31+ cells in the same run of the classifier. You CAN end up reversing them in different runs of the classifier, though, which may cause issues with downstream processing in R/Excel or similar software.
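If the marker order does differ between runs, one option is to sort the name parts into a consistent order after classification. A minimal sketch for the QuPath script editor, assuming QuPath 0.4+ (verify that PathClassTools.splitNames and PathClass.fromCollection exist in your version before relying on this):

```groovy
// Sketch: sort multiplex class name parts alphabetically so that
// "CD8a+: CD31+" and "CD31+: CD8a+" collapse into a single class.
// Assumes QuPath 0.4+ API - check these methods in your version.
import qupath.lib.objects.classes.PathClass
import qupath.lib.objects.classes.PathClassTools

getDetectionObjects().each { cell ->
    def pc = cell.getPathClass()
    if (pc == null)
        return
    // Split the derived class into its component names and sort them
    def parts = PathClassTools.splitNames(pc).sort()
    cell.setPathClass(PathClass.fromCollection(parts))
}
fireHierarchyUpdate()
```

Run this after the composite classifier so exported class names are stable across images.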
Does class X:Y include single positives?
Machine learning classifiers
Next we dive into using QuPath’s built-in machine learning tools, which are built on OpenCV libraries for classification.
The above video demonstrates the use of the Train object classifier, which requires annotated training data. One of the most important points I feel the need to emphasize here is that the classifier does not “see” what your cells look like. It is purely based on the measurements associated with the objects in question. Two important sub-points stem from this:
1. If your measurements change, the classifier breaks. That means the same classifier cannot be applied to cells generated through StarDist, CellPose, and QuPath’s built-in Cell Detection, as they generate differently named measurements. You also need to add all necessary measurements to every other image you apply the classifier to - a single missing measurement can break the classifier.
2. Whatever difference between cell types you are trying to use to separate them into classes must be represented in the data. Certain cells being more dendritic or branched versus rounded will be meaningless if there is no measurement within the cell’s list of data that represents that type of information.
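One quick way to check that measurements match between images is to print the measurement names for a detection in each image and compare the lists. A minimal sketch for the QuPath script editor:

```groovy
// Print the measurement names for the first detection in the image.
// Run this in each image you plan to classify and compare the output.
def cells = getDetectionObjects()
if (cells.isEmpty()) {
    println "No detections found"
} else {
    cells[0].getMeasurementList().getMeasurementNames().each { println it }
}
```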
Warning
The two main ways of creating training data for the object classifier are using “area annotations”, like the Brush tool, or Points annotations, for more specific cell typing. The classifier uses the class of the annotation to determine what the class of the detections it covers “should” be during training. For Points annotations, the point must fall within the outline of the cell/detection. For area annotations, the cell centroid is the only point that matters - if the centroid of a cell is within the area annotation, it counts as training data for that class. If the centroid is not covered by the area, the cell is not counted.
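The centroid rule can be checked directly in a script. A minimal sketch (standard QuPath API) that reports which cells a selected area annotation would actually contribute as training data:

```groovy
// For the currently selected area annotation, list the cells whose
// centroids fall inside it - these are the cells it would label.
def annotation = getSelectedObject()
def roi = annotation.getROI()
def inside = getDetectionObjects().findAll { cell ->
    roi.contains(cell.getROI().getCentroidX(), cell.getROI().getCentroidY())
}
println "Cells counted for class " + annotation.getPathClass() + ": " + inside.size()
```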
Generally, if you are training area-based classifications, such as the follicle classifier shown in the video, you would want to use a Brush or Wand tool. Alternatively, if you want to pick out T cells, B cells, or something more specific like a dividing cell, you will want to use the Points annotations. These annotation types are not exclusive. You could have a very general Tumor and Stroma defined using Brush tool annotations, then pick out individual cells of interest using Points annotations of a third class. Creating area-based classifications is obviously much quicker - just make sure you do not include any of your Points-identified cell types within those area annotations!
You can potentially have an enormous number of measurements for your cells using all of the tools QuPath has to offer. The more measurements you have, the more training data you need to create a robust classifier. In order to save yourself time generating sufficient training data to handle your full list of measurements, it may be worth limiting the number of measurements you use when training the classifier!
Top: Points annotations for specific cell type labeling. Each group of spots (one line in the associated table) would have a different class.
Bottom: Area annotations where each annotation has a specific class for training
These are two simple methods of classification. Some projects may require more stringent rules, manually curated decision trees, or even the combination of machine learning and single measurement classifiers - the results of a Train object classifier can be used as an input to a composite classifier, as shown in the video at 4:00! Read more about classification options here.
Finally, Pixel classifiers can be used to perform classifications. They follow the same rules as area annotation training data, which is why I am generally nervous about using them and will not go into any detail on this site. Specifically, that means that the pixel classification of the exact pixel that covers the centroid of the cell is the only pixel that matters. 90% of a cell could be classified as stroma, but if the very centroid is “pixel classified” as tumor, the cell will be tumor.
Object classification (detections)
Object types
The first things to know when starting classification:
As with pixel classifiers, keep a separate project for your ground truth and testing.
Your classifier is only as good as your segmentation. If you want to analyze a cell membrane marker and your cell borders do not trace that marker, you are going to get some very poor results (standard cell detection vs CellPose).
Your classifier, and the measurements it is based on, are specific to your entire workflow. Don't expect an ML classifier to work on images taken or processed with:
A different instrument
Same instrument with different settings
After a maintenance cycle
Using a different antibody batch
A different fixative (batch)
Different cell segmentation method (or even different settings)
And more
Here is a topic on attempting to control for some of these issues.
Your classifier is only as useful as your staining is consistent. That means if the fixation, staining, and imaging conditions are all consistent in genetically identical samples (mice), you could be good to go with a single classifier across the project. With human samples, this is unlikely to work and may require separate classifiers per sample or sample group.
Know your sample and your biology. Background can look very different for various antibodies for the same target protein, so you need to have someone who knows what normal expression looks like on a normal sample, or you may find all of your staining for a particular marker to be useless because the wrong conditions were used… too late. If you are not the biologist, communication is key.
Normalization is often used as a catch-all to handle these situations, but it has its own set of problems if you actually look at what is going on. Not all distributions are Gaussian, especially in biology, and depending on how you apply the normalization you can get very different results even though they are all "normalized".
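As a toy illustration (made-up values, plain Groovy, not QuPath-specific), min-max scaling and z-scoring the same skewed data tell very different stories:

```groovy
// Made-up cell intensities with one bright outlier
def data = [100.0, 120.0, 110.0, 105.0, 5000.0]

// Min-max scaling: the outlier squashes every other value toward 0
def mn = data.min()
def mx = data.max()
def minMax = data.collect { (it - mn) / (mx - mn) }

// Z-score: the mean and standard deviation are dragged by the outlier
def mean = data.sum() / data.size()
def sd = Math.sqrt(data.collect { (it - mean) ** 2 }.sum() / data.size())
def zScore = data.collect { (it - mean) / sd }

println minMax   // the four "normal" cells look nearly identical
println zScore   // the same cells all sit below the mean together
```

Both outputs are "normalized", yet a threshold that separates cells in one scheme may fail completely in the other.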
Some useful scripts for the analysis
Script to activate the correct channels quickly.
This is no longer necessary as of QuPath 0.5.0, as you can now save channel settings within the Brightness/Contrast dialog, see the Visualization video at ~2 minutes.
// Turn on only the channels listed in channelsOn (0-based indices);
// all other channels are turned off
viewer = getCurrentViewer()
c = viewer.getImageDisplay().availableChannels()
channelsOn = [1, 2, 8, 14, 15]
c.eachWithIndex { channelInfo, x ->
    if (channelsOn.contains(x))
        viewer.getImageDisplay().setChannelSelected(channelInfo, true)
    else
        viewer.getImageDisplay().setChannelSelected(channelInfo, false)
}
viewer.repaintEntireImage()
Generate some cells.
// Create a full-image annotation, then run the built-in watershed cell detection on it
createFullImageAnnotation(true)
runPlugin('qupath.imagej.detect.cells.WatershedCellDetection', '{"detectionImage":"Nuclear","requestedPixelSizeMicrons":0.4,"backgroundRadiusMicrons":0.0,"backgroundByReconstruction":true,"medianRadiusMicrons":0.6,"sigmaMicrons":0.6,"minAreaMicrons":10.0,"maxAreaMicrons":400.0,"threshold":3000.0,"watershedPostProcess":true,"cellExpansionMicrons":5.0,"includeNuclei":true,"smoothBoundaries":true,"makeMeasurements":true}')
Set the channel names to something shorter, as classification names will take the channel names.
setChannelNames('PCNA', 'Nuclear', 'CD31', 'CD45', 'CD68', 'CD4', 'FOXP3', 'CD45RO', 'CD8a', 'CD20', 'PD-L1', 'CD3d', 'CD163', 'E-Cadherin', 'PD-1', 'Ki67', 'Pan-CK', 'AF')
Know your sample
Sample: Orion2.ome.tif
Tissue type: Tonsil - this is an immune cell dense environment
Channels:
PCNA - cell division marker (S phase)
Sytox Green - Nuclear marker
CD31 - Blood vessels
CD45 - Leukocyte marker: neutrophils, eosinophils, basophils, lymphocytes, monocytes
CD68 - Macrophages
CD4 - CD4 T cells
FOXP3 - Regulatory CD4 T cells
CD45RO - Activated leukocytes (not naive)
CD8a - CD8 T cell
CD20 - B cells
PD-L1 - Inactivates T cells expressing PD-1
CD3d - T cells, generic
CD163 - monocytes, M2 macrophages
E-Cadherin - Cell-cell adhesion, epithelial cells
PD-1 - Activated T cells
Ki67 - Cell proliferation
Pan-CK - Cytokeratin, tumor marker, usually
Autofluorescence - detect tissue vs no tissue, red blood cells tend to have high intensity here
For the single measurement/composite classifier video, we will be using a smaller subset of all of these for brevity. In addition to the Sytox Green for cell detection, we will look at:
CD31
CD8a
PD-1
Ki67
As long as you are visually choosing thresholds, or making any kind of visual determination or justification of accuracy, start by making sure all of your channels have proper min and max brightness/contrast settings. “Proper” is tricky, however, and the existence of any good set of brightness/contrast settings is dependent on good sample preparation and proper imaging conditions. The staining patterns should be validated by a biologist/pathologist familiar with the distribution of a given marker, or possibly validated through online resources when necessary.
Types of classifiers
Composite classifiers:
Every marker/channel will be represented; every cell will be classified as positive or negative for each of the N markers.
Issues with classification can be more easily and granularly inspected, for example the existence of "impossible" sets of positive classifications.
Simple to implement
Time consuming to implement for dozens of channels
Does not require training data annotations
Machine learning classifiers:
All cells will have a class (though that class may be “Negative” or similar)
Unfortunately, all cells will have one of the classes you created, even if there is a cell in your tissue that is not one of those classes. This can work well for concepts like Tumor vs Stroma, but less well if you have T cells, B cells, tumor cells, and endothelial cells. What would the macrophages end up being assigned to in such a classifier?
Usually this will provide a simpler/smaller set of classifications than a composite classifier, with a much easier-to-read confusion table when validating. Validation metrics will also be easier to read and understand due to the non-overlapping classifications.
Usually requires more intensive validation, as you cannot so easily pick apart individual channel contributions.
Requires manual annotation of training data
Overall plan
For the first classifier, we will create a set of single measurement classifiers and then combine them into a composite classifier, much as is shown in the main documentation here: https://qupath.readthedocs.io/en/stable/docs/tutorials/multiplex_analysis.html
Then, a second video will show how to create a machine learning classifier for your cells, using training data rather than thresholds.
As with all of these videos, you may need to increase your resolution within the video to be able to read the text.