ROSALIND Automated Cell Type Prediction and Cell Type Score Methods

Cell type prediction is one of the biggest challenges in single-cell RNA sequencing (scRNA-seq) data analysis, and doing it manually is often time-consuming and laborious. ROSALIND automates cell type prediction with a marker gene-based scoring system. After you select an organ or tissue, ROSALIND assigns cell type enrichment scores using an improved cell marker database, and each cell cluster is annotated with the cell type label that has the highest prediction score. In parallel, ROSALIND runs Gene Set Enrichment Analysis (GSEA) against knowledge bases such as PanglaoDB [2], which yields enrichment p-values as a second line of evidence for cell type prediction.
 

More specifically, ROSALIND Automated Cell Type Prediction uses a modified version of the ScType cell marker database [1], with positive and negative cell markers from public scRNA-seq datasets, to predict the cell type that each cluster represents. Negative gene markers are particularly useful for distinguishing similar immune cell types (e.g. CD8 T-cells vs. CD4 T-cells, or, at a finer level, naïve vs. effector T-cells within each subclass).


Cluster cell type prediction works by assigning a Cell Type Score to each potential cell type in every cluster. Both positive and negative cell markers are assigned a specificity score: the more specific a marker is to one cell type, the higher its specificity score, and the more cell types a marker is shared across, the lower its specificity score. For each cell in a cluster, the expression of each marker is normalized and multiplied by its specificity score, giving a contribution that can be positive or negative. Each cell's score is the weighted expression of the positive marker set minus the weighted expression of the negative marker set, and the cluster's Cell Type Score is the sum of the scores of all its cells. Because it is a sum over cells, the Cell Type Score depends on the number of cells within the cluster.
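The scoring steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ROSALIND's implementation: the function names (`marker_specificity`, `zscore_columns`, `cluster_scores`), the z-score normalization, the linear specificity rescale, and the toy expression values are all hypothetical choices made for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def marker_specificity(markers, key):
    """Assumed weighting: 1.0 for a gene listed by one cell type, 0.0 if listed by all."""
    counts = Counter(g for sets in markers.values() for g in set(sets.get(key, [])))
    span = max(len(markers) - 1, 1)
    return {gene: 1.0 - (n - 1) / span for gene, n in counts.items()}


def zscore_columns(expr):
    """Normalize each gene across cells (z-score is an assumed normalization)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def cluster_scores(expr, clusters, markers):
    """Return {cluster: {cell_type: score}}, summing per-cell scores within each cluster."""
    z = zscore_columns(expr)
    spec = {key: marker_specificity(markers, key) for key in ("positive", "negative")}
    scores = {}
    for cell_idx, cluster in enumerate(clusters):
        by_type = scores.setdefault(cluster, {})
        for cell_type, sets in markers.items():
            cell_score = 0.0
            # Positive markers add to the cell's score, negative markers subtract.
            for sign, key in ((1.0, "positive"), (-1.0, "negative")):
                for gene in sets.get(key, []):
                    if gene in z:
                        cell_score += sign * spec[key][gene] * z[gene][cell_idx]
            by_type[cell_type] = by_type.get(cell_type, 0.0) + cell_score
    return scores


# Toy data: six cells, two clusters, two marker genes (values are made up).
expr = {"CD8A": [5, 6, 4, 0, 0, 1], "CD4": [0, 0, 1, 5, 6, 4]}
clusters = [0, 0, 0, 1, 1, 1]
markers = {
    "CD8 T cell": {"positive": ["CD8A"], "negative": ["CD4"]},
    "CD4 T cell": {"positive": ["CD4"], "negative": ["CD8A"]},
}
scores = cluster_scores(expr, clusters, markers)
```

In this toy case the negative markers do the work described earlier: a cluster with high CD8A and low CD4 scores positively for the CD8 T cell entry and negatively for the CD4 T cell entry.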

A cluster will not be assigned a cell type if its highest Cell Type Score is low or negative. A score is considered low if it is less than a quarter of the total number of cells within the cluster. If the highest Cell Type Score meets this minimum threshold (a quarter of the total number of cells within the cluster or more), the cluster is assigned the cell type with that score.

 

For example, if Cluster 3 contains 1000 cells and the calculated Cell Type Scores are:

myeloid = +300
lymphocytes = +250
platelets = -20

The cluster will be assigned to myeloid. Both myeloid and lymphocytes meet the threshold of 250, a quarter of the total number of cells within the cluster, but the myeloid score is higher, so myeloid is assigned to Cluster 3. Platelets has a negative score and is not a candidate.
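The assignment rule and the Cluster 3 example can be expressed as a short Python sketch. The function name `assign_cell_type` and the "Unassigned" label are hypothetical, and the comparison treats a score of exactly one quarter of the cell count as passing, an assumption based on the example above, where lymphocytes at +250 is said to meet the threshold.

```python
def assign_cell_type(scores, n_cells, unknown="Unassigned"):
    """Return the top-scoring cell type, or `unknown` if its score is below n_cells / 4."""
    if not scores:
        return unknown
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    # Assumed inclusive threshold: only scores below a quarter of the cells count as low.
    return best_type if best_score >= n_cells / 4 else unknown


# Cluster 3 from the example: 1000 cells, so the threshold is 250.
cluster_3 = {"myeloid": 300, "lymphocytes": 250, "platelets": -20}
label = assign_cell_type(cluster_3, n_cells=1000)
```

If the highest score in a 1000-cell cluster were only +200, the same function would return the "Unassigned" placeholder rather than a cell type.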

 

To enable ROSALIND Automated Cell Type Prediction, select your organ or tissue type during the single cell experiment analysis setup. Each available organ or tissue type is a broad category that encompasses tissue-specific subcategories, which allows more targeted cell type prediction. Each organ or tissue type also includes immune cell classes.

*Please note: if you select "Other" as the organ or tissue type, ROSALIND Automated Cell Type Prediction will not be included in the analysis results.

 

References:

  1. Ianevski, A., Giri, A.K. & Aittokallio, T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat Commun 13, 1246 (2022). https://doi.org/10.1038/s41467-022-28803-w