February 25 2025

Scupa: single-cell unified polarization assessment of immune cells using the single-cell foundation model

Abstract

Motivation:

Immune cells undergo cytokine-driven polarization in response to diverse stimuli, altering their transcriptional profiles and functional states. This dynamic process is central to immune responses in health and diseases, yet a systematic approach to assess cytokine-driven polarization in single-cell RNA sequencing data has been lacking.

Results:

To address this gap, we developed single-cell unified polarization assessment (Scupa), the first computational method for comprehensive immune cell polarization assessment. Scupa leverages data from the Immune Dictionary, which characterizes cytokine-driven polarization states across 14 immune cell types. By integrating cell embeddings from the single-cell foundation model Universal Cell Embeddings, Scupa effectively identifies polarized cells across different species and experimental conditions. Applications of Scupa in independent datasets demonstrated its accuracy in classifying polarized cells and further revealed distinct polarization profiles in tumor-infiltrating myeloid cells across cancers. Scupa complements conventional single-cell data analysis by providing new insights into dynamic immune cell states, and holds potential for advancing therapeutic insights, particularly in cytokine-based therapies.

Availability and implementation:

The code is available at https://github.com/bsml320/Scupa.

Authors: Wendao Liu ⚬, Zhongming Zhao ⚬

Affiliations: The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, United States; Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States

License: © The Author(s) 2025. Published by Oxford University Press. CC BY 4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Article links: DOI: 10.1093/bioinformatics/btaf090 | PubMed: 39999031 | PMC: PMC11893155

Relevance: Moderate: mentioned 3+ times in text

Full text: PDF (8.3 MB)

1 Introduction

Immune cells detect and respond to a variety of stimuli, ensuring the body can effectively combat infections and other environmental or biological threats. Cytokines are crucial signaling molecules that facilitate communication between immune cells. These cytokines can induce significant changes in the transcriptional profiles and functional states of immune cells, a phenomenon known as immune cell polarization (ref. Kourilsky and Truffa-Bachi 2001, ref. Murray 2017). Through polarization, immune cells adapt their responses to better address specific challenges, enhancing the overall effectiveness of the immune system.

Recently, single-cell RNA sequencing (scRNA-seq) has been widely applied in immunological studies to investigate the responses of immune cells under various conditions. The production and response to cytokines, which play critical regulatory roles, have been extensively studied in numerous diseases, including COVID-19 (ref. Ren , ref. Liu ), rheumatoid arthritis (ref. Zhang ), and cancers (ref. Chung , ref. Garris ). Importantly, a recent study systematically characterized the responses of 14 immune cell types to each of 86 cytokines and summarized the results as the Immune Dictionary (ref. Cui ). A total of 66 cytokine-driven cell polarization states were identified in that study, serving as a valuable reference for assessing immune cell polarization in other scRNA-seq studies.

Currently, the investigation of immune cell polarization in scRNA-seq data is limited and there lacks consistent standards across studies. Most studies identified polarized cells based on the expression of certain signature genes from previous findings. However, this empirical approach suffers from technical noises, such as dropout effect and batch effect (ref. Kharchenko , ref. Hicks ), as well as biological variations including tissue or disease variations. For example, many studies analysed macrophage polarization using M1 and M2 signature genes, but there was no consensus in the signature gene lists and the expression of these genes varied by conditions (ref. Cochain , ref. Cheng , ref. Theocharidis ). To address this challenge, recent advances in single-cell foundation models offer unprecedented potential. These models were trained on vast amounts of scRNA-seq data with millions of cells across all the representative human organs and learned cell representations within a unified biological latent space (ref. Rosen , ref. Cui , ref. Hao ). As they have been successfully demonstrated in multiple downstream tasks like cell type annotation and batch correction, we hypothesized that single-cell foundation models could also effectively represent immune cell polarization.

To facilitate the analysis of immune cell polarization in scRNA-seq data, we developed Scupa for single-cell unified polarization assessment. After being trained on scRNA-seq data from the Immune Dictionary, Scupa learns the representations of immune cell polarization within the latent space of Universal Cell Embeddings (UCEs) (ref. Rosen ). As the first computational method for systematic immune cell polarization analysis, Scupa enables the assessment of individual cell polarization across various predefined cytokine-driven cell polarization states in any scRNA-seq dataset.

2 Materials and methods

2.1 Data collection

The Immune Dictionary scRNA-seq dataset was downloaded from Single Cell Portal, which characterizes the single-cell transcriptomic profiles of immune cell types in lymph nodes from mice treated with each of 86 cytokines (https://singlecell.broadinstitute.org/single_cell/study/SCP2554/). The IFN-β-treated human peripheral blood mononuclear cell (PBMC) scRNA-seq dataset with cell type annotation was downloaded using SeuratData (https://github.com/satijalab/seurat-data) (ref. Kang ). The rest datasets were downloaded from Gene Expression Omnibus (GEO) database with following accession numbers: cytokine-treated human macrophage scRNA-seq dataset (GSE168710) (ref. Zhang ), IL-2-treated mouse spleen scRNA-seq dataset (GSE206732) (ref. Hashimoto ), and pan-cancer infiltrating myeloid cell scRNA-seq data (GSE154763) (ref. Cheng ). The code and processed datasets with UCE cell embeddings for generating the results are available in the Zenodo repository (https://doi.org/10.5281/zenodo.13312247).

2.2 Generating cell embeddings using UCE and dimension reduction

We used single-cell foundation model, UCE, to generate cell embeddings for all scRNA-seq datasets used in this study. The pretrained four-layer model was employed with a batch size of 50. The UCE cell embeddings are 1280-d, representing cells in the unified latent space. However, this high dimensionality poses challenges for training machine learning models on most polarization states with a limited number of cells.

To address this issue, we performed principal component analysis (PCA) to reduce the dimensionality of UCE cell embeddings for each cell type. Principal components (PCs) are linear combinations of vector bases in the cell embedding space, with the top PCs representing directions with the largest variation for that cell type. By default, Scupa uses the first 20 PCs as features for machine learning training and prediction. Additionally, we generated 2-d uniform manifold approximation and projections (UMAPs) using the first 20 PCs for data visualization.

2.3 Identifying fully polarized cells in the Immune Dictionary

According to both our analysis and the original study (ref. Cui ), only a subset of immune cells were polarized after cytokine treatment in the in vivo experiments. This was likely due to variable cytokine concentration, receptor expression, and cellular status of different cells. The cell embeddings of some cells from cytokine-treated samples were closer to those of unpolarized cells from the phosphate buffered saline (PBS)-treated samples than other cells from cytokine-treated samples, likely suggesting an unpolarized state or mildly polarization state. Therefore, we first identified fully polarized cells of each polarization state for training machine learning models.

We identified the fully polarized cells of each polarization state based on the following three criteria. (1) The cell is from a sample treated with one of the driving cytokines. (2) The mean expression of top marker genes of the polarization in the cell is higher than that of most other cells. (3) The UCE cell embeddings of the cell are similar to those of the other cells from the samples treated with driving cytokines.

For criterion 2, we used a consistent threshold of the 90th quantile, i.e., the mean expression of top marker genes in the cell was required to be higher than that in 90% cells of the same cell type. For criterion 3, we found that the UCE cell embeddings of the same cell type were highly correlated and, thus, not informative for identifying fully polarized cells. To overcome this issue, we calculated the “embedding shift” as the vector difference between the cell embedding of each cell and the cell embedding of unpolarized cell center, which represented the cell embedding change from the unpolarized state. We then calculated the cosine similarity between the embedding shift of each cell with the rest of the cells from the samples treated with driving cytokines. Fully polarized cells had to secure a minimum mean cosine similarity, ranging from 0.08 to 0.2 among different cell types. Those cells satisfying all criteria were considered fully polarized cells and then used for training machine learning models alongside unpolarized cells from PBS-treated samples.

2.4 Training and testing machine learning models

When training machine learning models to classify unpolarized cells and polarized cells, we tested several models including: (1) logistic regression using the “glm” function from R package “stats”, (2) support vector machine (SVM) using the “svm” function from R package “e1071”, (3) random forest using the “randomForest” function from R package “randomForest”, and (4) semi-supervised learning approach. For each cell type, 70% of cells were randomly selected for training and the remaining 30% for testing. During training, the unpolarized cells were labeled with a polarization score of 0, while the fully polarized cells identified in the previous step were labeled with a polarization score of 1. For binary classification models, the predicted probability to the fully polarized state was used as the polarization score. For regression models, the output prediction was clamped to a range of 0–1 and used as the polarization score. We repeated the training and testing 20 times and calculated the mean area under the receiver operating characteristic curve (AUROC) values for each machine learning model. SVM showed the best performance with the highest mean AUROC values across all polarization states. The final SVM models in Scupa were trained using the “svm” function from R package “e1071”, with a linear kernel and C-classification type (ref. Meyer and Wien 2001).

For the semi-supervised learning approach, we included the cells from the samples treated with cytokines other than the polarization state-driving cytokines as unlabeled data. We first trained supervised machine learning models (logistic regression, SVM, random forest) on labeled data: unpolarized cells and fully polarized cells from the previous step. The trained models were then used to classify unlabeled cells as either unpolarized or polarized. In the end, the final machine learning models were trained on the combined data from initial identification and following prediction. In our comparison of the testing results from supervised models with semi-supervised models, we found that the semi-supervised models generally had slightly worse performance compared to the corresponding supervised models, despite that they improved the performance on some polarization states with small cell numbers (Supplementary Table S1). Therefore, we did not use the trained semi-supervised models for prediction in the Scupa package.

2.5 Quantifying statistical uncertainty using conformal prediction

Because the transcriptomic change during immune cell polarization is a continuous process, a binary classifier may not be able to identify the intermediately polarized cells. Therefore, we used the conformal prediction to quantify the statistical uncertainty in polarization assessment (ref. Alvarsson ). In brief, we randomly divided the polarized and unpolarized cells into five folds. Four folds were used to train the machine learning model, and the rest fold was used for calibration. The process was repeated five times with each fold used for calibration to calculate nonconformity scores. The nonconformity score $S_{n c}$ for each cell was calculated as the absolute value of difference between polarization score $S_{polar}$ and the true label $I_{polarized}$ (0 for unpolarized cells and 1 for polarized cells):

S_{n c} = | S_{polar} - I_{polarized} |

Given an error level α, the quantile threshold for nonconformity scores of N cells is calculated as:

\hat{q} = quantile (S_{n c, 1}, \dots, S_{n c, N}; ⌈(1 - α) (N + 1)⌉ / N)

For each polarization state, Scupa makes a class prediction as one of:

Polarized ( $S_{polar} > 1 - \hat{q}, S_{polar} \geq \hat{q}$ );
Unpolarized ( $S_{polar} \leq 1 - \hat{q}, S_{polar} < \hat{q}$ );
Intermediate ( $S_{polar} > 1 - \hat{q}, S_{polar} < \hat{q}$ ), meaning that a cell is predicted as both polarized and unpolarized;
Uncertain ( $S_{polar} \leq 1 - \hat{q}, S_{polar} \geq \hat{q}$ ), meaning that a cell is predicted as neither polarized nor unpolarized.

The error level is set to 0.05 by default, and decreasing the error level will result in more intermediate predictions and less uncertain predictions.

2.6 Cross-dataset batch effect correction

As a single-cell foundation model, UCE is robust to dataset and batch-specific artifacts, though cross-dataset batch effects may still persist between the Immune Dictionary and other datasets. To enhance Scupa’s transferability to diverse datasets, we provide a straightforward and effective approach for cross-dataset batch effect correction. UCE’s capability allows us to represent cross-dataset batch effects as the difference in unpolarized cell embeddings between two datasets. When there are untreated control and treated samples, cells from the control samples could be specified as unpolarized cells. Scupa first calculates the center of reference unpolarized cells’ UCE cell embeddings $c_{ref}$ , and the center of input unpolarized cells’ UCE cell embeddings $c_{i n}$ . For a cell k with UCE cell embeddings ( $em b_{k, original}$ ) from the input dataset, its cell embeddings are adjusted to:

em b_{k, adjusted} = em b_{k, original} - c_{i n} + c_{ref}

This adjustment allows the learned representations of immune cell polarization to be applied to the adjusted cell embeddings, bypassing complicated data integration processes and thus, preserving polarization information that might be lost with scRNA-seq data integration methods (ref. Stuart , ref. Polański ). In the Scupa package, we implement this batch effect correction approach and provide an optional parameter for users to specify unpolarized cells in the input dataset, suitable for experimental designs with untreated healthy controls.

Regarding the cytokine treatment datasets in Scupa evaluation, we used this batch effect correction approach when analysing immune cell polarization. In the IFN-β-treated human PBMC scRNA-seq dataset and cytokine-treated human macrophage scRNA-seq dataset, the cells from the untreated sample were specified as unpolarized cells. Similarly, in the IL-2-treated mouse spleen scRNA-seq dataset, the cells from the untreated mouse were specified as unpolarized cells. We found only slight differences in the polarization analysis results when specifying unpolarized cells or not in these datasets. This evaluation indicated the robustness of UCE and Scupa to cross-dataset batch effect correction. For the pan-cancer myeloid cell dataset, we did not specify unpolarized cells due to the absence of an untreated healthy control sample in the dataset.

2.7 Benchmarking single-cell foundation models

We compared predicting immune cell polarization using cell embeddings from three single-cell foundation models, UCE, scGPT (ref. Cui ), and scFoundation (ref. Hao ). The cell embeddings for the Immune Dictionary were generated using the scGPT-continual pretrained model, and the pretrained scFoundation model. As these two models were trained only on human data, we converted the mouse gene symbols to their human orthologs as input. We used the same data processing, training, and testing procedures for cell embeddings from three models. Top 20 PCs cell embeddings were calculated for repeated training and testing 20 times, with random 70% cells for training and 30% cells for testing. AUROC values were calculated to measure the model performance.

2.8 Statistical analysis

When comparing the cell proportions between two conditions in the IL-2-treated mouse spleen scRNA-seq dataset, we used two-sided Fisher exact test. All statistical analyses were performed in R (v 4.4.1).

3 Results

3.1 Overview of Scupa framework and immune cell polarization states

Immune cells undergo transcriptional and phenotypical changes in cytokine-driven polarization. Scupa uses the immune cell polarization states in the Immune Dictionary as the reference, and trains machine learning models to distinguish between polarized cells from cytokine-treated samples and unpolarized cells from PBS-treated control samples. The specific polarized states of a cell type are usually driven by the specific cytokines, with each state exhibiting a unique transcriptional profile. Instead of relying on gene expression features, which can vary across species, tissues, and conditions, Scupa utilizes cell embeddings from the single-cell foundation model UCE for assessing immune cell polarization (ref. Rosen ). We further reduced the dimension of cell embeddings using PCA, and trained SVM models to classify polarized cells and unpolarized cells based on PCs. The models learned the representation of each polarized state in the latent space of UCEs, allowing for transferability to other datasets (Fig. 1a). We found that SVM outperformed several other machine learning models in this task (Supplementary Table S1). In addition, we chose UCE over other single-cell foundation models because of its advantages in multiple-species compatibility, “zero-shot” capability without the need for fine tuning, and high model performance. SVM models trained on 1280-d UCE cell embeddings and 3072-d scFoundation cell embeddings achieved similar high performance (mean AUROC = 0.978), while those trained on 512-d scGPT cell embeddings had lower performance (mean AUROC = 0.928, Supplementary Table S2).

PMC11893155 – btaf090-F1 — **Figure 1.:** The framework of Scupa and cytokine-driven cell polarization states. (a) Scupa uses the Immune Dictionary as the reference and training set for measuring immune cell polarization. Major immune cell polarization states were defined in the Immune Dictionary according to driving cytokines and top marker genes. Universal Cell Embeddings (UCEs) were generated for the Immune Dictionary and any new scRNA-seq dataset for Scupa. Scupa trained support vector machine (SVM) models for various cell polarization states using the Immune Dictionary, and then it applies the trained models to predict polarization in a new dataset. It outputs the polarization score and class prediction for each individual cell. Created with BioRender.com. (b) A network showing the driving cytokines and immune cell polarization states.

For any new scRNA-seq dataset containing immune cells presented in the Immune Dictionary, Scupa examines whether the cells have similar transcriptional changes as reference polarized cells, thereby inferring their polarization states and received cytokines. The polarization of each cell is assessed based on its UCE cell embeddings. The trained SVM models predict the polarization scores of each cell using the learned representation in the unified latent space of UCEs. According to the cell type, Scupa assigns a score to each individual cell for every polarization state, ranging from 0 (unpolarized) to 1 (fully polarized). Additionally, Scupa outputs a class prediction as one of polarized, intermediate, unpolarized, and uncertain class based on conformal prediction, facilitating the identification of significantly polarized cells and intermediate cells. Scupa is designed for integration into the widely used Seurat pipeline for comprehensive single-cell data analyses (ref. Hao ), enabling the output scores and class predictions to be readily visualized in multiple formats (Fig. 1a).

Scupa supports the analysis of 66 polarization states in 14 immune cell types from the Immune Dictionary (Fig. 1b, Supplementary Fig. S1). Among these states, the “a states” of all cell types represent those driven by some type-I interferons (IFN-α1, IFN-β, IFN-ε, IFN-κ). The “b states” represent those driven by IFN-γ and interleukins inducing IFN-γ expression (IL-2, IL-12, IL-15, IL-18). The “c states” are for those driven by two proinflammatory cytokines IL-1α and IL-1β, and the “d states” are mainly driven by TNF-α. In contrast, the driving cytokines of the “e states” and “f states” in various cell types tend to vary. Identification of polarized cells using Scupa suggests the potential presence of one or more driving cytokines in the tissue, thereby facilitating the understanding of immune cell environment, communication, and response in scRNA-seq data.

3.2 Scupa learns the representation of cell polarization in various immune cell types

Cytokines are key regulators of intracellular signaling and gene expression, acting as messengers that mediate and modulate immune responses. They activate signaling cascades that lead to the phosphorylation and activation of various transcription factors, which then translocate to the nucleus to modulate the transcription of genes. Consequently, the cytokine-driven immune cell polarization states are characterized by unique transcriptional profiles (ref. Cui ). UCE can effectively capture these transcriptional changes and represent them as variations in cell embeddings. For example, we found that the CD8+ T cells treated with driving cytokines of each polarization state tended to have cell embeddings shifting away from the unpolarized cells, as visualized by UMAP (Fig. 2a). By filtering cells based on cosine similarity and the expression of top marker genes, we obtained fully polarized cells in each state, which displayed distinctly different cell embedding distributions from unpolarized cells (Fig. 2b).

PMC11893155 – btaf090-F2 — **Figure 2.:** Scupa learns the representation of polarization states in CD8+ T cells and other cells. (a) Uniform manifold approximation and projection (UMAP) plots showing CD8+ T cells from the samples treated with driving cytokines of each polarization state or from the control samples treated with PBS. UMAPs are derived from UCE cell embeddings rather than gene expression. (b) UMAP plot showing the distribution of fully polarized cells of each polarization state after filtering. (c) UMAP plots showing polarization scores of each polarization state from Scupa prediction. (d) The Spearman correlation coefficients between polarization scores from each two polarization states. (e) The distribution of polarization scores by predicted classes using conformal prediction at the error level 0.05. (f) Box plots showing testing AUROC values in 20 repeats across all polarization states.

For each polarization state, we trained an SVM model to classify polarized cells and unpolarized cells, and quantified the polarization using a score derived from the trained models. With this approach, the gradients of polarization scores in the unified cell embedding space represent the directions of cell polarization (Fig. 2c). Among all CD8+ T cells, we found that the polarization scores of some states are highly correlated (Fig. 2d). For example, the Spearman correlation coefficient between T8-e and T8-f state scores is 0.94. This indicates a high similarity in transcriptional changes between these two states, which aligns with the findings in the original study (ref. Cui ). We then derived class predictions as one of “unpolarized”, “intermediate”, “polarized”, and “uncertain” from polarization scores based on conformal prediction (Fig. 2e). Importantly, in addition to CD8+ T cells, we observed similar patterns of polarization scores and class predictions across all other 13 cell types and 61 polarization states. This demonstrated that Scupa could effectively learn the representation of all polarization states in the unified cell embedding space (Supplementary Figs S2–S14).

Next, we evaluated the performance of Scupa across all polarization states. Using a random 70% of the data for training and the remaining 30% for testing, we repeated this process 20 times and calculated testing AUROC value. The median AUROC values were above 0.95 in 59 out of the 66 polarization states (Fig. 2f). For polarization states with lower performance, such as LC-d (median AUROC = 0.898), the primary factor was the insufficient number of cells leading to poor model fitting (Supplementary Fig. S15). Overall, Scupa achieves superior performance in learning the representations of polarization states in scRNA-seq data.

3.3 Scupa assesses immune cell polarization in in vitro cytokine stimulation datasets

To evaluate Scupa’s performance in independent datasets, we collected two scRNA-seq datasets generated from in vitro cytokine-stimulated samples. The first dataset comprises human PBMC samples stimulated with IFN-β or left unstimulated in vitro (ref. Kang ). IFN-β stimulation dramatically induces the transcription of interferon-stimulated genes, causing the cell embeddings of stimulated cells to diverge significantly from those of unstimulated cells across all immune cell types (Fig. 3a). We applied Scupa to analyse the immune cell polarization in this dataset. Considering that IFN-β is the driving cytokine of “a states” in all immune cell types (Fig. 1b), we examined the polarization scores of these states. We found that all cells from the IFN-β-stimulated sample had polarization scores close to 1, while those cells from the control sample had scores close to 0. This clearly indicates a tremendous difference (Fig. 3b). Additionally, nearly all cells from the IFN-β-stimulated sample were predicted as polarized cells, whereas those cells from the control sample were predicted as unpolarized cells (Fig. 3c, Supplementary Fig. S16a and b). The ROC curves further confirmed the near-perfect performance of using polarization scores to classify cells from stimulated and control samples, with AUROCs above 0.99 in most cell types (Fig. 3d).

PMC11893155 – btaf090-F3 — **Figure 3.:** Validation of Scupa in in vitro cytokine stimulation datasets. (a) UMAP plots showing the immune cells from the control and IFN-β-stimulated samples. UMAPs are derived from UCE cell embeddings rather than gene expression. Ctrl: unstimulated control sample. Stim: IFN-β-stimulated sample. (b) UMAP plots showing the polarization scores in different immune cell types. The cells from IFN-β-stimulated samples generally had much higher scores. (c) UMAP plots showing the class predictions. (d) ROC curves showing the superior performance of polarization scores for classifying immune cells from the control and IFN-β-stimulated samples. (e) The mean of predicted polarization scores increased by the simulated polarization level across cell types. (f) The class predictions transitioned from “unpolarized” to “intermediate” or “uncertain”, and then to “polarized” with the increasing simulated polarization level. (g) UMAP plots showing macrophages from the control and five cytokine-stimulated samples. (h) UMAP plots showing the macrophage polarization scores. (i) The mean macrophage polarization scores of the control and cytokine-stimulated samples. (j) ROC curves comparing the performance using polarization scores or polarization marker gene mean expression for classifying macrophages from the control and different cytokine-stimulated samples. IFNγ(+) stands for IFN-γ stimulated and TNF-α/IFN-γ-costimulated samples. TNFα(+) stands for TNF-α stimulated and TNF-α/IFN-γ-costimulated samples.

Cell polarization is a continuous process with the gradual upregulation of interferon-stimulated genes. Accordingly, we simulated a cell trajectory with intermediate polarization levels from this dataset. Simulated cells were generated as linear combinations of randomly selected unpolarized and polarized cells, with the simulated polarization level defined by the proportion of polarized cells. As expected, polarization scores predicted by Scupa increased with higher simulated polarization levels, though the rate of increase varied across cell types (Fig. 3e). Among the cell types, monocytes and type 1 conventional dendritic cells (cDC1s) exhibited the highest polarization scores at most polarization levels, indicating greater sensitivity to IFN-β stimulation, whereas natural killer (NK) cells were the least sensitive. At the default error level of 0.05 for conformal prediction, intermediate or uncertain predictions primarily occurred at simulated polarization levels between 0.2 and 0.5, with variations observed across cell types (Fig. 3f). Reducing the error level to 0.01 significantly increased the proportion of cells classified as intermediate and decreased the proportion classified as unpolarized or polarized (Supplementary Fig. S16c). These findings demonstrate that Scupa can effectively identify not only fully polarized cells but also those with intermediate polarization levels, leveraging polarization scores and conformal prediction.

In the second dataset, human macrophages differentiated from blood monocytes were stimulated with IFN-β, IFN-γ, TNF-α, and IL-4 (ref. Zhang ). These cytokines induced macrophages to polarize into different states: Mac-a, Mac-b, Mac-d, and Mac-e. Furthermore, the authors revealed that TNF-α and IFN-γ costimulation induced a proinflammatory phenotype with the upregulation of IL-1B, leading to macrophage polarization into Mac-c state. UMAP visualization of UCE cell embeddings revealed significant separation between IFN-β, IFN-γ-stimulated, or TNF-α/IFN-γ-costimulated cells and unstimulated cells, while TNF-α- and IL-4-stimulated cells were closer to unstimulated cells (Fig. 3g). We calculated the polarization scores using Scupa, and then compared the scores across conditions (Fig. 3h and i). These polarization scores aligned well with expectations. IFN-β-stimulated macrophages exhibited the highest Mac-a scores, followed by IFN-γ-stimulated and TNF-α/IFN-γ-costimulated macrophages. Mac-b scores were highest in IFN-γ-stimulated and TNF-α/IFN-γ-costimulated macrophages, followed by IFN-β-stimulated macrophages. TNF-α/IFN-γ-costimulated macrophages showed highest Mac-c scores. TNF-α/IFN-γ-costimulated macrophages showed the highest Mac-d scores, followed by IFN-γ- and TNF-α stimulated macrophages. At last, Mac-e scores were highest in IL-4-stimulated macrophages. To assess the effectiveness of polarization scores, we benchmarked them against the mean expression of polarization marker genes using ROC curves (Fig. 3j, Supplementary Fig. S1). Polarization scores consistently demonstrated better or comparable performance in classifying all five polarization states, highlighting the strength of cell embedding-based classifiers over marker gene expression.

Importantly, although Scupa was trained on mouse scRNA-seq data, it demonstrated good performance on the two human scRNA-seq data without any adaptation, suggesting that Scupa is capable of assessing immune cell polarization using unified representations across species.

3.4 Scupa reveals increased polarized cells after cytokine therapy to mouse model

To evaluate Scupa for in vivo experiments, we collected a scRNA-seq dataset generated from cytokine-treated mice. In this study, the mice chronically infected with lymphocytic choriomeningitis virus (LCMV) were treated with IL-2, anti-PD-L1, or a combination therapy (ref. Hashimoto ). The virus-specific CD8+ T cells from the mouse spleen were sorted for scRNA-seq. We clustered cells from three treatment groups and the control group using UCE cell embeddings (Fig. 4a), and applied Scupa to analyse the CD8+ T cell polarization in this dataset. As IL-2 is the driving cytokine for two polarization states, T8-b and T8-e, we examined the polarization scores and class predictions of these two states (Fig. 4b and c). A small number of cells were polarized to T8-b state and enriched in cluster 3, while more cells were polarized to T8-e state and enriched in cluster 2. To verify the two polarization states with shared driving cytokines, we examined the expression of polarization state marker genes identified in the Immune Dictionary (ref. Cui ). We found that the expression of some marker genes correlated with polarization scores, such as Stat1 with T8-b scores and Ptma with T8-e scores. This result suggested that Scupa could distinguish these two polarization states. On the other hand, we observed the ubiquitous high or low expression of some marker genes (e.g. Igtp, Gbp4, Ncl, and Npm1) in all CD8+ T cells, highlighted the variability of gene expression in different datasets (Fig. 4d and e). Therefore, Scupa’s independence on marker genes makes it more applicable to various scRNA-seq data from diverse sources.

PMC11893155 – btaf090-F4 — **Figure 4.:** Application of Scupa to an in vivo cytokine treatment dataset. (a) UMAP plots showing CD8+ T cells in different clusters and treatment groups. (b) UMAP plots showing polarization scores of two IL-2-driving states, T8-b and T8-e. (c) UMAP plots showing polarization predictions of two IL-2-driving states. (d) The expression level of five T8-b marker genes. (e) The expression level of five T8-e marker genes. (f) The percentage of predicted polarized cells among different treatment groups. (g) The percentage of cells in cluster 2 among different treatment groups. ****: P < .0001.

Since IL-2 was administrated in vivo, it was likely that only a subset of spleen T cells was polarized by IL-2, while other T cells were polarized by different cytokines or remained unpolarized. To analyse the effect of IL-2 treatment, we compared the percentage of predicted polarized cells among groups. This percentage significantly increased in the IL-2 treatment group when compared to the control group (P < .0001), and increased in the combination therapy treatment group when compared to the anti-PD-L1 treatment group (P < .0001, Fig. 4f). We also compared the percentage of cells in cluster 2, which was highly enriched with the cells polarized to T8-e state. A similar increase was observed with IL-2 treatment (Fig. 4g), confirming the CD8+ T cell polarization to IL-2 driving polarization states.

As above, Scupa was validated using scRNA-seq datasets generated from both in vitro and in vivo samples. It superbly classified stimulated and unstimulated cells from in vitro samples, and also revealed the increases of cell polarized to a cytokine-driving polarization state in in vivo samples.

3.5 Scupa reveals polarization states and proinflammatory responses of myeloid cells across cancer types

Myeloid cells play a crucial role in the tumor microenvironment, influencing cancer progression and response to therapy. These cells, which include macrophages, monocytes, dendritic cells (DCs), and neutrophils, can exhibit either pro-tumor or anti-tumor functions depending on their polarization state and the cytokine milieu (ref. Engblom , ref. van Vlerken-Ysla ). Understanding the dual roles of myeloid cells in cancers is essential for developing therapeutic strategies that modulate their function to inhibit tumor progression and improve patient outcomes. In order to systematically analyse the myeloid cell polarization in multiple cancer types, we applied Scupa to a pan-cancer single-cell atlas of tumor-infiltrating myeloid cell dataset (ref. Cheng ).

We first revisited the macrophage polarization using Scupa as it has been extensively investigated in cancer research (ref. Najafi ). We calculated the polarization scores of five polarization states for all macrophage clusters and compared these scores across seven cancer types. Among the five states, Mac-b, Mac-c, and Mac-d are all M1-like states driven by proinflammatory cytokines, while Mac-e is a M2-like state driven by cytokines that induce M2 polarization (Fig. 1b) (ref. Murray 2017). Several macrophage clusters displayed consistent polarization profiles across cancer types. C1QC+ and LYVE1+ macrophages generally had low polarization scores of all polarization states, indicating that they were mainly unpolarized. ISG15+ macrophages generally had high Mac-a, Mac-b, Mac-c, and Mac-d polarization scores, suggesting that they were primarily polarized by type-I interferons and proinflammatory cytokines. NLRP3+ and INHBA+ macrophages also had high Mac-b, Mac-c, and Mac-d scores. This might indicate polarization by proinflammatory cytokines. In particular, SPP1+ macrophages showed significant variation across these cancer types. They had the highest Mac-e polarization scores in thyroid carcinoma (THCA), but had much lower scores in pancreatic adenocarcinoma (PAAD) or uterine corpus endometrial carcinoma (UCEC). When comparing macrophage polarization in different cancer types, we found that PAAD and kidney cancer had the lowest overall polarization scores in all polarization states. This is potentially due to lower cytokine production in these tumors (Fig. 5a).

PMC11893155 – btaf090-F5 — **Figure 5.:** Comparison of the polarization states of infiltrating myeloid cells across seven cancer types by Scupa. (a) The mean polarization scores in different macrophage subpopulations from different cancer types. Polarization scores are scaled across cancer types. (b) The polarization scores of macrophage states Mac-b, Mac-c, and Mac-d were positively correlated with M1 signature gene expression. In contrast, the polarization scores of macrophage state Mac-e were not positively correlated with M2 signature gene expression. (c) The correlation between macrophage state polarization scores, M1 signature gene expression, M2 signature gene expression, angiogenesis signature gene expression, and phagocytosis signature gene expression in all macrophages. (d) LYM macrophages display the overall highest proinflammatory state scores. (e) The mean polarization scores in different monocyte subpopulations from seven cancer types. (f) The mean polarization scores in different DC subpopulations from seven cancer types. ESCA: esophageal carcinoma, KIDNEY: kidney cancer, LYM: lymphoma, MYE: myeloma, PAAD: pancreatic adenocarcinoma, THCA: thyroid carcinoma, UCEC: uterine corpus endometrial carcinoma.

Next, we analysed the relationship between polarization scores of macrophage polarization states and the expression of M1 and M2 signature genes (ref. Cheng ). The mean expression of M1 signature genes was positively correlated with Mac-b, Mac-c, and Mac-d polarization scores across all cancer types. In contrast, there was no consistent correlation between the mean expression of M2 signature genes and Mac-e polarization scores (Fig. 5b). This lack of correlation was likely attributed to the observation that the M2 signature genes did not overlap with the Mac-e marker genes from the Immune Dictionary, thus resulting in strong variation in different studies. Additionally, we included angiogenesis and phagocytosis signature genes in the correlation analysis (ref. Cheng ). The mean expression of angiogenesis signature genes was moderately positively correlated with Mac-b (Spearman’s correlation coefficient ρ = 0.24), Mac-c (ρ = 0.42), Mac-d (ρ = 0.47), and Mac-e (ρ = 0.27) polarization scores, highlighting macrophages’ simultaneous contributions to inflammation and angiogenesis (ref. Ono 2008). Conversely, the mean expression of phagocytosis signature genes was negatively correlated with polarization scores of all polarization states, suggesting that phagocytotic macrophages were generally less polarized by cytokines (Fig. 5c). Considering the high correlation between Mac-b, Mac-c, and Mac-d polarization scores, and proinflammatory cytokines as driving cytokines for these states, we defined a macrophage proinflammatory state score as the maximum of these three scores in each single cell. This score summarizes macrophage proinflammatory polarization and its distribution indicates the proinflammatory activity in different cancers. We found that lymphoma (LYM) was characterized with the highest overall proinflammatory scores, followed by THCA and esophageal carcinoma (ESCA). PAAD and kidney cancer exhibited the lowest overall proinflammatory scores (Fig. 5d).

In addition to macrophages, we examined the polarization of monocytes and different DC populations. These cell populations displayed stronger variation in polarization across cancer types than macrophages (Fig. 5e and f). In LYM, CD14+ monocytes, CD16+ monocytes, cDC2, and pDCs had high polarization scores of all polarization states, supporting the strong effect and important role of multiple cytokines in the cancer (ref. Skinnider and Mak 2002). In ESCA, UCEC, and THCA, monocytes, cDC1, and pDCs were also polarized to multiple states. Notably, the overall polarization of macrophages, monocytes, and DCs showed a consistent trend in all seven cancer types. All these myeloid cell populations were more polarized in LYM, THCA, UCEC, and THCA, but less polarized in PAAD, LYE, and kidney cancer. This trend likely reflects the variation in cytokine milieu in the tumor microenvironment of these cancer types (Fig. 5d–f). Taken together, these results indicated the distinct cytokine environments, myeloid cell polarization patterns, and proinflammatory responses in different cancers, demonstrating Scupa’s capability to complement the conventional scRNA-seq analysis with unique cell polarization analysis.

4 Discussion

In this study, we introduce Scupa, the first computational method to assess immune cell polarization from scRNA-seq data. Scupa is designed to complement the conventional scRNA-seq analysis pipelines by providing additional perspectives in the cytokine environment and immune cell polarization. The method leverages the recently released Immune Dictionary, which systematically characterized the responses of 14 immune cell types to 86 cytokines and then identified 66 cytokine-driven polarization states. Unlike traditional approaches that rely on predefined signature genes, Scupa utilizes cell embeddings from the single-cell foundation model, UCE, to capture the nuanced transcriptional changes associated with different polarization states. Our results clearly indicated that Scupa could effectively classify polarized and unpolarized cells by training machine learning models on cell embeddings. The method was validated using independent datasets, including human PBMCs treated with IFN-β, human macrophages treated with different cytokines, and CD8+ T cells from mice spleen treated with IL-2 and anti-PD-L1. Scupa accurately identified polarized cells and revealed the cytokine-driven polarization states within these datasets. With UCE’s multiple-species compatibility, even though our method was trained on the mouse data of Immune Dictionary, it could effectively be applied to human dataset without additional fine tuning. While this needs further evaluation, it suggested the robustness of our method across species.

Additionally, we applied Scupa to a pan-cancer single-cell atlas to investigate the polarization of myeloid cells across seven cancer types. The analysis reveals distinct polarization profiles and proinflammatory responses in macrophage, monocyte, and DC populations. Notably, LYM, THCA, UCEC exhibit higher polarization scores when compared to PAAD, LYE, and kidney cancer, reflecting distinct cytokine environments in different cancer types.

Macrophage polarization has been one of the top research interests in immunology due to its significant role in health and disease. Traditionally, macrophages have been categorized into two main polarization states. The first state is M1, or classically activated macrophages that are induced by proinflammatory cytokines like IFN-γ and TNF-α. These macrophages are associated with antimicrobial and tumoricidal activities. The second state is M2, or alternatively activated macrophages that are stimulated by cytokines such as IL-4 and IL-13. These macrophages are involved in tissue repair and immune regulation (ref. Murray 2017, ref. Najafi ). However, this dichotomous classification has been increasingly recognized as an oversimplification, with emerging evidence suggesting a spectrum of intermediate states influenced by a variety of cytokines and environmental cues (ref. Ma ). In many scRNA-seq studies, the lack of consistent marker genes and standards further makes the polarization inference arbitrary. Scupa provides a signature gene-free approach for analysing macrophage polarization to five cytokine-driving states. Using Scupa, we identified the C1QC+ or LYVE1+ unpolarized macrophage subpopulations and multiple macrophage subpopulations polarized to M1-like polarization states. Of note, Scupa analysis revealed SPP1+ macrophages polarized to Mac-e state, a M2-like polarization state, suggesting its pro-tumor roles. This result was further supported by previous findings that worse clinical outcomes were associated with higher SPP1 expression (ref. Cheng ).

Cytokine therapies have emerged as a powerful strategy for treating various diseases, leveraging the potent regulatory effects of cytokines to modulate immune responses and target disease processes. These therapies harness the ability of cytokines to influence cellular behavior, enhancing or suppressing immune responses as needed. For instance, cytokines such as interferons have been used to treat viral infections and certain cancer types (ref. Parker , ref. Kaufmann ), while interleukins have shown promise in enhancing immune responses in cancer immunotherapy (ref. Berraondo , ref. Hashimoto ). Scupa’s ability to systematically assess immune cell polarization based on scRNA-seq data offers a valuable tool for assessing the molecular effects of cytokine therapies (Fig. 4). By evaluating how cytokine treatments impact the polarization and functional states of immune cells, Scupa can provide insights into the therapeutic mechanisms at a granular level. It also holds the potential to identify therapeutic targets for cytokine therapies, facilitating the development and optimization of cytokine-based treatments.

There are some limitations in this work. First, there lacks high-quality experimentally generated immune cell polarization states. Although we included all predefined polarization states from the Immune Dictionary, there may be additional, unidentified polarization states due to the constraints of experimental design. For example, some chemokines have been found to induce macrophage polarization (ref. Xuan , ref. Ruytinx ), but the chemokine family was not included in the Immune Dictionary. Second, there are noncytokine pathways of immune cell polarization, such as hypoxia and lactate for macrophage polarization (ref. Colegio , ref. El Kasmi ). Identification of new immune cell polarization states will require further investigations comparable to the Immune Dictionary, and necessary immunological expertise. Such data has not been available yet, but Scupa can be updated to include newly identified polarization states in future studies. Third, while Scupa has demonstrated robust in identifying polarization and states in both mice and humans, more evaluation is needed to enhance the models across species.

In conclusion, we introduced Scupa, the first method for comprehensive immune cell polarization assessment using scRNA-seq data. It is broadly appliable to the studies of various diseases involving immune cell populations and is particularly useful in contexts where cytokines play important roles in disease pathogenesis, progression, and treatment.

Supplementary Materials

btaf090_supplementary_data.zip (ZIP)

References

Predicting with confidence: using conformal prediction in drug discovery.. J Pharm Sci, 2021. [PubMed]
Cytokines in clinical cancer immunotherapy.. Br J Cancer, 2019. [PubMed]
A pan-cancer single-cell transcriptional atlas of tumor infiltrating myeloid cells.. Cell, 2021. [PubMed]
Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer.. Nat Commun, 2017. [PubMed]
Single-cell RNA-Seq reveals the transcriptional landscape and heterogeneity of aortic macrophages in murine atherosclerosis.. Circ Res, 2018. [PubMed]
Functional polarization of tumour-associated macrophages by tumour-derived lactic acid.. Nature, 2014. [PubMed]
Dictionary of immune responses to cytokines at single-cell resolution.. Nature, 2024. [PubMed]
scGPT: toward building a foundation model for single-cell multi-omics using generative AI.. Nat Methods, 2024. [PubMed]
Adventitial fibroblasts induce a distinct proinflammatory/profibrotic macrophage phenotype in pulmonary hypertension.. J Immunol, 2014. [PubMed]
The role of myeloid cells in cancer therapies.. Nat Rev Cancer, 2016. [PubMed]
Successful anti-PD-1 cancer immunotherapy requires T cell-dendritic cell crosstalk involving the cytokines IFN-gamma and IL-12.. Immunity, 2018. [PubMed]
Large-scale foundation model on single-cell transcriptomics.. Nat Methods, 2024. [PubMed]
Integrated analysis of multimodal single-cell data.. Cell, 2021. [PubMed]
PD-1 combination therapy with IL-2 modifies CD8(+) T cell exhaustion program.. Nature, 2022. [PubMed]
Missing data and technical variability in single-cell RNA-sequencing experiments.. Biostatistics, 2018. [PubMed]
Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.. Nat Biotechnol, 2018. [PubMed]
Host-directed therapies for bacterial and viral infections.. Nat Rev Drug Discov, 2018. [PubMed]
Bayesian approach to single-cell differential expression analysis.. Nat Methods, 2014. [PubMed]
Cytokine fields and the polarization of the immune response.. Trends Immunol, 2001. [PubMed]
Delineating COVID-19 immunological features using single-cell RNA sequencing.. Innovation (Camb), 2022. [PubMed]
Macrophage diversity in cancer revisited in the era of single-cell omics.. Trends Immunol, 2022. [PubMed]
Support vector machines.. R News, 2001
Macrophage polarization.. Annu Rev Physiol, 2017. [PubMed]
Macrophage polarity in cancer: a review.. J Cell Biochem, 2019. [PubMed]
Molecular links between tumor angiogenesis and inflammation: inflammatory stimuli of macrophages and cancer cells as targets for therapeutic strategy.. Cancer Sci, 2008. [PubMed]
Antitumour actions of interferons: implications for cancer therapy.. Nat Rev Cancer, 2016. [PubMed]
BBKNN: fast batch alignment of single cell transcriptomes.. Bioinformatics, 2020. [PubMed]
COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas.. Cell, 2021. [PubMed]
Rosen Y , RoohaniY, AgarwalA et al Universal cell embeddings: a foundation model for cell biology. bioRxiv, 2023.2011.2028.568918, 2023, 10.1101/2023.11.28.568918, preprint: not peer reviewed.
Chemokine-induced macrophage polarization in inflammatory conditions.. Front Immunol, 2018. [PubMed]
The role of cytokines in classical Hodgkin lymphoma.. Blood, 2002. [PubMed]
Comprehensive integration of single-cell data.. Cell, 2019. [PubMed]
Single cell transcriptomic landscape of diabetic foot ulcers.. Nat Commun, 2022. [PubMed]
Functional states of myeloid cells in cancer.. Cancer Cell, 2023. [PubMed]
The chemotaxis of M1 and M2 macrophages is regulated by different chemokines.. J Leukoc Biol, 2015. [PubMed]
IFN-gamma and TNF-alpha drive a CXCL10+ CCL2+ macrophage phenotype expanded in severe COVID-19 lungs and inflammatory diseases with tissue inflammation.. Genome Med, 2021. [PubMed]
Defining inflammatory cell states in rheumatoid arthritis joint synovial tissues by integrating single-cell transcriptomics and mass cytometry.. Nat Immunol, 2019. [PubMed]