Skip to main content

Artificial intelligence learning landscape of triple-negative breast cancer uncovers new opportunities for enhancing outcomes and immunotherapy responses


Triple-negative breast cancer (TNBC) is a relatively aggressive breast cancer subtype due to tumor relapse, drug resistance, and multi-organ metastatic properties. Identifying reliable biomarkers to predict prognosis and precisely guide TNBC immunotherapy is still an unmet clinical need. To address this issue, we successfully constructed a novel 25 machine learning (ML) algorithms-based immune infiltrating cell (IIC) associated signature of TNBC (MLIIC), achieved by multiple transcriptome data of purified immune cells, TNBC cell lines, and TNBC entities. The TSI index was employed to determine IIC-RNAs that were accompanied by an expression pattern of upregulation in immune cells and downregulation in TNBC cells. LassoLR, Boruta, Xgboost, SVM, RF, and Pamr were utilized for further obtaining the optimal IIC-RNAs. Following univariate Cox regression analysis, LassoCox, CoxBoost, and RSF were utilized for the dimensionality reduction of IIC-RNAs from a prognostic perspective. RSF, Ranger, ObliqueRSF, Rpart, CoxPH, SurvivalSVM, CoxBoost, GlmBoost, SuperPC, StepwiseCox, Enet, LassoCox, CForest, Akritas, BlackBoost, PlsRcox, SurvReg, GBM, and CTree were used for determining the most potent MLIIC signature. Consequently, this MLIIC signature was correlated significantly with survival status validated by four independent TNBC cohorts. Also, the MLIIC signature had a superior predictive capability for TNBC prognosis, compared with 148 previously reported signatures. In addition, MLIIC signature scores developed by immunofluorescent staining of tissue arrays from TNBC patients showed a substantial prognostic value. In TNBC immunotherapy, the low MLIIC profile demonstrated significant immune-responsive efficacy in a dataset of multiple cancer types. MLIIC signature could also predict m6A epigenetic regulation which controls T cell homeostasis. Therefore, this well-established MLIIC signature is a robust predictive indicator for TNBC prognosis and the benefit of immunotherapy, thus providing an efficient tool for combating TNBC.


Triple-negative breast cancer (TNBC) is broadly defined as a particular subtype of breast cancer, which is negative for estrogen receptor-negative (ER), progesterone receptor-negative (PR), and human epidermal growth factor receptor 2 (HER2) expression. It is indisputable that TNBC is rather aggressive because of its tumor relapse, drug resistance, and multi-organ metastatic properties [1]. Platinum-based drugs, PARP inhibitors, and immunotherapeutics represented by antibody-drug couples, PD-L1/PD-1 checkpoint inhibitors, and their diverse combination modalities are now becoming prominent options for the clinical management of TNBC [2, 3]. Meanwhile, the molecular heterogeneity of TNBC and the continuing deficiency of therapeutically successful approaches beyond chemotherapy, have made TNBC characterized by poor prognosis [4]. Therefore, identifying reliable biomarkers to predict outcomes and precisely guide TNBC treatment decisions is still an unmet clinical need and warrants further investigation.

The tumor microenvironment (TME) is a highly complex multicellular entity consisting of tumor cells, immune cells, cancer-associated fibroblasts (CAFs) and adipocytes (CAAs), endothelial cells, extracellular matrix, and mesenchymal stem cells. In recent years, the advancement of multi-omics techniques has unearthed and unveiled the molecular heterogeneity characteristics within TNBC TME, highlighting the intensive dynamic correlations between the cancer cells and other non-neoplastic cells [5]. The tumor landscape deciphered by multi-omics, especially in TME complexity, may benefit the in-depth understanding of the TNBC oncogenesis, progression, and transformation. Immune cell types in TME are abundant, including various types of B cells, T cells, tumor-associated macrophages (TAMs), dendritic cells (DCs), and CAFs. These immune cells constitute the primary effector cells of the cancerous immune response, thus determining the TNBC prognosis significantly. For example, studies have demonstrated that patients with a high abundance of tumor-infiltrating lymphocytes (TILs) have higher performance in terms of overall survival (OS), disease-free survival (DFS), and complete pathological response (pCR), compared with patients with low levels of TILs [6]. Similarly, similar results are also observed in the CD4 + and CD8 + TIL populations, which strongly suggests that TIL could serve as a significant predictor for TNBC prognosis. In the age of big data, a variety of high-throughput data are developing, and making appropriate use of these data as a tool is commonly the key to understanding cancer mechanisms and justifying cancer treatment. Moreover, by integrating genomics, metabolomics, and other data, prediction algorithms using machine learning (ML) can accurately and rapidly process massive data for elucidating the molecular characteristics of TNBC at multiple scales [7]. The ML employment will provide efficient tools for mining novel and reliable markers, tumor prognosis and metastasis prediction, and hierarchical patient management [8,9,10,11].

Although previously reported algorithms have emphasized the intriguing characteristics of immune-related genes in judging TNBC prognosis, there were no comprehensive reports on immune infiltrating cell (IIC) associated signatures in TNBC-based ML. To address this issue, in the current study, we successfully constructed a TNBC MLIIC signature based on 25 ML algorithms, which was achieved by multiple transcriptome datasets of various purified immune cells, TNBC cell lines, as well as TNBC entities. Subsequently, the predictive capability of this MLIIC signature and its corresponding potential in optimizing the immunotherapy responsiveness in TNBC were meticulously validated. Undoubtedly, this well-validated MLIIC signature in TNBC in this study will provide a profound insight into interpreting prognosis, immune cell alterations, and tumor immune landscape in the ecosystem of TNBC.

Materials and methods

Acquisition of TNBC patients and various tumor cell line cohorts

Transcriptome data and clinical data of TNBC patients were obtained from 3 databases, including The Cancer Genome Atlas (TCGA, via Illumina-HiSeq platform, Molecular Taxonomy of Breast Cancer International Consortium (METABRIC, through Illumina-HiSeq platform, and Gene Expression Omnibus (GEO, via Illumina-HiSeq platform and Affymetrix Human Genome U133 Plus 2.0 Array platform. The overall samples enrolled in our research comprised 122 samples from the TCGA TNBC dataset (training set), 299 from the METABRIC dataset (validation set), 133 from the GSE96058 dataset (validation set), and 107 from the GSE103091 dataset (validation set). Transcriptome data were acquired from GSE36133 (Cancer Cell Line Encyclopedia project (CCLE)) for 20 TNBC cell lines using the Affymetrix Human Genome U133 Plus 2.0 Array platform. The fragments per kilobase million (FPKM) values of RNA sequencing data were converted into transcripts per kilobase million (TPM) values. The Robust Multi-array Average (RMA) algorithm was implemented for quantile normalization, background correction, and log2 transformation of microarray data derived from the Affymetrix platform through the R package ‘affy’ [12]. The data relating to copy number variations (CNVs) and DNA methylation information in the TNBC cohort were all acquired from the TCGA database.

Acquisition of purified immune cell lineage cohorts

Transcriptome information for all the 115 purified cell lineages of 19 primary immune cell categories was available and analyzed through the Affymetrix Human Genome U133 Plus 2.0 Array platform with 16 datasets, which included GSE27291 (T gamma delta), GSE27838 (NK activated), GSE28490 (B cell activated), GSE13906 (T gamma delta), GSE23371 (Immature dendritic cells), GSE25320 (Mast cells activated), GSE28698 (Eosinophils), GSE28726 (NKT activated, CD4 T cell resting, CD4 T cell resting, CD4 T cell activated), GSE49910 (Neutrophils, Monocytes, CD8 T cell resting, CD8 T cell activated, CD4 T cell activated, B cell activated), GSE51540 (T helper 17), GSE59237 (Dendritic cells resting, Dendritic cells activated), GSE37750 (Plasmacytoid dendritic cells), GSE39889 (Neutrophils), GSE42058 (Myeloid dendritic cells), GSE6863 (Immature dendritic cells), GSE8059 (NK resting, NK activated). This part of the data was further collected and processed according to the previous study [13].

MLIIC signature construction

An MLIIC signature was determined by a comprehensive analysis of purified immune cells, TNBC cell lines, and TNBC solid tumor tissues using an innovative computational framework according to a series of sequential ML algorithms. The flowchart is outlined below (Fig. 1).

Fig. 1
figure 1

The computational framework for constructing the MLIIC signature. The top 15% expressed RNAs were adopted for candidate immune-related RNAs for each immune cell line. TSI was a widely used index to assess gene expression level relationships in tissue specification. TSI was applied to calculate the expression specificity of candidate immune-related RNAs for each cell type. The highly expressed RNAs in all immune cell types were identified as igRNA. igRNAs were believed to have high specificity in all immune cell types. igRNAs significantly upregulated in immune cell lines and downregulated in TNBC cell lines were identified as IIC-RNAs. IIC-RNAs were believed to be specific for immune cell lines and unspecific for TNBC cell lines, which were used as the input for ML-based classification and dimensionality reduction. Six ML algorithms for classification were utilized to determine potentially valuable IIC-RNAs. Univariate Cox regression analysis was further performed to screen out IIC-RNAs with prognostic features. Three ML algorithms for survival were taken to identify more valuable IIC-RNAs that were used as the input for signature construction. The MLIIC signature was eventually constructed according to RSF scoring with the best performance among 19 ML algorithms for scoring. The relationship between MLIIC signature, prognosis, biological function, tumor immune microenvironment, genome alternations, chemotherapeutic drug, and the immunotherapeutic response was thoroughly explored in the subsequent validation session. Finally, the MLIIC signature was verified using the LUAD tissue chip

  1. (1)

    The highest 15% expressed RNAs of each immune cell line were extracted as potential screening immune-related RNAs because of the appropriate number of genes for the downstream analysis.

  2. (2)

    The tissue specificity index (TSI) suggested by Yanai et al. [14], a widely used index to assess gene expression level relationships in tissue specification, was employed to determine the specificity of the above candidate RNA expression per immune cell.

    $$TSI = \frac{{\sum\nolimits_{i = 1}^N {\left( {1 - {x_i}} \right)} }}{{N - 1}}$$

    where N represents the comprehensive count encompassing a variety of immune cell types, \({x}_{i}\) indicated the normalized expression level of immune cell type i relative to the highest RNA expression intensity detected across all immune cell types. The TSI spans a range from 0 to 1. A TSI value of 0 designates the RNA as broadly present across immune cells, while a TSI value of 1 indicates the RNA’s exclusive presence in a specific immune cell types. RNAs exhibiting robust expression across all immune cell types have been verified as immune-related generic RNAs (igRNAs) as immune-related generic RNAs (igRNAs). igRNAs were believed to have high specificity in all immune cell types.

  3. (3)

    Differentially expressed igRNAs that showed a pattern of both upregulations in multiple immune cell lines and downregulation in TNBC cell lines were determined as IIC-RNAs obtained through the R package ‘limma’ [15]. IIC-RNAs were believed to be specific for immune cell lines and unspecific for TNBC cell lines, which were used as the input for ML-based classification and dimensionality reduction.

  4. (4)

    A total of 6 ML algorithms for classification were further used for downscaling, including least absolute shrinkage and selection operator regularised logistic regression (LassoLR), Boruta, Xgboost, support vector machine (SVM), Random Forest (RF), and prediction analysis for microarrays (Pamr). This step aimed to filter worthwhile IIC-RNAs by extracting the intersected IIC-RNAs identified by 6 ML algorithms for classification.

  5. (5)

    IIC-RNAs with prognostic potential were then screened in the TCGA TNBC dataset using univariate Cox regression analysis with the threshold of P < 0.05 and were used as the input for signature construction.

  6. (6)

    Next, 3 ML algorithms for survival, including Random Survival Forest (RSF), least absolute shrinkage and selection operator regularized Cox regression (LassoCox), and CoxBoost, were subsequently applied to assess the significance of the prognostic IIC-RNAs and conduct the dimensionality reduction accordingly.

  7. (7)

    19 ML algorithms for scoring, including RSF, Ranger, ObliqueRSF, recursive partitioning and regression trees (Rpart), CoxPH, SurvivalSVM, CoxBoost, gradient boosting with component-wise linear models (GlmBoost), supervised principal components (SuperPC), StepwiseCox, elastic net regression (Enet), LassoCox, conditional random forests (CForest), akritas conditional non-parametric survival estimator (Akritas), gradient boosting with regression trees (BlackBoost), partial least squares regression for cox models and related techniques (PlsRcox), regression for a parametric survival model (SurvReg), generalized boosted regression models (GBM), and conditional inference trees (CTree), were used to determine the most reliable model.

  8. (8)

    The MLIIC signature was established according to the prognostic IIC-RNAs via performing the RSF algorithm. Log-rank score test survival trees was executed by a previously established methodology [16]. Firstly, the x-variable x was supposed to be ranked as x1 ≤ x2 ≤ … ≤ xn. Then, the “ranks” of every survival time Tj (j [1, …, n]) were calculated. The specific equation utilized was given below:

    $${a_j} = {\delta _j} - \sum\nolimits_{k = 1}^{\Gamma j} {\frac{{{\delta _k}}}{{n - {\Gamma _k} + 1}}}$$

    where \({\Gamma _k} = \# \left[ {t:{T_t} \le {T_k}} \right]\) and \({\Gamma _j}\) represented the index of the order for Tj. The log-rank score test was presented below:

    $$MLIIC\,signature\, = \,S\,\left( {x,\,c} \right) = \frac{{\sum\nolimits_{xk \le c} {\left( {{a_j} - {n_l}\bar a} \right)} }}{{\sqrt {{n_l}\left( {1 - \frac{{{n_l}}}{n}} \right)S_a^2} }}$$

    where \(\overline{a}\) and \({s}_{a}^{2}\) represented the sample mean and sample variance of [aj : j = 1,. . ., n], respectively. The measure of node separation was determined utilizing log-rank score splitting by | S (x, c) |. The best split was achieved by maximizing this value over x and c.

Annotation of immune-related characteristics for the MLIIC signature

A detailed description of this part is provided in the supplementary file [17,18,19,20,21,22,23,24,25,26,27].

Predictive value of the MLIIC signature for immunotherapy response

A detailed description of this part is provided in the supplementary file [28,29,30,31,32,33,34,35,36,37,38,39].

Drug susceptibility prediction

A detailed description of this part is provided in the supplementary file [40,41,42,43].

Multi-omics alteration characteristics of the MLIIC signature score

A detailed description of this part is provided in the supplementary file [44].

Multiplex immunofluorescence (IF) staining in TNBC samples

We obtained the tissue microarray from the Outdo Biotech company (BRC1603, Shanghai, China) and the ethics was approved. Next, after a series of chip sample processing procedures, the corresponding primary antibodies were used for incubation in the two groups, including anti-MEI1 (Mouse, sc-515359, Santa Cruz, United States), anti-HMX1 (Rabbit, orb184221, biorbyt, United Kingdom), anti-VIM (Rabbit, 10366-1-AP, Proteintech, China), and anti-MORN3 (Rabbit, PA5-58506, ThermoFisher, United States) antibodies. Afterward, the incubation and tyramide signal amplification (TSA) (FITC-TSA, CY3-TSA, 594-TSA, 647-TSA, Servicebio, China) in the microarray were performed using relevant secondary antibodies (GB23301, GB23303, Servicebio, China) that were conjugated to the fluorophore. Besides, the nuclei were stained with 4’,6-Diamidino2-phenylindole dihydrochloride (DAPI, 1:1,000, Beyotime, China). The IF digital images were acquired by using the Pannoramic Scanner (3D HISTECH, Hungary). The intensity quantifications were finally calculated to evaluate the expression feature of these stained biomarkers.

Statistical analysis

The samples were classified into subgroups depending on the cut-off threshold of our MLIIC signature ascertained by the R package ‘survminer’. To assess OS disparities between the two MLIIC signature groups, Kaplan-Meier survival plots were generated using the ‘survival’ R package. For each of the individual clinical variables, including the MLIIC features, the C-index of OS was calculated. Furthermore, the prognostic implications of the MLIIC signature were elucidated through time-dependent receiver operating characteristic (ROC) curves, facilitating by the ‘timeROC’ R package. In order to assess the variations between two groups and multiple groups for continuous variables, we utilized the Wilcoxon rank sum test and the Kruskal-Wallis test, respectively. The correlation between two variables was calculated using the Spearman correlation analysis, and its significance was assessed using a two-sided hypothesis test. A P-value less than 0.05 was considered statistically significant. All P-values were two-sided. For all statistical analyses, they were conducted in the R project, version 4.1.2.


Identification of IIC-RNAs

To meticulously assess the immune cell-associated RNA, we first investigated 115 purified cell lineages containing 19 sorts of primary immune cells in 16 datasets by reviewing the literature from 2007 to 2022 (Fig. 1). A total of 5474 RNAs were extracted from each immune cell line, among the 15% most highly expressed, and used to screen for these related immune-related RNAs. The TSI scores of 5474 RNAs were calculated to determine the igRNAs that were ubiquitously represented in a total of 19 immune cell types. It is essential to mention that RNAs with low TSI scores present generally high expression patterns in diverse immune cell types, demonstrating the critical role of the immune effects. Subsequently, 2724 igRNAs were proved to be essential factors in the modulation of elemental immunity and were accompanied by a characteristic threshold of TSI < 0.2. Through analyzing these differential expressions of 2724 igRNAs, it was observed that 212 igRNAs were remarkably upregulated in 115 immune cell lines and downregulated in 20 TNBC cell lines depicted in Figure S1. Finally, we denoted these 212 igRNAs as the IIC-RNAs of TNBC.

Development of the MLIIC signature

A total of 6 ML algorithms for classification, encompassing LassoLR, Boruta, Xgboost, SVM, RF, and Pamr, were effectively deployed to discern nine noteworthy IIC-RNAs from the aformentioned screened igRNAs (Fig. 2A). The robustness of the identified IIC-RNAs’ prognostic potential of the OS of TNBC patients was further substantiated through univariate Cox proportional hazards regression. Remarkably, this analysis unveiled four pivotal IIC-RNAs within the TCGA dataset (Fig. 2B). Following this, we extended our inquiry by engaging three distinct survival-oriented ML algorithms, including LassoCox (Fig. 2C), CoxBoost (Fig. 2D), and RSF (Fig. 2E), all aimed at rigorously scrutinizing the efficacy of the four prognosis-related IIC-RNAs (Fig. 2F). The expression pattern of the 4 IIC-RNAs in immune cells, including meiotic defect 1 (MEI1), H6 family homeobox 1 transcription factor gene (HMX1), membrane occupation and recognition nexus repeat containing 3 (MORN3), vimentin (VIM) were represented in Figure S2A. Figure S2B shows the differences in 4 IIC-RNAs between immune and tumor cells, and Figure S2C demonstrates the relationship between 4 IIC-RNAs and the prognosis for TNBC. 19 ML algorithms for scoring were used to determine the most reliable signature (Fig. 2G). Then, we established an MLIIC signature according to the four prognostic IIC-RNAs by employing the RSF algorithm.

Fig. 2
figure 2

Development of the MLIIC signature based on ML. (A) Venn plot shows the intersected genes identified by six ML algorithms for classification. (B) Univariate Cox regression analysis of the screened nine intersected genes displayed via forest plot. (C) Dimension reduction of the ten prognostic genes by the CoxBoost algorithm. (D) Dimension reduction of the ten prognostic genes by the LassoCox algorithm. (E) Dimension reduction of the ten prognostic genes by RSF algorithm. (F) Venn plot shows the intersected prognostic genes identified by three ML algorithms for survival. (G) Performance of 19 ML algorithms for scoring in terms of signature construction. (H) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the TCGA, METABRIC, GSE96058, and GSE103091 datasets. (I) Time-dependent ROC curves of the MLIIC signature regarding 1-, 2-, 3-, 4-, and 5-year OS in the TCGA, METABRIC, GSE96058, and GSE103091 datasets

Prognostic value of the MLIIC signature

The characteristics of the four included cohorts are provided in Table S1. TNBC patients with high MLIIC signature scores had decreased OS time in the TCGA TNBC, METABRIC, GSE96058, and GSE103091 datasets (Fig. 2H). Besides, TNBC patients with high MLIIC signature scores had decreased progression-free survival (PFS) (Figure S3A) and disease-specific survival (DFS) (Figure S3B) time in the TCGA TNBC dataset. Concordantly, time-dependent ROC curves were generated for 1-, 2-,3-, 4-, and 5-year overall survival (OS) within the TCGA TNBC dataset, yielding respective AUC values of 0.854, 0.902, 0.904, 0.895, and 0.908. Similarly, the METABRIC dataset exhibited AUC values of 0.748, 0.718, 0.767, 0.776, and 0.766 for the corresponding time intervals. Corresponding results were observed for the GSE96058 dataset, with AUC values of 0.717, 0.616, 0.677, 0.697, and 0.621, and the GSE103091 dataset, with AUC values of 0.683, 0.725, 0.706, 0.739, and 0.721. There outcomes collectively reinforce the prognostic significance of the MLIIC signature (Fig. 2I).

Comparison of prognostic value between the MLIIC signature and previous signatures

In the TCGA dataset, the MLIIC signature was markedly related to survival status, tumor stage, and TNM staging system (Fig. 3A). Besides, the MLIIC signature indicated the prognostic potential for superior accuracy over age, gender, TNM staging system, and the C-index of tumor staging in the TCGA dataset (Fig. 3B). Due to the rapid advancements in omics technologies, numerous studies have been reported to construct and analyze signatures based on specific gene combinations with promising predictive efficacy. We then aimed to systematically compare these relevant signatures with our MLIIC signature in the past decade. After a detailed investigation, we included a total of 148 signatures in terms of RNA signatures (Table S2). It should be noted that the MLIIC signature exhibited superior performance with respect to the C-index in the TCGA TNBC (Fig. 3C), METABRIC (Fig. 3D), GSE96058 (Fig. 3E), and GSE103091 (Fig. 3F) datasets, compared to nearly all of the previous models. Moreover, the MLIIC signature was compared with the MAPS signature and CMPS signature [45] with respect to the C-index in LumA, LumB, HER2, and Basel subtypes of BRCA in the TCGA dataset (Figure S3C).

Fig. 3
figure 3

The superior predictive capability of MLIIC signature for TNBC prognosis. (A) Univariate and multivariate Cox regression analysis of OS of individual clinical variables with and without MLIIC signature in the TCGA dataset displayed via forest plot. (B) Bar plot shows the C-index of the MLIIC signature and various clinical factors in the TCGA, METABRIC, GSE96058, and GSE103091 datasets. (C) The C-index of the MLIIC signature and other published models developed in the TCGA, METABRIC, GSE96058, and GSE103091 datasets displayed via forest plot

Prediction of biological mechanisms associated with MLIIC signature

Considering the upregulation of immune-related features shown in the low-MLIIC group, we preferred to unearth the underlying biological mechanisms. The MLIIC signature score prominently displayed a robust inverse correlation with a plethora of immunologic pathways, encompassing the B cell receptor signaling pathway, T cell receptor signaling pathway, T cell-mediated immunity, the initiation of the immune response, as well as the intricate process of antigen processing and presentation (Fig. 4A). Significant differences in the immunological pathways in two MLIIC signature score groups were further proved by t-distributed stochastic neighbor embedding (t-SNE) (Fig. 4B). The chromosome distribution of genes significantly correlated with the MLIIC signature score is shown in Fig. 4C. The genes significantly associated with the MLIIC signature score were enriched in immune infiltration and activation pathways via Metascape (Fig. 4D). In GSEA of GO and KEGG terms, the low MLIIC signature group showed enrichment of the T cell receptor signaling pathway and B cell receptor signaling pathway as expected (Fig. 4E). Taken together, our results revealed that a low MLIIC signature represented a potency of superior immune response under immunotherapy.

Fig. 4
figure 4

Biological peculiarities of the MLIIC signature score in the TCGA dataset. (A) MsigDB-based GSVA analysis delineated the biological attributes of two MLIIC signature score groups displayed via heat map. (B) t-SNE plot of GO and KEGG terms delineated the differences in pathway activity in two MLIIC signature score groups. (C) Chromosome distribution of genes significantly correlated with MLIIC signature score. (D) Metascape-based enrichment analysis of genes significantly associated with MLIIC signature score. (E) GSEA of GO and KEGG terms for the MLIIC signature score

Immune characteristics related to the MLIIC signature

Then, to delve into the immune status represented by MLIIC features, the subsequent focus was on the relationship between MLIIC features and immune infiltrating cells and immune modulators. As shown in Fig. 5A and B, in the TCGA dataset, the lower MLIIC signature subgroup exhibited higher levels of immune infiltrating cells and modulators, confirming an inflammatory but comparatively immunopromoted TME, potentially contributing to the choices of immunotherapy. MLIIC signature was significantly and negatively correlated with several classical immune checkpoints (Figure S4A). Additionally, we further analyzed the status of TMB, MSI, CYT, GEP, APM score, IFN-γ, TCR Richness, TCR Shannon, and CD8, which were associated with a stronger immunoreactive TME between the high and low score group. Related results suggested that all the indicators were at high levels in the low-MLIIC group from the TCGA dataset (Fig. 5C-5F, S4B-S4G).

Fig. 5
figure 5

Immune-related characteristics of the MLIIC signature in the TCGA dataset. (A) Heat map presented the correlation between the MLIIC signature and immune infiltrating cells. (B) From left to right: mRNA expression; expression versus methylation; amplification frequency; and the deletion frequency for 75 immunomodulators genes by MLIIC signature groups displayed by heat map. (C) Box plot showed the TMB levels between two MLIIC signature score groups. (D) Box plot showed the MSI levels between two MLIIC signature score groups. (E) Box plot showed the CYT levels between two MLIIC signature score groups. (F) Box plot showed the GEP levels between two MLIIC signature score groups

The MLIIC signature is a predictive indicator of immunotherapy response

On account of the superior predictive capability of our MLIIC signature towards the benefit of immunotherapy, we performed validation of its effectiveness in several immunotherapy datasets.

In the GSE35640 (Fig. 6A) and GSE91061 (Fig. 6B), melanoma patients with low MLIIC signature scores had significantly and even increased survival periods (Fig. 6A). In the GSE78220 dataset, we also observed that melanoma patients with low MLIIC signature scores had significantly longer survival times (Fig. 6C) and were able to respond positively to anti-PD-1 immunotherapy (Fig. 6D). In the Van Allen dataset, melanoma patients with low MLIIC signature scores experienced increased survival times (Fig. 6E). As predicted, patients suffering from melanoma with low MLIIC signature scores appeared to be responsive to anti-CTLA-4 immunotherapy (Fig. 6F). In the Nathanson dataset, consistent with previous trends, melanoma patients with low MLIIC signature scores survived longer (Fig. 6G) and frequently responded to anti-CTLA-4 immunotherapy (Fig. 6H). In the IMvigor dataset, urothelial carcinoma patients with low MLIIC signature scores had significantly and even increased survival periods (Fig. 6I). Moreover, urothelial carcinoma patients with low MLIIC signature scores were more inclined to be responsive to anti-PD-L1 immunotherapy (Fig. 6J). In the Braun dataset, patients suffering from renal cell carcinoma with low MLIIC signature scores experienced increased survival time (Fig. 6K), which was consistent with a trend toward their response to anti-PD-1 immunotherapy (Fig. 6L). Furthermore, patients in the GSE179351, including those with colorectal adenocarcinoma (Fig. 6M) and pancreatic adenocarcinoma (Fig. 6N) dataset with low MLIIC signature scores, were also more likely to respond to immunotherapy. Of note, patients with low MLIIC signature scores were more likely to react to targeted therapy in the GSE165252 (esophageal adenocarcinoma) (Fig. 6O) and GSE103668 (TNBC) (Fig. 6P) datasets. Based on the TIDE algorithm, a low MLIIC signature score correlated significantly with the responses of immune checkpoint inhibitors (ICIs) in the TCGA dataset (Fig. 6Q). According to the Submap analysis, a low MLIIC signature score predicted an association with anti-PD-1 immunotherapy responses in the TCGA dataset (Fig. 6R).

Fig. 6
figure 6

Predictive value of the MLIIC signature in immunotherapy response. (A) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE36540 dataset. (B) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE91061 dataset. (C) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the GSE78220 dataset. (D) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE78220 dataset. (E) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Van Allen dataset. (F) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Van Allen dataset. (G) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Nathanson dataset. (H) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Nathanson dataset. (I) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the IMvigor dataset. (J) Box plot displayed the levels of the MLIIC signature score in patients with different immunotherapy responses in the IMvigor dataset. (K) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Braun dataset. (L) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Braun dataset. (M-N) Box plot displayed the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE179351 dataset. (O) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE165252 dataset. (P) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE103668 dataset. (Q) Contingency table of the two MLIIC signature score groups and immunotherapy responses based on TIDE algorithm. (R) Heat map of the two MLIIC signature score groups and anti-PD-1/anti-CTLA-4 immunotherapy responses based on submap analysis

Prediction of drug response related to the MLIIC signature score

A lower CMap score indicates a greater likelihood of the drug reversing the molecular attributes of the disease as per the CMap theory. Remarkably, arachidonyltrifluoromethane exhibits the lowest CMap score, suggesting its potential efficacy in treating TNBC patients with a prominent MLIIC signature score (Fig. 7A). Noteworthy findings also include Afatinib, displaying significantly heightened drug sensitivity within the high MLIIC characteristic score cohort (Fig. 7B). Furthermore, a compelling trend emerges with CTRP-derived tinifarnih-P1 (Fig. 7C) and PRISM-derived tucatinib (Fig. 7D), both showcasing robust negative correlation with the MLIIC signature score. Notably, these two drugs exhibit markedly improved drug sensitivity within the high MLIIC signature score subset.

Fig. 7
figure 7

The drug responses between two MLIIC signature score groups. (A) CMap-based drug prediction. (B) Box plot shows the GDSC-based drug prediction. (C) Dot plot and box plot show the CTRP-based drug prediction. (D) Dot plot and box plot shows the PRISM-based drug prediction

Multi-omics alteration characteristics related to the MLIIC signature score

In the context of our study, we noted varying patterns of chromosomal alterations across two distinct groups based on the MLIIC signature scores (Fig. 8A). The precise genomic regions displaying alterations are visually depicted in Fig. 8B. Remarkably, the cohort withelevated MLIIC signature scores exhibited pronounced chromosomal instability, prominently showcasing aberrations in FGA, FGG, and FGL (Fig. 8C).

Fig. 8
figure 8

Multi-omics alteration characteristics of the MLIIC signature score in the TCGA dataset. (A) GISTIC 2.0-based chromosome amplifications and deletions in two MLIIC signature score groups. (B) Waterfall plot shows the genomic alteration landscape in two MLIIC signature score groups. (C) Box plot shows the fraction of genome alteration, the fraction of genome gained, and the fraction of genome lost in two MLIIC signature score groups

The superior predictive capability of MLIIC signature for TNBC prognosis in the TNBC tissue array

Lastly, to more sufficiently verify the accuracy, we used multiplex IF staining for the key indicators MEI1, HMX1, VIM, and MORN3 of the MLIIC signature, in the external cohort of the TNBC tissue microarray. The staining intensities of these four IIC-RNAs were analyzed semi-quantitatively, and the score of the MLIIC signature was generated. The captured representative IF images of MEI1, HMX1, VIM, and MORN3 in two MLIIC signature score subgroups were demonstrated in Fig. 9A. The Kaplan-Meier survival curves indicated that those TNBC patients with low MLIIC signature scores exhibited a markedly higher survival state (Fig. 9B). Furthermore, the Time-dependent ROC curves of the MLIIC signature for 3-, 5-, and 10-year OS assessment (0.679, 0.706, and 0.713, respectively) substantiated the prognostic potential of our MLIIC signature (Fig. 9C). In addition, the C-index of our signature and the focused clinical factors were illustrated in Fig. 9D, which suggested that this MLIIC signature displayed certain superiority versus the C-index.

Fig. 9
figure 9

The superior predictive capability of MLIIC signature for TNBC prognosis in the TNBC tissue array. (A) Representative multiplex IF images of MEI1, HMX1, VIM, and MORN3 in two MLIIC signature score groups. (B) Kaplan-Meier survival curves of the MLIIC signature regarding OS. (C) Time-dependent ROC curves of the MLIIC signature regarding 3-, 5-, and 10-year OS. (D) Bar plot shows the C-index of the MLIIC signature and focused clinical factors

Microenvironment characteristics related to the MLIIC signature

m6A epigenetic regulators, including ALKBH5, FMR1, FTO, HNRNPA2B1, HNRNPC, IGF2BP1, IGF2BP2, IGF2BP3, METTL14, NPLOC4, RBM15, RBM15B, RBMX, WTAP, YTHDC1, YTHDC2, YTHDF2, YTHDF3, and ZC3H13, were significantly highly expressed in the low MLIIC signature score group (Figure S5A). This promising result also corroborated that MLIIC is a credible immune microenvironmental trait in relation to the mRNA methylation that has been proven to control T cell homeostasis. MLIIC signature was further found to negatively correlate with CYT, TLS, IFN-γ, chemokines, Roh_IS, Davoli_IS, Ayes_expIS, and T cell inflamed signature (Figure S5B).


TNBC is a highly complicated, heterogeneous, relapsed BC type characterized by a high propensity to metastasize, poor outcomes, and a lack of treatment targets. TNBC no longer relies on a single traditional treatment but gradually evolves towards classification and precision treatment. Some emerging regimens, including PARP1 inhibitor, androgen receptor (AR) inhibitor, PI3K inhibitor, and especially ICIs, as well as their optimized combinations, are currently under clinical investigation [46]. ICI treatment produces a long-lasting and complete tumor regressive effect in some TNBC patients but a temporary, partial, or no response in some patients [47]. The highly heterogeneous and complex immune status of TME causes the clinical outcome of TNBC to be unpredictable. This is an intriguing issue for establishing well-defined indicators to determine prognosis and immunotherapy responsiveness.

ML has been proposed as a promising resource in many domains of medicine for classifying and predicting patient outcomes. Its model prognosis depends on the complex, multidimensional, and nonlinear relationships between patient tumor malignancy, surgical treatment, and drug therapy [48]. The analysis of immune infiltration, including the quantity and quality of TILs, the integrated evaluation of TIL and other clinical markers, and the use of ML platforms for a more comprehensive characterization of immune features in breast cancer, are worthy of in-depth evaluation [49]. Here, using 25 comprehensive and sequential ML algorithms, we successfully constructed an immune-related MLIIC signature of TNBC based on four finally screened RNAs, including MEI1, MORN3, HMX1, and VIM. MEI1 participants in homologous recombination in meiosis during mammalian spermatogenesis.

Moreover, polymorphic alleles of the human MEI1 gene have been confirmed to be relevant to human azoospermia through meiotic arrest [50]. In oncology research, only one report has shown by sequencing studies that MEI1 is one of the vital differential genes in human papillomavirus (HPV) + vs. HPV- tumors. It may be involved in the prognosis of cervical cancer [51]. MORN3 is known to be a protein that affects cardiac function and tumor progression. At the same time, Morncide, the targeting peptide of Morn3, possesses the ability to curb tumor growth by activating the p53 pathway [52]. The role of MORN3 in tumors has also rarely been studied and is only present in one bioinformatic model construction. However, this study included a relatively large number of 16 genes and did not sufficiently consider the predictive capability of immunotherapy [53]. HMX1 is a critical orchestrator in developing craniofacial structures, ocular defects, and morphological abnormalities of the outer ear [54, 55]. HMX1 is also involved in the activity of NKL homologous frame genes in normal and malignant bone marrow cells [56]. Vimentin (VIM) is a typical mesenchymal marker with abnormal methylation expression patterns in breast and prostate cancer [57]. Natarajan et al. demonstrated that the VIM gene network modulated various neoplastic processes, such as adipogenesis, senescence, and autophagy [58]. Overall, all four genes are now less reported and even less in oncology. Besides, we also performed a semi-quantitative validation of these indicators by multiple IF staining in an external cohort validation with TNBC tissue microassay. Our results presented that these IIC-RNAs have different patterns of high or low expression in high- or low-risk groups, predicting that they exert diverse functional properties in TNBC tumors.

On the one hand, this indicates the specificity and novelty of the genes involved in our MLIIC signature, accompanied by high expression in immune cells and low expression in TNBC. Meanwhile, it also reflects that subsequent studies must explore the relative gaps in the functions associated with these four genes. In terms of the prognosis capability, the superiority of 1-, 2-, 3-, 4-, and 5-year OS prediction of MLIIC signature were also confirmed, compared to 148 previous published signatures with respect to the C-index in all included databases.

TNBC is characterized by higher tumor mutational load, tumor-infiltrating T cell levels, and PD-L1 RNA expression versus other breast cancer subtypes. These immunogenic features make TNBC a potential population for immunotherapy. For monitoring responses to tumor immunotherapy, PD-L1, tumor-associated antigens (TAAs), BRCA1/2 mutation, and ctDNA have been confirmed to be effective candidates, among which PD-L1 was among the first applied biomarkers. Notably, IMpassion130 was the first study to successfully introduce immunotherapy into TNBC, in which PD-L1 was used as a well-established indicator of immunotherapy in metastatic TNBC [59]. However, this indicator had a narrow application scope and could not assure whether the PD-L1 positive cohort was the population that could benefit from ICI in early TNBC. Inconsistent findings suggested that PD-L1 is by far the best, but not perfect, predictive biomarker for ICI efficacy. Therefore, identifying predictive biomarkers and the corresponding signature strategies for precision immunotherapy are essential concerns for the current TNBC treatment. In our study, we found that this MLIIC signature with a low score indicated a more robust immune effector cell activation and more benefits from immune therapy. Specifically, the lower MLIIC signature subgroup possessed a higher abundance of TILs, immune modulators, TMB, MSI, CYT, GEP, and IFN-γ and showed a distinct enriched signaling pathway for immune response, T and B cell receptor, antigen processing and presentation, and PD-L1/PD-1 checkpoint. Our signature is effectively capable of reflecting a wealth of information regarding alteration in immune cell infiltration and related pathways, presenting more insight into tumor immunity than reported studies.

BC is characterized by insufficient T-lymphocyte infiltration and is known as an immune “cold” tumor, resulting in an inadequate response to immunotherapy, including PD-1/PD-L1 inhibitors. Our study shows that TNBC has high low-risk immune infiltration, characterized by increased infiltration of cancer-suppressing immune cells and high levels of cancer-suppressing immune cells. The reasons for this phenomenon may have some relevance to the physical structure of breast tissue, immune cell abundance, tumor heterogeneity, age, hormone levels, and the included cohort. Firstly, differences in the immune microenvironment across cancer species are driven by the organ-specific structural and molecular characteristics of the different tissues. Different angiogenesis, immune cell distribution, and stem cell activity exist in sterile tissues (pancreas, brain), filtering and metabolic tissues (liver, kidney), environmental interface tissues (skin, lung, intestine), and body surface tissues (breast). When tumorigenesis occurs, immune cell infiltration and homing are influenced by tissue location-specific factors. For example, in gliomas and RCC, where there is a high degree of microangiogenesis and astrocytes, the abundance and functional status of immune cell infiltration are specific, while in TNBC, the TME formed by a large number of adipocytes surrounded by a large number of adipose cells is primarily influenced by the regulation of crosstalk and changes in paracrine signaling between adipose tissue and cancer cells.

Moreover, adipogenesis in TNBC is associated with cancer metabolism and unfavorable tumor immune microenvironment, consequently affecting immunotherapy. Secondly, as previously mentioned, TNBC is highly heterogeneous and can be classified into different entities based on various immune, metabolic, pathological, and other molecular characteristics. We note that in terms of immune stratification, Shao et al. classified TNBC into three types: immune desert type, immune inactivation type, and inflammatory immune type [60]. Among them, the immune desert-type microenvironment has a low tumor infiltration rate, fails to attract immune cells, and is associated with MYC gene amplification. The immunodepleted type is chemotactic, but innate immune inactivation and a low amount of tumor antigen may contribute to immune escape. In contrast, the immune-inflammatory type highly expresses immunecheckpoint molecules. This implies that certain specific cohorts of TNBC subtypes present a high abundance of infiltrative immune features. Besides, the expression of oncoproteins, cancer cell proliferation, progression, and metastasis are all increased by the frequent upregulation of m6A epigenetic regulators in human cancer tissues from a variety of organ origins [61]. A recent study reported that by targeting the IL-7/STAT5/SOCS pathways, m6A mRNA methylation regulates T cell homeostasis [62]. In our study, m6A epigenetic regulators were significantly highly expressed in the low MLIIC signature score group, further indicating the immune active environment in TNBC patients with low MLIIC signature scores.

Although the predictive efficacy of MLIIC signature has been preliminarily documented in TNBC, we remain concerned that several issues remain to be addressed. Firstly, this study is essentially a retrospective study using previous data, and its actual value needs to be further corroborated using real-world information. MLIIC signature is also in urgent need of validation in a prospective immunotherapy cohort in terms of predicting the effects of immunotherapy. In addition, TNBC belongs to a subtype of malignant BC and possesses its complex classification, including the PAM50 type and six categories based on molecular characteristics. More accurate molecular typing can guide the prognosis and treatment selection of TNBC. The MLIIC signature in this study predicts overall TNBC, but no in-depth exploration of the subdivided TNBC subtypes was made. This is of extremely high value in the diagnosis and treatment of TNBC. Therefore, we would likely construct more accurate diagnostic signatures for different TNBC subtypes in the follow-up. Finally, the MLIIC signature score is closely related to multiple immune cells, tumor process pathways, and other biological mechanisms. In this regard, the key molecules and interaction networks also deserve further profound study in molecular biology. Finally, it is essential to note that our signature is currently an efficient tool that reflects a wide range of treatment information but is not a replacement for the gold standard diagnosis and treatment method. The combination of MLIIC signature and conventional tools is exciting and promising.


Taken together, based on 25 ML algorithms, we successfully constructed a robust and reliable MLIIC signature for predicting TNBC prognosis, mutation, biological function, drug responsiveness, immune infiltration, and immunotherapy responsiveness. Notably, the TNBC patients with a low score of MLIIC signature possessed a superior prognosis, a more active immune microenvironment, and a more substantial immunotherapeutic effect. Our well-established MLIIC signature provides an effective tool to guide TNBC prognosis determination and treatment stratification management.

Data Availability

All data used in this work can be acquired from the Gene Expression Omnibus (GEO;, the Cancer Genome Atlas (TCGA) datasets (, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC;, and so on.



Androgen receptor


Antigen processing and presenting machinery


Area under the curve


Cancer-associated adipocytes


Cancer-associated fibroblasts


Cancer Cell Line Encyclopedia project


Copy number variations


Cytotoxic activity


Dendritic cells


Disease-free survival


Estrogen receptor-negative


Estimation of STromal and Immune cells in MAlignant Tumours using Expression data


Fragments per kilobase million


Gene Expression Omnibus


T cell-inflamed gene expression profile


Gene Ontology


Gene set enrichment analysis


Gene set variation analysis


H6 family homeobox 1 transcription factor gene


Half maximal inhibitory concentration


Human epidermal growth factor receptor 2


Human papillomavirus


Immune-related generic RNAs


Immune checkpoint inhibitors


Immune infiltrating cell


Interferon γ


Kyoto Encyclopedia of Genes and Genomes


Machine learning


ML algorithms based IIC associated signature


Meiotic defect 1


Membrane occupation and recognition nexus repeat containing 3


Microenvironment Cell Populations-counter


Microsatellite instability


Molecular Signatures Database


Molecular Taxonomy of Breast Cancer International Consortium


Overall survival


Pathological complete response


Progression-free survival


Progesterone receptor-negative


Robust Multi-array Average


Receiver operating characteristic


Single-cell gene set enrichment analysis


Subnetwork Mappings in Alignment of Pathways


T cell receptor


The Cancer Genome Atlas


Tissue specificity index


T-distributed stochastic neighbor embedding


Transcripts per kilobase million


Triple-negative breast cancer


Tumor Immune Dysfunction and Exclusion


Tumor Immune Estimation Resource


Tumor microenvironment


Tumor mutation burden


Tumor-associated antigens


Tumor-associated macrophages


Tumor-infiltrating immune cell-associated


Tumor-infiltrating lymphocytes




  1. Bianchini G, Balko JM, Mayer IA, Sanders ME, Gianni L. Triple-negative breast cancer: challenges and opportunities of a heterogeneous disease. Nat Rev Clin Oncol. 2016;13(11):674–90.

    Article  Google Scholar 

  2. Bianchini G, De Angelis C, Licata L, Gianni L. Treatment landscape of triple-negative breast cancer - expanded options, evolving needs. Nat Rev Clin Oncol. 2022;19(2):91–113.

    Article  Google Scholar 

  3. Wang Y, Zhang H, Liu C, Wang Z, Wu W, Zhang N, Zhang L, Hu J, Luo P, Zhang J, Liu Z, Peng Y, Liu Z, Tang L, Cheng Q. Immune checkpoint modulators in cancer immunotherapy: recent advances and emerging concepts. J Hematol Oncol. 2022;15(1):111.

    Article  Google Scholar 

  4. Anurag M, Jaehnig EJ, Krug K, Lei JT, Bergstrom EJ, Kim BJ, Vashist TD, Tran Huynh AM, Dou Y, Gou X, Huang C, Shi Z, Wen B, Korchina V, Gibbs RA, Muzny DM, Doddapaneni H, Dobrolecki LE, Rodriguez H, Robles AI, Hiltke T, Lewis MT, Nangia JR, Nemati Shafaee M, Li S, Hagemann IS, Hoog J, Lim B, Osborne CK, Mani DR, Gillette MA, Zhang B, Echeverria GV, Miles G, Rimawi MF, Carr SA, Ademuyiwa FO, Satpathy S, Ellis MJ. Proteogenomic markers of chemotherapy resistance and response in triple negative breast cancer. Cancer Discov. 2022.

    Article  Google Scholar 

  5. So JY, Ohm J, Lipkowitz S, Yang L. Triple negative breast cancer (TNBC): non-genetic tumor heterogeneity and immune microenvironment: emerging treatment options. Pharmacol Ther. 2022;237:108253.

    Article  Google Scholar 

  6. Huertas-Caro CA, Ramirez MA, Gonzalez-Torres HJ, Sanabria-Salas MC, Serrano-Gomez SJ. Immune Lymphocyte infiltrate and its Prognostic Value in Triple-Negative breast Cancer. Front Oncol. 2022;12:910976.

    Article  Google Scholar 

  7. Xu Z, Wang X, Zeng S, Ren X, Yan Y, Gong Z. Applying artificial intelligence for cancer immunotherapy. Acta Pharm Sin B. 2021;11(11):3393–405.

    Article  Google Scholar 

  8. Zaitsev A, Chelushkin M, Dyikanov D, Cheremushkin I, Shpak B, Nomie K, Zyrin V, Nuzhdina E, Lozinsky Y, Zotova A, Degryse S, Kotlov N, Baisangurov A, Shatsky V, Afenteva D, Kuznetsov A, Paul SR, Davies DL, Reeves PM, Lanuti M, Goldberg MF, Tazearslan C, Chasse M, Wang I, Abdou M, Aslanian SM, Andrewes S, Hsieh JJ, Ramachandran A, Lyu Y, Galkin I, Svekolkin V, Cerchietti L, Poznansky MC, Ataullakhanov R, Fowler N, Bagaev A. Precise reconstruction of the TME using bulk RNA-seq and a machine learning algorithm trained on artificial transcriptomes. Cancer Cell. 2022;40(8):879–894e816.

    Article  Google Scholar 

  9. Zhang N, Zhang H, Liu Z, Dai Z, Wu W, Zhou R, Li S, Wang Z, Liang X, Wen J, Zhang X, Zhang B, Ouyang S, Zhang J, Luo P, Li X, Cheng Q. An artificial intelligence network-guided signature for predicting outcome and immunotherapy response in lung adenocarcinoma patients based on 26 machine learning algorithms. Cell Prolif. 2023;56(4):e13409.

    Article  Google Scholar 

  10. Zhang H, Zhang N, Wu W, Zhou R, Li S, Wang Z, Dai Z, Zhang L, Liu Z, Zhang J, Luo P, Liu Z, Cheng Q. Machine learning-based tumor-infiltrating immune cell-associated lncRNAs for predicting prognosis and immunotherapy response in patients with glioblastoma. Brief Bioinform. 2022.

    Article  Google Scholar 

  11. Zhang N, Zhang H, Wu W, Zhou R, Li S, Wang Z, Dai Z, Zhang L, Liu F, Liu Z, Zhang J, Luo P, Liu Z, Cheng Q. Machine learning-based identification of tumor-infiltrating immune cell-associated lncRNAs for improving outcomes and immunotherapy responses in patients with low-grade glioma. Theranostics. 2022;12(13):5931–48.

    Article  Google Scholar 

  12. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307–15.

    Article  Google Scholar 

  13. Sun J, Zhang Z, Bao S, Yan C, Hou P, Wu N, Su J, Xu L, Zhou M. Identification of tumor immune infiltration-associated lncRNAs for improving prognosis and immunotherapy response of patients with non-small cell lung cancer. J Immunother Cancer. 2020;8(1).

  14. Yanai I, Benjamin H, Shmoish M, Chalifa-Caspi V, Shklar M, Ophir R, Bar-Even A, Horn-Saban S, Safran M, Domany E, Lancet D, Shmueli O. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21(5):650–9.

    Article  Google Scholar 

  15. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

    Article  Google Scholar 

  16. Hothorn T, Lausen B. On the exact distribution of maximally selected rank statistics. Comput Stat Data Anal. 2003;43(2):121–37.

    Article  MathSciNet  MATH  Google Scholar 

  17. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA, Ziv E, Culhane AC, Paull EO, Sivakumar IKA, Gentles AJ, Malhotra R, Farshidfar F, Colaprico A, Parker JS, Mose LE, Vo NS, Liu J, Liu Y, Rader J, Dhankani V, Reynolds SM, Bowlby R, Califano A, Cherniack AD, Anastassiou D, Bedognetti D, Mokrab Y, Newman AM, Rao A, Chen K, Krasnitz A, Hu H, Malta TM, Noushmehr H, Pedamallu CS, Bullman S, Ojesina AI, Lamb A, Zhou W, Shen H, Choueiri TK, Weinstein JN, Guinney J, Saltz J, Holt RA, Rabkin CS, Cancer Genome Atlas Research N, Lazar AJ, Serody JS, Demicco EG, Disis ML, Vincent BG, Shmulevich I. The Immune Landscape of Cancer. Immunity. 2018;48(4):812–30. e814.

    Article  Google Scholar 

  18. Ayers M, Lunceford J, Nebozhyn M, Murphy E, Loboda A, Kaufman DR, Albright A, Cheng JD, Kang SP, Shankaran V, Piha-Paul SA, Yearley J, Seiwert TY, Ribas A, McClanahan TK. IFN-gamma-related mRNA profile predicts clinical response to PD-1 blockade. J Clin Invest. 2017;127(8):2930–40.

    Article  Google Scholar 

  19. Roh W, Chen PL, Reuben A, Spencer CN, Prieto PA, Miller JP, Gopalakrishnan V, Wang F, Cooper ZA, Reddy SM, Gumbs C, Little L, Chang Q, Chen WS, Wani K, De Macedo MP, Chen E, Austin-Breneman JL, Jiang H, Roszik J, Tetzlaff MT, Davies MA, Gershenwald JE, Tawbi H, Lazar AJ, Hwu P, Hwu WJ, Diab A, Glitza IC, Patel SP, Woodman SE, Amaria RN, Prieto VG, Hu J, Sharma P, Allison JP, Chin L, Zhang J, Wargo JA, Futreal PA. Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and PD-1 blockade reveals markers of response and resistance. Sci Transl Med. 2017;9(379).

  20. Senbabaoglu Y, Gejman RS, Winer AG, Liu M, Van Allen EM, de Velasco G, Miao D, Ostrovnaya I, Drill E, Luna A, Weinhold N, Lee W, Manley BJ, Khalil DN, Kaffenberger SD, Chen Y, Danilova L, Voss MH, Coleman JA, Russo P, Reuter VE, Chan TA, Cheng EH, Scheinberg DA, Li MO, Choueiri TK, Hsieh JJ, Sander C, Hakimi AA. Tumor immune microenvironment characterization in clear cell renal cell carcinoma identifies prognostic and immunotherapeutically relevant messenger RNA signatures. Genome Biol. 2016;17(1):231.

    Article  Google Scholar 

  21. Li T, Fan J, Wang B, Traugh N, Chen Q, Liu JS, Li B, Liu XS. TIMER: a web server for Comprehensive Analysis of Tumor-Infiltrating Immune cells. Cancer Res. 2017;77(21):e108–10.

    Article  Google Scholar 

  22. Charoentong P, Finotello F, Angelova M, Mayer C, Efremova M, Rieder D, Hackl H, Trajanoski Z. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype Relationships and Predictors of response to checkpoint blockade. Cell Rep. 2017;18(1):248–62.

    Article  Google Scholar 

  23. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, Selves J, Laurent-Puig P, Sautes-Fridman C, Fridman WH, de Reynies A. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17(1):218.

    Article  Google Scholar 

  24. Yoshihara K, Shahmoradgoli M, Martinez E, Vegesna R, Kim H, Torres-Garcia W, Trevino V, Shen H, Laird PW, Levine DA, Carter SL, Getz G, Stemke-Hale K, Mills GB, Verhaak RG. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612.

    Article  Google Scholar 

  25. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

    Article  Google Scholar 

  26. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.

    Article  Google Scholar 

  27. Chin CH, Chen SH, Wu HH, Ho CW, Ko MT, Lin CY. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol 8 Suppl. 2014;4(Suppl 4):11.

    Article  Google Scholar 

  28. Ulloa-Montoya F, Louahed J, Dizier B, Gruselle O, Spiessens B, Lehmann FF, Suciu S, Kruit WH, Eggermont AM, Vansteenkiste J, Brichard VG. Predictive gene signature in MAGE-A3 antigen-specific cancer immunotherapy. J Clin Oncol. 2013;31(19):2388–95.

    Article  Google Scholar 

  29. Riaz N, Havel JJ, Makarov V, Desrichard A, Urba WJ, Sims JS, Hodi FS, Martin-Algarra S, Mandal R, Sharfman WH, Bhatia S, Hwu WJ, Gajewski TF, Slingluff CL Jr, Chowell D, Kendall SM, Chang H, Shah R, Kuo F, Morris LGT, Sidhom JW, Schneck JP, Horak CE, Weinhold N, Chan TA. Tumor and Microenvironment Evolution during Immunotherapy with Nivolumab. Cell. 2017;171(4):934–949e916.

    Article  Google Scholar 

  30. Hugo W, Zaretsky JM, Sun L, Song C, Moreno BH, Hu-Lieskovan S, Berent-Maoz B, Pang J, Chmielowski B, Cherry G, Seja E, Lomeli S, Kong X, Kelley MC, Sosman JA, Johnson DB, Ribas A, Lo RS. Genomic and transcriptomic features of response to Anti-PD-1 therapy in metastatic melanoma. Cell. 2017;168(3):542.

    Article  Google Scholar 

  31. Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, Sucker A, Hillen U, Foppen MHG, Goldinger SM, Utikal J, Hassel JC, Weide B, Kaehler KC, Loquai C, Mohr P, Gutzmer R, Dummer R, Gabriel S, Wu CJ, Schadendorf D, Garraway LA. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science. 2015;350(6257):207–11.

    Article  Google Scholar 

  32. Nathanson T, Ahuja A, Rubinsteyn A, Aksoy BA, Hellmann MD, Miao D, Van Allen E, Merghoub T, Wolchok JD, Snyder A, Hammerbacher J. Somatic mutations and Neoepitope Homology in Melanomas treated with CTLA-4 blockade. Cancer Immunol Res. 2017;5(1):84–91.

    Article  Google Scholar 

  33. Mariathasan S, Turley SJ, Nickles D, Castiglioni A, Yuen K, Wang Y, Kadel EE, Koeppen III, Astarita H, Cubas JL, Jhunjhunwala R, Banchereau S, Yang R, Guan Y, Chalouni Y, Ziai C, Senbabaoglu J, Santoro Y, Sheinson S, Hung D, Giltnane J, Pierce JM, Mesh AA, Lianoglou K, Riegler S, Carano J, Eriksson RAD, Hoglund P, Somarriba M, Halligan L, van der Heijden DL, Loriot MS, Rosenberg Y, Fong JE, Mellman L, Chen I, Green DS, Derleth M, Fine C, Hegde GD, Bourgon PS, Powles R T. TGFbeta attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature. 2018;554(7693):544–8.

    Article  Google Scholar 

  34. Braun DA, Hou Y, Bakouny Z, Ficial M, Sant’ Angelo M, Forman J, Ross-Macdonald P, Berger AC, Jegede OA, Elagina L, Steinharter J, Sun M, Wind-Rotolo M, Pignon JC, Cherniack AD, Lichtenstein L, Neuberg D, Catalano P, Freeman GJ, Sharpe AH, McDermott DF, Van Allen EM, Signoretti S, Wu CJ, Shukla SA, Choueiri TK. Interplay of somatic alterations and immune infiltration modulates response to PD-1 blockade in advanced clear cell renal cell carcinoma. Nat Med. 2020;26(6):909–18.

    Article  Google Scholar 

  35. Parikh AR, Szabolcs A, Allen JN, Clark JW, Wo JY, Raabe M, Thel H, Hoyos D, Mehta A, Arshad S, Lieb DJ, Drapek LC, Blaszkowsky LS, Giantonio BJ, Weekes CD, Zhu AX, Goyal L, Nipp RD, Dubois JS, Van Seventer EE, Foreman BE, Matlack LE, Ly L, Meurer JA, Hacohen N, Ryan DP, Yeap BY, Corcoran RB, Greenbaum BD, Ting DT, Hong TS. Radiation therapy enhances immunotherapy response in microsatellite stable colorectal and pancreatic adenocarcinoma in a phase II trial. Nat Cancer. 2021;2(11):1124–35.

    Article  Google Scholar 

  36. van den Ende T, de Clercq NC, van Berge Henegouwen MI, Gisbertz SS, Geijsen ED, Verhoeven RHA, Meijer SL, Schokker S, Dings MPG, Bergman J, Haj Mohammad N, Ruurda JP, van Hillegersberg R, Mook S, Nieuwdorp M, de Gruijl TD, Soeratram TTD, Ylstra B, van Grieken NCT, Bijlsma MF, Hulshof M, van Laarhoven HWM. Neoadjuvant Chemoradiotherapy Combined with Atezolizumab for Resectable Esophageal Adenocarcinoma: a single-arm phase II feasibility trial (PERFECT). Clin Cancer Res. 2021;27(12):3351–9.

    Article  Google Scholar 

  37. Birkbak NJ, Li Y, Pathania S, Greene-Colozzi A, Dreze M, Bowman-Colin C, Sztupinszki Z, Krzystanek M, Diossy M, Tung N, Ryan PD, Garber JE, Silver DP, Iglehart JD, Wang ZC, Szuts D, Szallasi Z, Richardson AL. Overexpression of BLM promotes DNA damage and increased sensitivity to platinum salts in triple-negative breast and serous ovarian cancers. Ann Oncol. 2018;29(4):903–9.

    Article  Google Scholar 

  38. Jiang P, Gu S, Pan D, Fu J, Sahu A, Hu X, Li Z, Traugh N, Bu X, Li B, Liu J, Freeman GJ, Brown MA, Wucherpfennig KW, Liu XS. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24(10):1550–8.

    Article  Google Scholar 

  39. Hoshida Y, Brunet JP, Tamayo P, Golub TR, Mesirov JP. Subclass mapping: identifying common subtypes in independent disease data sets. PLoS ONE. 2007;2(11):e1195.

    Article  Google Scholar 

  40. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform. 2021;22(6).

  41. Geeleher P, Cox N, Huang RS. pRRophetic: an R package for prediction of clinical chemotherapeutic response from tumor gene expression levels. PLoS ONE. 2014;9(9):e107468.

    Article  Google Scholar 

  42. Yang C, Zhang H, Chen M, Wang S, Qian R, Zhang L, Huang X, Wang J, Liu Z, Qin W, Wang C, Hang H, Wang H. A survey of optimal strategy for signature-based drug repositioning and an application to liver cancer. Elife. 2022;11.

  43. Yang K, Dinasarapu AR, Reis ES, Deangelis RA, Ricklin D, Subramaniam S, Lambris JD. CMAP: complement map database. Bioinformatics. 2013;29(14):1832–3.

    Article  Google Scholar 

  44. Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747–56.

    Article  Google Scholar 

  45. Mao XY, Perez-Losada J, Abad M, Rodriguez-Gonzalez M, Rodriguez CA, Mao JH, Chang H. iCEMIGE: integration of CEll-morphometrics, MIcrobiome, and GEne biomarker signatures for risk stratification in breast cancers. World J Clin Oncol. 2022;13(7):616–29.

    Article  Google Scholar 

  46. Xie X, Lee J, Iwase T, Kai M, Ueno NT. Emerging drug targets for triple-negative breast cancer: a guided tour of the preclinical landscape. Expert Opin Ther Targets. 2022;26(5):405–25.

    Article  Google Scholar 

  47. Tan Q, Yin S, Zhou D, Chi Y, Man X, Li H. Potential predictive and prognostic value of biomarkers related to Immune checkpoint inhibitor therapy of Triple-Negative breast Cancer. Front Oncol. 2022;12:779786.

    Article  Google Scholar 

  48. Cammarota G, Ianiro G, Ahern A, Carbone C, Temko A, Claesson MJ, Gasbarrini A, Tortora G. Gut microbiome, big data and machine learning to promote precision medicine for cancer. Nat Rev Gastroenterol Hepatol. 2020;17(10):635–48.

    Article  Google Scholar 

  49. Fan K, Cheng L, Li L. Artificial intelligence and machine learning methods in predicting anti-cancer drug combination effects. Brief Bioinform. 2021;22(6).

  50. Sato H, Miyamoto T, Yogev L, Namiki M, Koh E, Hayashi H, Sasaki Y, Ishikawa M, Lamb DJ, Matsumoto N, Birk OS, Niikawa N, Sengoku K. Polymorphic alleles of the human MEI1 gene are associated with human azoospermia by meiotic arrest. J Hum Genet. 2006;51(6):533–40.

    Article  Google Scholar 

  51. Dai G, Ou J, Wu B. A predictive study of metabolism reprogramming in cervical carcinoma. Ann Transl Med. 2022;10(7):414.

    Article  Google Scholar 

  52. Liang L, Wang H, Shi H, Li Z, Yao H, Bu Z, Song N, Li C, Xiang D, Zhang Y, Wang J, Hu Y, Xu Q, Ma Y, Cheng Z, Wang Y, Zhao S, Qian J, Chen Y, Fang JY, Xu J. A designed peptide targets two types of modifications of p53 with anti-cancer activity. Cell Chem Biol. 2018;25(6):761–774e765.

    Article  Google Scholar 

  53. Zhong Z, Jiang W, Zhang J, Li Z, Fan F. Identification and validation of a novel 16-gene prognostic signature for patients with breast cancer. Sci Rep. 2022;12(1):12349.

    Article  Google Scholar 

  54. Meng X, Pan B, Zhang Z, Guo P, Wang C, Huang X, Si N, Jiang H. Functional pathway and process Enrichment Analysis of genes Associated with Morphological Abnormalities of the outer ear. J Craniofac Surg. 2022.

    Article  Google Scholar 

  55. Si N, Zhang Z, Meng X, Huang X, Wang C, Pan B. Generation of an induced pluripotent stem cell line from a congenital microtia patient with 4p16.1 microduplication involving the long-range enhancer of HMX1. Stem Cell Res. 2021;53:102357.

    Article  Google Scholar 

  56. Nagel S, Scherr M, MacLeod RAF, Pommerenke C, Koeppel M, Meyer C, Kaufmann M, Dallmann I, Drexler HG. NKL homeobox gene activities in normal and malignant myeloid cells. PLoS ONE. 2019;14(12):e0226212.

    Article  Google Scholar 

  57. Mohebi M, Ghafouri-Fard S, Modarressi MH, Dashti S, Zekri A, Kholghi-Oskooei V, Taheri M. Expression analysis of vimentin and the related lncRNA network in breast cancer. Exp Mol Pathol. 2020;115:104439.

    Article  Google Scholar 

  58. Natarajan S, Sumantran VN, Ranganathan M, Madheswaran S. Microarray and pattern miner analysis of AXL and VIM gene networks in MDAMB231 cells. Mol Med Rep. 2018;18(4):4147–55.

    Article  Google Scholar 

  59. Emens LA, Molinero L, Loi S, Rugo HS, Schneeweiss A, Dieras V, Iwata H, Barrios CH, Nechaeva M, Nguyen-Duc A, Chui SY, Husain A, Winer EP, Adams S, Schmid P. Atezolizumab and nab-Paclitaxel in Advanced Triple-Negative breast Cancer: biomarker evaluation of the IMpassion130 study. J Natl Cancer Inst. 2021;113(8):1005–16.

    Article  Google Scholar 

  60. Wu SY, Wang H, Shao ZM, Jiang YZ. Triple-negative breast cancer: new treatment strategies in the era of precision medicine. Sci China Life Sci. 2021;64(3):372–88.

    Article  Google Scholar 

  61. Lan Q, Liu PY, Haase J, Bell JL, Huttelmaier S, Liu T. The critical role of RNA m(6)a methylation in Cancer. Cancer Res. 2019;79(7):1285–92.

    Article  Google Scholar 

  62. Li HB, Tong J, Zhu S, Batista PJ, Duffy EE, Zhao J, Bailis W, Cao G, Kroehling L, Chen Y, Wang G, Broughton JP, Chen YG, Kluger Y, Simon MD, Chang HY, Yin Z, Flavell RA. M(6)a mRNA methylation controls T cell homeostasis by targeting the IL-7/STAT5/SOCS pathways. Nature. 2017;548(7667):338–42.

    Article  Google Scholar 

Download references


The author expresses gratitude to the public databases, websites, and software used in the paper. We are grateful to the High Performance Computing Center of Central South University for partial support of this work.



Author information

Authors and Affiliations



NZ, QZ, QC, SL, ZY, and HZ designed and drafted the manuscript.HZ, QC, NZ, ZL, XY, HL, PL, RZ, ZW, ZD, XL, JW, XZ, BZ, and WW wrote figure legends and revised the manuscript.NZ, SL, and HZ conducted data analysis.All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Quan Cheng, Qi Zhang or Zhifang Yang.

Ethics declarations

Ethics approval and consent to participate

The ethics of the study was approved. Written informed consent was obtained from all patients.

Consent for publication

Not applicable.

Competing interests

All authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, S., Zhang, N., Zhang, H. et al. Artificial intelligence learning landscape of triple-negative breast cancer uncovers new opportunities for enhancing outcomes and immunotherapy responses. J Big Data 10, 132 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Triple-negative breast cancer
  • Machine learning
  • Immunotherapy
  • Immune infiltrating cell
  • Prognosis