Artificial intelligence learning landscape of triple-negative breast cancer uncovers new opportunities for enhancing outcomes and immunotherapy responses

Triple-negative breast cancer (TNBC) is a relatively aggressive breast cancer subtype due to tumor relapse, drug resistance, and multi-organ metastatic properties. Identifying reliable biomarkers to predict prognosis and precisely guide TNBC immunotherapy is still an unmet clinical need. To address this issue, we successfully constructed a novel 25 machine learning (ML) algorithms-based immune infiltrating cell (IIC) associated signature of TNBC (MLIIC), achieved by multiple transcriptome data of purified immune cells, TNBC cell lines, and TNBC entities. The TSI index was employed to determine IIC-RNAs that were accompanied by an expression pattern of upregulation in immune cells and downregulation in TNBC cells. LassoLR, Boruta, Xgboost, SVM, RF, and Pamr were utilized for further obtaining the optimal IIC-RNAs. Following univariate Cox regression analysis, LassoCox, CoxBoost, and RSF were utilized for the dimensionality reduction of IIC-RNAs from a prognostic perspective. RSF, Ranger, ObliqueRSF, Rpart, CoxPH, SurvivalSVM, CoxBoost, GlmBoost, SuperPC, StepwiseCox, Enet, LassoCox, CForest, Akritas, BlackBoost, PlsRcox, SurvReg, GBM, and CTree were used for determining the most potent MLIIC signature. Consequently, this MLIIC signature was correlated significantly with survival status validated by four independent TNBC cohorts. Also, the MLIIC signature had a superior predictive capability for TNBC prognosis, compared with 148 previously reported signatures. In addition, MLIIC signature scores developed by immunofluorescent staining of tissue arrays from TNBC patients showed a substantial prognostic value. In TNBC immunotherapy, the low MLIIC profile demonstrated significant immune-responsive efficacy in a dataset of multiple cancer types. MLIIC signature could also predict m6A epigenetic regulation which controls T cell homeostasis. Therefore, this well-established MLIIC signature is a robust predictive indicator for TNBC prognosis and the benefit of immunotherapy, thus providing an efficient tool for combating TNBC.


Introduction
Triple-negative breast cancer (TNBC) is broadly defined as a particular subtype of breast cancer, which is negative for estrogen receptor-negative (ER), progesterone receptor-negative (PR), and human epidermal growth factor receptor 2 (HER2) expression.It is indisputable that TNBC is rather aggressive because of its tumor relapse, drug resistance, and multi-organ metastatic properties [1].Platinum-based drugs, PARP inhibitors, and immunotherapeutics represented by antibody-drug couples, PD-L1/PD-1 checkpoint inhibitors, and their diverse combination modalities are now becoming prominent options for the clinical management of TNBC [2,3].Meanwhile, the molecular heterogeneity of TNBC and the continuing deficiency of therapeutically successful approaches beyond chemotherapy, have made TNBC characterized by poor prognosis [4].Therefore, identifying reliable biomarkers to predict outcomes and precisely guide TNBC treatment decisions is still an unmet clinical need and warrants further investigation.
The tumor microenvironment (TME) is a highly complex multicellular entity consisting of tumor cells, immune cells, cancer-associated fibroblasts (CAFs) and adipocytes (CAAs), endothelial cells, extracellular matrix, and mesenchymal stem cells.In recent years, the advancement of multi-omics techniques has unearthed and unveiled the molecular heterogeneity characteristics within TNBC TME, highlighting the intensive dynamic correlations between the cancer cells and other non-neoplastic cells [5].The tumor landscape deciphered by multi-omics, especially in TME complexity, may benefit the in-depth understanding of the TNBC oncogenesis, progression, and transformation.Immune cell types in TME are abundant, including various types of B cells, T cells, tumor-associated macrophages (TAMs), dendritic cells (DCs), and CAFs.These immune cells constitute the primary effector cells of the cancerous immune response, thus determining the TNBC prognosis significantly.For example, studies have demonstrated that patients with a high abundance of tumor-infiltrating lymphocytes (TILs) have higher performance in terms of overall survival (OS), disease-free survival (DFS), and complete pathological response (pCR), compared with patients with low levels of TILs [6].Similarly, similar results are also observed in the CD4 + and CD8 + TIL populations, which strongly suggests that TIL could serve as a significant predictor for TNBC prognosis.In the age of big data, a variety of high-throughput data are developing, and making appropriate use of these data as a tool is commonly the key to understanding cancer mechanisms and justifying cancer treatment.Moreover, by integrating genomics, metabolomics, and other data, prediction algorithms using machine learning (ML) can accurately and rapidly process massive data for elucidating the molecular characteristics of TNBC at multiple scales [7].The ML employment will provide efficient tools for mining novel and reliable markers, tumor prognosis and metastasis prediction, and hierarchical patient management [8][9][10][11].
Although previously reported algorithms have emphasized the intriguing characteristics of immune-related genes in judging TNBC prognosis, there were no comprehensive reports on immune infiltrating cell (IIC) associated signatures in TNBC-based ML.To address this issue, in the current study, we successfully constructed a TNBC MLIIC signature based on 25 ML algorithms, which was achieved by multiple transcriptome datasets of various purified immune cells, TNBC cell lines, as well as TNBC entities.Subsequently, the predictive capability of this MLIIC signature and its corresponding potential in optimizing the immunotherapy responsiveness in TNBC were meticulously validated.Undoubtedly, this well-validated MLIIC signature in TNBC in this study will provide a profound insight into interpreting prognosis, immune cell alterations, and tumor immune landscape in the ecosystem of TNBC.

Acquisition of TNBC patients and various tumor cell line cohorts
Transcriptome data and clinical data of TNBC patients were obtained from 3 databases, including The Cancer Genome Atlas (TCGA, https://xenabrowser.net/) via Illumina-HiSeq platform, Molecular Taxonomy of Breast Cancer International Consortium (METABRIC, https://www.cbioportal.org/)through Illumina-HiSeq platform, and Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo)via Illumina-HiSeq platform and Affymetrix Human Genome U133 Plus 2.0 Array platform.The overall samples enrolled in our research comprised 122 samples from the TCGA TNBC dataset (training set), 299 from the METABRIC dataset (validation set), 133 from the GSE96058 dataset (validation set), and 107 from the GSE103091 dataset (validation set).Transcriptome data were acquired from GSE36133 (Cancer Cell Line Encyclopedia project (CCLE)) for 20 TNBC cell lines using the Affymetrix Human Genome U133 Plus 2.0 Array platform.The fragments per kilobase million (FPKM) values of RNA sequencing data were converted into transcripts per kilobase million (TPM) values.The Robust Multi-array Average (RMA) algorithm was implemented for quantile normalization, background correction, and log2 transformation of microarray data derived from the Affymetrix platform through the R package 'affy' [12].The data relating to copy number variations (CNVs) and DNA methylation information in the TNBC cohort were all acquired from the TCGA database.

MLIIC signature construction
An MLIIC signature was determined by a comprehensive analysis of purified immune cells, TNBC cell lines, and TNBC solid tumor tissues using an innovative computational framework according to a series of sequential ML algorithms.The flowchart is outlined below (Fig. 1).
Fig. 1 The computational framework for constructing the MLIIC signature.The top 15% expressed RNAs were adopted for candidate immune-related RNAs for each immune cell line.TSI was a widely used index to assess gene expression level relationships in tissue specification.TSI was applied to calculate the expression specificity of candidate immune-related RNAs for each cell type.The highly expressed RNAs in all immune cell types were identified as igRNA.igRNAs were believed to have high specificity in all immune cell types.igRNAs significantly upregulated in immune cell lines and downregulated in TNBC cell lines were identified as IIC-RNAs.IIC-RNAs were believed to be specific for immune cell lines and unspecific for TNBC cell lines, which were used as the input for MLbased classification and dimensionality reduction.Six ML algorithms for classification were utilized to determine potentially valuable IIC-RNAs.Univariate Cox regression analysis was further performed to screen out IIC-RNAs with prognostic features.Three ML algorithms for survival were taken to identify more valuable IIC-RNAs that were used as the input for signature construction.The MLIIC signature was eventually constructed according to RSF scoring with the best performance among 19 ML algorithms for scoring.The relationship between MLIIC signature, prognosis, biological function, tumor immune microenvironment, genome alternations, chemotherapeutic drug, and the immunotherapeutic response was thoroughly explored in the subsequent validation session.Finally, the MLIIC signature was verified using the LUAD tissue chip (1) The highest 15% expressed RNAs of each immune cell line were extracted as potential screening immune-related RNAs because of the appropriate number of genes for the downstream analysis.(2) The tissue specificity index (TSI) suggested by Yanai et al. [14], a widely used index to assess gene expression level relationships in tissue specification, was employed to determine the specificity of the above candidate RNA expression per immune cell.
where N represents the comprehensive count encompassing a variety of immune cell types, x i indicated the normalized expression level of immune cell type i relative to the highest RNA expression intensity detected across all immune cell types.The TSI spans a range from 0 to 1.A TSI value of 0 designates the RNA as broadly present across immune cells, while a TSI value of 1 indicates the RNA's exclusive presence in a specific immune cell types.RNAs exhibiting robust expression across all immune cell types have been verified as immune-related generic RNAs (igRNAs) as immune-related generic RNAs (igRNAs).igRNAs were believed to have high specificity in all immune cell types.
(3) Differentially expressed igRNAs that showed a pattern of both upregulations in multiple immune cell lines and downregulation in TNBC cell lines were determined as IIC-RNAs obtained through the R package 'limma' [15].IIC-RNAs were believed to be specific for immune cell lines and unspecific for TNBC cell lines, which were used as the input for ML-based classification and dimensionality reduction.(4) A total of 6 ML algorithms for classification were further used for downscaling, including least absolute shrinkage and selection operator regularised logistic regression (LassoLR), Boruta, Xgboost, support vector machine (SVM), Random Forest (RF), and prediction analysis for microarrays (Pamr).This step aimed to filter worthwhile IIC-RNAs by extracting the intersected IIC-RNAs identified by 6 ML algorithms for classification.(5) IIC-RNAs with prognostic potential were then screened in the TCGA TNBC dataset using univariate Cox regression analysis with the threshold of P < 0.05 and were used as the input for signature construction.(6) Next, 3 ML algorithms for survival, including Random Survival Forest (RSF), least absolute shrinkage and selection operator regularized Cox regression (LassoCox), and CoxBoost, were subsequently applied to assess the significance of the prognostic IIC-RNAs and conduct the dimensionality reduction accordingly.(7) 19 ML algorithms for scoring, including RSF, Ranger, ObliqueRSF, recursive partitioning and regression trees (Rpart), CoxPH, SurvivalSVM, CoxBoost, gradient boosting with component-wise linear models (GlmBoost), supervised principal components (SuperPC), StepwiseCox, elastic net regression (Enet), LassoCox, conditional random forests (CForest), akritas conditional non-parametric survival estimator (Akritas), gradient boosting with regression trees (BlackBoost), partial least squares regression for cox models and related techniques (PlsRcox), regression for a parametric survival model (SurvReg), generalized boosted regression models (GBM), and conditional inference trees (CTree), were used to determine the most reliable model.
(8) The MLIIC signature was established according to the prognostic IIC-RNAs via performing the RSF algorithm.Log-rank score test survival trees was executed by a previously established methodology [16].Firstly, the x-variable x was supposed to be ranked as x 1 ≤ x 2 ≤ … ≤ x n .Then, the "ranks" of every survival time T j (j ∈ [1, …, n]) were calculated.The specific equation utilized was given below: where Γ k = # [t : T t ≤ T k ] and Γ j represented the index of the order for T j .The log-rank score test was presented below: a where a and s 2 a represented the sample mean and sample variance of [a j : j = 1,. .., n], respectively.The measure of node separation was determined utilizing log-rank score splitting by | S (x, c) |.The best split was achieved by maximizing this value over x and c.

Drug susceptibility prediction
A detailed description of this part is provided in the supplementary file [40][41][42][43].

Multi-omics alteration characteristics of the MLIIC signature score
A detailed description of this part is provided in the supplementary file [44].

Statistical analysis
The samples were classified into subgroups depending on the cut-off threshold of our MLIIC signature ascertained by the R package 'survminer' .To assess OS disparities between the two MLIIC signature groups, Kaplan-Meier survival plots were generated using the 'survival' R package.For each of the individual clinical variables, including the MLIIC features, the C-index of OS was calculated.Furthermore, the prognostic implications of the MLIIC signature were elucidated through time-dependent receiver operating characteristic (ROC) curves, facilitating by the 'timeROC' R package.In order to assess the variations between two groups and multiple groups for continuous variables, we utilized the Wilcoxon rank sum test and the Kruskal-Wallis test, respectively.The correlation between two variables was calculated using the Spearman correlation analysis, and its significance was assessed using a two-sided hypothesis test.A P-value less than 0.05 was considered statistically significant.All P-values were two-sided.For all statistical analyses, they were conducted in the R project, version 4.1.2.

Identification of IIC-RNAs
To meticulously assess the immune cell-associated RNA, we first investigated 115 purified cell lineages containing 19 sorts of primary immune cells in 16 datasets by reviewing the literature from 2007 to 2022 (Fig. 1).A total of 5474 RNAs were extracted from each immune cell line, among the 15% most highly expressed, and used to screen for these related immune-related RNAs.The TSI scores of 5474 RNAs were calculated to determine the igRNAs that were ubiquitously represented in a total of 19 immune cell types.It is essential to mention that RNAs with low TSI scores present generally high expression patterns in diverse immune cell types, demonstrating the critical role of the immune effects.Subsequently, 2724 igRNAs were proved to be essential factors in the modulation of elemental immunity and were accompanied by a characteristic threshold of TSI < 0.2.Through analyzing these differential expressions of 2724 igRNAs, it was observed that 212 igRNAs were remarkably upregulated in 115 immune cell lines and downregulated in 20 TNBC cell lines depicted in Figure S1.Finally, we denoted these 212 igRNAs as the IIC-RNAs of TNBC.

Development of the MLIIC signature
A total of 6 ML algorithms for classification, encompassing LassoLR, Boruta, Xgboost, SVM, RF, and Pamr, were effectively deployed to discern nine noteworthy IIC-RNAs from the aformentioned screened igRNAs (Fig. 2A).The robustness of the identified IIC-RNAs' prognostic potential of the OS of TNBC patients was further substantiated through univariate Cox proportional hazards regression.Remarkably, this analysis unveiled four pivotal IIC-RNAs within the TCGA dataset (Fig. 2B).Following this, we extended our inquiry by engaging three distinct survival-oriented ML algorithms, including LassoCox (Fig. 2C), CoxBoost (Fig. 2D), and RSF (Fig. 2E), all aimed at rigorously scrutinizing the efficacy of the four prognosis-related IIC-RNAs (Fig. 2F).The expression pattern of the 4 IIC-RNAs in immune cells, including meiotic defect 1 (MEI1), H6 family homeobox 1 transcription factor gene (HMX1), membrane occupation and recognition nexus repeat containing 3 (MORN3), vimentin (VIM) were represented in Figure S2A. Figure S2B shows the differences in 4 IIC-RNAs between immune and tumor cells, and Figure S2C demonstrates the relationship between 4 IIC-RNAs and the prognosis for TNBC.19 ML algorithms for scoring were used to determine the most reliable signature (Fig. 2G).Then, we established an MLIIC signature according to the four prognostic IIC-RNAs by employing the RSF algorithm.

Comparison of prognostic value between the MLIIC signature and previous signatures
In the TCGA dataset, the MLIIC signature was markedly related to survival status, tumor stage, and TNM staging system (Fig. 3A).Besides, the MLIIC signature indicated the prognostic potential for superior accuracy over age, gender, TNM staging system, and the C-index of tumor staging in the TCGA dataset (Fig. 3B).Due to the rapid advancements in omics technologies, numerous studies have been reported to construct and analyze signatures based on specific gene combinations with promising predictive efficacy.We then aimed to systematically compare these relevant signatures with our MLIIC signature in the past decade.After a detailed investigation, we included a total of 148 signatures in terms of RNA signatures (Table S2).It should be noted that the MLIIC signature exhibited superior performance with respect to the C-index in the TCGA TNBC (Fig. 3C), METABRIC (Fig. 3D), GSE96058 (Fig. 3E), and GSE103091 (Fig. 3F) datasets, compared to nearly all of the previous models.Moreover, the MLIIC signature was compared with the MAPS signature and CMPS signature [45] with respect to the C-index in LumA, LumB, HER2, and Basel subtypes of BRCA in the TCGA dataset (Figure S3C).

Prediction of biological mechanisms associated with MLIIC signature
Considering the upregulation of immune-related features shown in the low-MLIIC group, we preferred to unearth the underlying biological mechanisms.The MLIIC signature score prominently displayed a robust inverse correlation with a plethora of immunologic pathways, encompassing the B cell receptor signaling pathway, T cell receptor signaling pathway, T cell-mediated immunity, the initiation of the immune response, as well as the intricate process of antigen processing and presentation (Fig. 4A).Significant differences in the immunological pathways in two MLIIC signature score groups were further proved by t-distributed stochastic neighbor embedding (t-SNE) (Fig. 4B).The chromosome distribution of genes significantly correlated with the MLIIC signature score is shown in Fig. 4C.The genes significantly associated with the MLIIC signature score were enriched in immune infiltration and activation pathways via Metascape (Fig. 4D).In GSEA of GO and KEGG terms, the low MLIIC signature group showed enrichment of the T cell receptor signaling pathway and B cell receptor signaling pathway as expected (Fig. 4E).Taken together, our results revealed that a low MLIIC signature represented a potency of superior immune response under immunotherapy.

Immune characteristics related to the MLIIC signature
Then, to delve into the immune status represented by MLIIC features, the subsequent focus was on the relationship between MLIIC features and immune infiltrating cells and immune modulators.As shown in Fig. 5A and B, in the TCGA dataset, the lower MLIIC signature subgroup exhibited higher levels of immune infiltrating cells and modulators, confirming an inflammatory but comparatively immunopromoted TME, potentially contributing to the choices of immunotherapy.MLIIC signature was significantly and negatively correlated with several classical immune checkpoints (Figure S4A).

The MLIIC signature is a predictive indicator of immunotherapy response
On account of the superior predictive capability of our MLIIC signature towards the benefit of immunotherapy, we performed validation of its effectiveness in several immunotherapy datasets.
In the GSE35640 (Fig. 6A) and GSE91061 (Fig. 6B), melanoma patients with low MLIIC signature scores had significantly and even increased survival periods (Fig. 6A).In the GSE78220 dataset, we also observed that melanoma patients with low MLIIC signature scores had significantly longer survival times (Fig. 6C) and were able to respond positively to anti-PD-1 immunotherapy (Fig. 6D).In the Van Allen dataset, melanoma patients with low MLIIC signature scores experienced increased survival times (Fig. 6E).As predicted, patients suffering from melanoma with low MLIIC signature scores appeared to be responsive to anti-CTLA-4 immunotherapy (Fig. 6F).In the Nathanson dataset, consistent with previous trends, melanoma patients with low MLIIC signature scores survived longer (Fig. 6G) and frequently responded to anti-CTLA-4 immunotherapy (Fig. 6H).In the IMvigor dataset, urothelial carcinoma patients with low MLIIC signature scores had significantly and even increased survival periods (Fig. 6I).Moreover, urothelial carcinoma patients with low MLIIC signature scores were more inclined to be responsive to anti-PD-L1 immunotherapy (Fig. 6J).In the Braun dataset, patients suffering from renal cell carcinoma with low MLIIC signature scores experienced increased survival time (Fig. 6K), which was consistent with a trend toward their response to anti-PD-1 immunotherapy (Fig. 6L).Furthermore, patients in the GSE179351, including those with colorectal adenocarcinoma (Fig. 6M) and pancreatic adenocarcinoma (Fig. 6N) dataset with low MLIIC signature scores, were also more likely to respond to immunotherapy.Of note, patients with low MLIIC signature scores were more likely to react to targeted therapy in the GSE165252 (esophageal adenocarcinoma) (Fig. 6O) and GSE103668 (TNBC) (Fig. 6P) datasets.Based on the TIDE algorithm, a low MLIIC signature score correlated significantly with the responses of immune checkpoint inhibitors (ICIs) in the TCGA dataset (Fig. 6Q).According to the Submap analysis, a low MLIIC signature score predicted an association with anti-PD-1 immunotherapy responses in the TCGA dataset (Fig. 6R).

Prediction of drug response related to the MLIIC signature score
A lower CMap score indicates a greater likelihood of the drug reversing the molecular attributes of the disease as per the CMap theory.Remarkably, arachidonyltrifluoromethane exhibits the lowest CMap score, suggesting its potential efficacy in treating TNBC patients with a prominent MLIIC signature score (Fig. 7A).Noteworthy findings also include Afatinib, displaying significantly heightened drug sensitivity within the high MLIIC characteristic score cohort (Fig. 7B).Furthermore, a compelling trend emerges with CTRP-derived tinifarnih-P1 (Fig. 7C) and PRISM-derived tucatinib (Fig. 7D), both showcasing robust negative correlation with the MLIIC signature score.Notably, these two drugs exhibit markedly improved drug sensitivity within the high MLIIC signature score subset.

Multi-omics alteration characteristics related to the MLIIC signature score
In the context of our study, we noted varying patterns of chromosomal alterations across two distinct groups based on the MLIIC signature scores (Fig. 8A).The precise genomic   9A.The Kaplan-Meier survival curves indicated that those TNBC patients with low MLIIC signature scores exhibited a markedly higher survival state (Fig. 9B).Furthermore, the Time-dependent ROC curves of the MLIIC signature for 3-, 5-, and 10-year OS assessment (0.679, 0.706, and 0.713, respectively) substantiated the prognostic potential of our MLIIC signature (Fig. 9C).In addition, the C-index of our signature and the focused clinical factors were illustrated in Fig. 9D, which suggested that this MLIIC signature displayed certain superiority versus the C-index.

Discussion
TNBC is a highly complicated, heterogeneous, relapsed BC type characterized by a high propensity to metastasize, poor outcomes, and a lack of treatment targets.TNBC no longer relies on a single traditional treatment but gradually evolves towards classification and precision treatment.Some emerging regimens, including PARP1 inhibitor, androgen receptor (AR) inhibitor, PI3K inhibitor, and especially ICIs, as well as their optimized combinations, are currently under clinical investigation [46].ICI treatment produces a long-lasting and complete tumor regressive effect in some TNBC patients but a temporary, partial, or no response in some patients [47].The highly heterogeneous and complex immune status of TME causes the clinical outcome of TNBC to be unpredictable.This is an intriguing issue for establishing well-defined indicators to determine prognosis and immunotherapy responsiveness.
ML has been proposed as a promising resource in many domains of medicine for classifying and predicting patient outcomes.Its model prognosis depends on the complex, multidimensional, and nonlinear relationships between patient tumor malignancy, surgical treatment, and drug therapy [48].The analysis of immune infiltration, including the quantity and quality of TILs, the integrated evaluation of TIL and other clinical markers, and the use of ML platforms for a more comprehensive characterization of immune features in breast cancer, are worthy of in-depth evaluation [49].Here, using 25 comprehensive and sequential ML algorithms, we successfully constructed an immunerelated MLIIC signature of TNBC based on four finally screened RNAs, including MEI1, MORN3, HMX1, and VIM.MEI1 participants in homologous recombination in meiosis during mammalian spermatogenesis.
Moreover, polymorphic alleles of the human MEI1 gene have been confirmed to be relevant to human azoospermia through meiotic arrest [50].In oncology research, only one report has shown by sequencing studies that MEI1 is one of the vital differential genes in human papillomavirus (HPV) + vs. HPV-tumors.It may be involved in the prognosis of cervical cancer [51].MORN3 is known to be a protein that affects cardiac function and tumor progression.At the same time, Morncide, the targeting peptide of Morn3, possesses the ability to curb tumor growth by activating the p53 pathway [52].The role of MORN3 in tumors has also rarely been studied and is only present in one bioinformatic model construction.However, this study included a relatively large number of 16 genes and did not sufficiently consider the predictive capability of immunotherapy [53].HMX1 is a critical orchestrator in developing craniofacial structures, ocular defects, and morphological abnormalities of the outer ear [54,55].HMX1 is also involved in the activity of NKL homologous frame genes in normal and malignant bone marrow cells [56].Vimentin (VIM) is a typical mesenchymal marker with abnormal methylation expression patterns in breast and prostate cancer [57].Natarajan et al. demonstrated that the VIM gene network modulated various neoplastic processes, such as adipogenesis, senescence, and autophagy [58].Overall, all four genes are now less reported and even less in oncology.Besides, we also performed a semi-quantitative validation of these indicators by multiple IF staining in an external cohort validation with TNBC tissue microassay.Our results presented that these IIC-RNAs have different patterns of high or low expression in high-or low-risk groups, predicting that they exert diverse functional properties in TNBC tumors.
On the one hand, this indicates the specificity and novelty of the genes involved in our MLIIC signature, accompanied by high expression in immune cells and low expression in TNBC.Meanwhile, it also reflects that subsequent studies must explore the relative gaps in the functions associated with these four genes.In terms of the prognosis capability, the superiority of 1-, 2-, 3-, 4-, and 5-year OS prediction of MLIIC signature were also confirmed, compared to 148 previous published signatures with respect to the C-index in all included databases.
TNBC is characterized by higher tumor mutational load, tumor-infiltrating T cell levels, and PD-L1 RNA expression versus other breast cancer subtypes.These immunogenic features make TNBC a potential population for immunotherapy.For monitoring responses to tumor immunotherapy, PD-L1, tumor-associated antigens (TAAs), BRCA1/2 mutation, and ctDNA have been confirmed to be effective candidates, among which PD-L1 was among the first applied biomarkers.Notably, IMpassion130 was the first study to successfully introduce immunotherapy into TNBC, in which PD-L1 was used as a well-established indicator of immunotherapy in metastatic TNBC [59].However, this indicator had a narrow application scope and could not assure whether the PD-L1 positive cohort was the population that could benefit from ICI in early TNBC.Inconsistent findings suggested that PD-L1 is by far the best, but not perfect, predictive biomarker for ICI efficacy.Therefore, identifying predictive biomarkers and the corresponding signature strategies for precision immunotherapy are essential concerns for the current TNBC treatment.In our study, we found that this MLIIC signature with a low score indicated a more robust immune effector cell activation and more benefits from immune therapy.Specifically, the lower MLIIC signature subgroup possessed a higher abundance of TILs, immune modulators, TMB, MSI, CYT, GEP, and IFN-γ and showed a distinct enriched signaling pathway for immune response, T and B cell receptor, antigen processing and presentation, and PD-L1/PD-1 checkpoint.Our signature is effectively capable of reflecting a wealth of information regarding alteration in immune cell infiltration and related pathways, presenting more insight into tumor immunity than reported studies.
BC is characterized by insufficient T-lymphocyte infiltration and is known as an immune "cold" tumor, resulting in an inadequate response to immunotherapy, including PD-1/PD-L1 inhibitors.Our study shows that TNBC has high low-risk immune infiltration, characterized by increased infiltration of cancer-suppressing immune cells and high levels of cancer-suppressing immune cells.The reasons for this phenomenon may have some relevance to the physical structure of breast tissue, immune cell abundance, tumor heterogeneity, age, hormone levels, and the included cohort.Firstly, differences in the immune microenvironment across cancer species are driven by the organ-specific structural and molecular characteristics of the different tissues.Different angiogenesis, immune cell distribution, and stem cell activity exist in sterile tissues (pancreas, brain), filtering and metabolic tissues (liver, kidney), environmental interface tissues (skin, lung, intestine), and body surface tissues (breast).When tumorigenesis occurs, immune cell infiltration and homing are influenced by tissue location-specific factors.For example, in gliomas and RCC, where there is a high degree of microangiogenesis and astrocytes, the abundance and functional status of immune cell infiltration are specific, while in TNBC, the TME formed by a large number of adipocytes surrounded by a large number of adipose cells is primarily influenced by the regulation of crosstalk and changes in paracrine signaling between adipose tissue and cancer cells.
Moreover, adipogenesis in TNBC is associated with cancer metabolism and unfavorable tumor immune microenvironment, consequently affecting immunotherapy.Secondly, as previously mentioned, TNBC is highly heterogeneous and can be classified into different entities based on various immune, metabolic, pathological, and other molecular characteristics.We note that in terms of immune stratification, Shao et al. classified TNBC into three types: immune desert type, immune inactivation type, and inflammatory immune type [60].Among them, the immune desert-type microenvironment has a low tumor infiltration rate, fails to attract immune cells, and is associated with MYC gene amplification.The immunodepleted type is chemotactic, but innate immune inactivation and a low amount of tumor antigen may contribute to immune escape.In contrast, the immune-inflammatory type highly expresses immunecheckpoint molecules.This implies that certain specific cohorts of TNBC subtypes present a high abundance of infiltrative immune features.Besides, the expression of oncoproteins, cancer cell proliferation, progression, and metastasis are all increased by the frequent upregulation of m6A epigenetic regulators in human cancer tissues from a variety of organ origins [61].A recent study reported that by targeting the IL-7/STAT5/SOCS pathways, m6A mRNA methylation regulates T cell homeostasis [62].In our study, m6A epigenetic regulators were significantly highly expressed in the low MLIIC signature score group, further indicating the immune active environment in TNBC patients with low MLIIC signature scores.
Although the predictive efficacy of MLIIC signature has been preliminarily documented in TNBC, we remain concerned that several issues remain to be addressed.Firstly, this study is essentially a retrospective study using previous data, and its actual value needs to be further corroborated using real-world information.MLIIC signature is also in urgent need of validation in a prospective immunotherapy cohort in terms of predicting the effects of immunotherapy.In addition, TNBC belongs to a subtype of malignant BC and possesses its complex classification, including the PAM50 type and six categories based on molecular characteristics.More accurate molecular typing can guide the prognosis and treatment selection of TNBC.The MLIIC signature in this study predicts overall TNBC, but no in-depth exploration of the subdivided TNBC subtypes was made.This is of extremely high value in the diagnosis and treatment of TNBC.Therefore, we would likely construct more accurate diagnostic signatures for different TNBC subtypes in the follow-up.Finally, the MLIIC signature score is closely related to multiple immune cells, tumor process pathways, and other biological mechanisms.In this regard, the key molecules and interaction networks also deserve further profound study in molecular biology.Finally, it is essential to note that our signature is currently an efficient tool that reflects a wide range of treatment information but is not a replacement for the gold standard diagnosis and treatment method.The combination of MLIIC signature and conventional tools is exciting and promising.

Conclusion
Taken together, based on 25 ML algorithms, we successfully constructed a robust and reliable MLIIC signature for predicting TNBC prognosis, mutation, biological function, drug responsiveness, immune infiltration, and immunotherapy responsiveness.Notably, the TNBC patients with a low score of MLIIC signature possessed a superior prognosis, a more active immune microenvironment, and a more substantial immunotherapeutic effect.Our well-established MLIIC signature provides an effective tool to guide TNBC prognosis determination and treatment stratification management.

Fig. 2
Fig. 2 Development of the MLIIC signature based on ML. (A) Venn plot shows the intersected genes identified by six ML algorithms for classification.(B) Univariate Cox regression analysis of the screened nine intersected genes displayed via forest plot.(C) Dimension reduction of the ten prognostic genes by the CoxBoost algorithm.(D) Dimension reduction of the ten prognostic genes by the LassoCox algorithm.(E) Dimension reduction of the ten prognostic genes by RSF algorithm.(F) Venn plot shows the intersected prognostic genes identified by three ML algorithms for survival.(G) Performance of 19 ML algorithms for scoring in terms of signature construction.(H) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the TCGA, METABRIC, GSE96058, and GSE103091 datasets.(I) Time-dependent ROC curves of the MLIIC signature regarding 1-, 2-, 3-, 4-, and 5-year OS in the TCGA, METABRIC, GSE96058, and GSE103091 datasets

Fig. 3
Fig. 3 The superior predictive capability of MLIIC signature for TNBC prognosis.(A) Univariate and multivariate Cox regression analysis of OS of individual clinical variables with and without MLIIC signature in the TCGA dataset displayed via forest plot.(B) Bar plot shows the C-index of the MLIIC signature and various clinical factors in the TCGA, METABRIC, GSE96058, and GSE103091 datasets.(C) The C-index of the MLIIC signature and other published models developed in the TCGA, METABRIC, GSE96058, and GSE103091 datasets displayed via forest plot

Fig. 4
Fig. 4 Biological peculiarities of the MLIIC signature score in the TCGA dataset.(A) MsigDB-based GSVA analysis delineated the biological attributes of two MLIIC signature score groups displayed via heat map.(B) t-SNE plot of GO and KEGG terms delineated the differences in pathway activity in two MLIIC signature score groups.(C) Chromosome distribution of genes significantly correlated with MLIIC signature score.(D) Metascape-based enrichment analysis of genes significantly associated with MLIIC signature score.(E) GSEA of GO and KEGG terms for the MLIIC signature score

Fig. 5
Fig. 5 Immune-related characteristics of the MLIIC signature in the TCGA dataset.(A) Heat map presented the correlation between the MLIIC signature and immune infiltrating cells.(B) From left to right: mRNA expression; expression versus methylation; amplification frequency; and the deletion frequency for 75 immunomodulators genes by MLIIC signature groups displayed by heat map.(C) Box plot showed the TMB levels between two MLIIC signature score groups.(D) Box plot showed the MSI levels between two MLIIC signature score groups.(E) Box plot showed the CYT levels between two MLIIC signature score groups.(F) Box plot showed the GEP levels between two MLIIC signature score groups

Fig. 6
Fig. 6 Predictive value of the MLIIC signature in immunotherapy response.(A) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE36540 dataset.(B) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE91061 dataset.(C) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the GSE78220 dataset.(D) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE78220 dataset.(E) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Van Allen dataset.(F) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Van Allen dataset.(G) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Nathanson dataset.(H) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Nathanson dataset.(I) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the IMvigor dataset.(J) Box plot displayed the levels of the MLIIC signature score in patients with different immunotherapy responses in the IMvigor dataset.(K) Kaplan-Meier survival curves of the MLIIC signature regarding OS in the Braun dataset.(L) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the Braun dataset.(M-N) Box plot displayed the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE179351 dataset.(O) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE165252 dataset.(P) Box plot displays the levels of the MLIIC signature score in patients with different immunotherapy responses in the GSE103668 dataset.(Q) Contingency table of the two MLIIC signature score groups and immunotherapy responses based on TIDE algorithm.(R) Heat map of the two MLIIC signature score groups and anti-PD-1/anti-CTLA-4 immunotherapy responses based on submap analysis

Fig. 7
Fig. 7 The drug responses between two MLIIC signature score groups.(A) CMap-based drug prediction.(B) Box plot shows the GDSC-based drug prediction.(C) Dot plot and box plot show the CTRP-based drug prediction.(D) Dot plot and box plot shows the PRISM-based drug prediction

Fig. 8
Fig. 8 Multi-omics alteration characteristics of the MLIIC signature score in the TCGA dataset.(A) GISTIC 2.0-based chromosome amplifications and deletions in two MLIIC signature score groups.(B) Waterfall plot shows the genomic alteration landscape in two MLIIC signature score groups.(C) Box plot shows the fraction of genome alteration, the fraction of genome gained, and the fraction of genome lost in two MLIIC signature score groups

Fig. 9
Fig. 9 The superior predictive capability of MLIIC signature for TNBC prognosis in the TNBC tissue array.(A) Representative multiplex IF images of MEI1, HMX1, VIM, and MORN3 in two MLIIC signature score groups.(B) Kaplan-Meier survival curves of the MLIIC signature regarding OS. (C) Time-dependent ROC curves of the MLIIC signature regarding 3-, 5-, and 10-year OS.(D) Bar plot shows the C-index of the MLIIC signature and focused clinical factors