
An ensemble machine learning model for predicting one-year mortality in elderly coronary heart disease patients with anemia

Abstract

Objective

This study was designed to develop and validate a robust predictive model for one-year mortality in elderly coronary heart disease (CHD) patients with anemia using machine learning methods.

Methods

Demographics, tests, comorbidities, and drugs were collected for a cohort of 974 elderly patients with CHD. A prospective analysis was performed to evaluate predictive performances of the developed models. External validation of models was performed in a series of 112 elderly CHD patients with anemia.

Results

The overall one-year mortality was 43.6%. Risk factors included heart rate, chronic heart failure, tachycardia and β receptor blockers. Protective factors included hemoglobin, albumin, high density lipoprotein cholesterol, estimated glomerular filtration rate (eGFR), left ventricular ejection fraction (LVEF), aspirin, clopidogrel, calcium channel blockers, angiotensin converting enzyme inhibitors (ACEIs)/angiotensin receptor blockers (ARBs), and statins. Compared with other algorithms, the ensemble machine learning model performed best, with an area under the curve (95% confidence interval) of 0.828 (0.805–0.870) and a Brier score of 0.170. Calibration and density curves further confirmed the favorable calibration and discriminative ability of the ensemble machine learning model. In external validation, Ensemble Model also performed well, with an area under the curve (95% confidence interval) of 0.825 (0.734–0.916) and a Brier score of 0.185. Patients in the high-risk group had more than six times the probability of one-year mortality of those in the low-risk group (P < 0.001). SHapley Additive exPlanations (SHAP) identified hemoglobin, albumin, eGFR, LVEF, and ACEIs/ARBs as the top five factors associated with one-year mortality.

Conclusions

This model identifies key risk factors and protective factors, providing valuable insights for improving risk assessment, informing clinical decision-making and performing targeted interventions. It outperforms other algorithms in predictive performance and offers opportunities for personalized risk mitigation strategies, with clinical implications for improving patient care.

Key learning points

a. What is already known

1. Coronary heart disease is a leading cause of mortality in the elderly worldwide, posing a huge burden on the health and social care systems. Anemia is often found in patients with coronary heart disease and is a multifactorial problem in elderly patients.

2. Considering harmful impact of anemia on the elderly patients with coronary heart disease and its complex pathophysiologic mechanisms, it is urgent to find effective measures to take care of those patients.

b. What this study adds

1. An ensemble machine learning model effectively predicts one-year mortality in elderly coronary heart disease patients with anemia. It outperforms other algorithms in predictive performance and offers opportunities for personalized risk mitigation strategies, with clinical implications for improving patient care.

2. This study identifies key risk factors and protective factors, providing valuable insights for improving risk assessment, informing clinical decision-making and performing targeted interventions. Early intervention measures are also recommended for elderly coronary heart disease patients with anemia in this study.

Introduction

Coronary heart disease (CHD) is a leading cause of mortality and disability in the elderly worldwide, posing a huge burden on the health and social care systems [1]. It is estimated that about 11.39 million patients suffer from CHD in China, and mortality due to CHD was 121.59/100,000 among urban residents and 130.14/100,000 among rural residents in 2019 [2]. Anemia is often found in patients with CHD and is a multifactorial problem in elderly patients [3, 4]. It has been detected in approximately 10–20% of patients with CHD [3, 5]. Anemia further worsens the clinical outcome of CHD patients and is significantly associated with mortality in these patients [6]. It not only causes physical and psychological suffering for patients and their families, but also places a serious burden on society and the economy. Accordingly, there is strong interest among clinicians in assessing CHD patients with anemia. Pathophysiologic mechanisms of anemia development in CHD patients include iron deficiency, blood loss, chronic inflammation, impaired renal function, and renin-angiotensin-aldosterone system inhibition [3, 7, 8]. Emerging clinical research indicates that anemia is a powerful independent predictor of mortality in patients with CHD [9,10,11,12]. The reasons for the worse outcome of CHD patients with anemia are likely multifactorial. Anemia may decrease blood oxygen levels and worsen myocardial ischaemic injury in CHD. In CHD patients with anemia, systemic oxygen supply is maintained through reactive tachycardia and increased cardiac output [12, 13].

Given the deleterious effects of anemia on patients with CHD and its intricate pathophysiologic process, it is imperative to identify effective strategies for the management of such patients. Survival prediction serves as a crucial benchmark for healthcare providers when performing risk evaluations for CHD patients with anemia. Previous studies have indicated that early and prompt identification of high-risk patients is of great importance in reducing mortality risk in CHD patients [14]. Nonetheless, there is a dearth of research on mortality risk within this specific population. Hence, it is necessary to develop robust predictive models to pinpoint CHD patients at high risk for premature mortality. Machine learning methods have been shown to be a potent approach for developing predictive models that support accurate decisions across diverse clinical scenarios. Because they can identify such individuals with greater sensitivity and specificity than non-ensemble models, ensemble machine learning models provide better performance and are increasingly employed across medical specialties [15].

Therefore, this study aimed at systematically and effectively investigating risk factors for survival outcome, and achieving model construction and validation to predict one-year mortality, among elderly CHD patients with anemia using machine learning methods. Performances of all models including Naïve Bayesian, XGBoosting Machine, Decision Tree, Ensemble Model, Support Vector Machine, and Logistic Regression were compared in this study in terms of area under the curve (AUC), Brier score, calibration curve, density curve, and discrimination slope.

Methods

Patients and study design

This study prospectively analyzed 974 elderly patients with CHD in the Department of Geriatric Cardiology, Chinese People’s Liberation Army (PLA) General Hospital (Beijing, China). Patients were included if they (1) were aged above 60 years, and (2) were diagnosed with CHD. Patients without anemia were excluded. The flow diagram of patient enrollment and study design is shown in Fig. 1. A total of 397 elderly CHD patients with anemia were enrolled as the model derivation cohort, and these patients were randomly divided into a training cohort (n = 269) and an internal validation cohort (n = 128) at a ratio of approximately 70:30 [16]. The random allocation was generated by computer. Randomly splitting the data into the training and internal validation cohorts helps ensure that all models are trained and evaluated on a representative sample of the overall dataset. In addition, a series of 112 elderly CHD patients with anemia from Hainan Hospital of Chinese PLA General Hospital (Sanya, China) formed the external validation cohort. Subgroup analysis was performed to compare basic characteristics between patients with and without one-year mortality, and the identified variables were included as input features to develop predictive models. The training cohort was used to train and optimize five machine learning models. Model validation was performed in the internal and external validation cohorts. The optimal model was selected by comparing predictive performance across the five machine learning models. This study was approved by the Ethics Committee of Chinese PLA General Hospital and was performed in accordance with the tenets and provisions of the Declaration of Helsinki, 1975.

Fig. 1

Flow diagram of patient enrollment and study design

Disease determination and outcome

CHD was diagnosed from clinical history, angina symptoms, cardiac markers, and specific examinations including electrocardiogram (rest and exercise), echocardiography, radionuclide imaging, computed tomography, and coronary angiography on the basis of the American College of Cardiology/American Heart Association (ACC/AHA)/European Society of Cardiology (ESC) guidelines [17, 18]. Anemia was defined as hemoglobin < 120 g/L in women and < 130 g/L in men according to the World Health Organization [11]. Hypertension was considered to be present if systolic blood pressure was ≥ 140 mmHg, diastolic blood pressure was ≥ 90 mmHg, and/or the patient was receiving anti-hypertensive treatment [19]. An individual was considered to have diabetes if fasting plasma glucose was ≥ 7.0 mmol/L, postprandial blood glucose (2-hour venous blood glucose) was ≥ 11.1 mmol/L, and/or the patient was receiving glucose-lowering treatment. Atrial fibrillation (AF) and chronic heart failure (CHF) were defined on the basis of the ACC/AHA/ESC guidelines for AF [20], and the ESC guidelines for CHF [21], respectively. Body mass index (BMI) was calculated as weight (kg)/height (m)². Estimated glomerular filtration rate (eGFR) was calculated by a modified Modification of Diet in Renal Disease equation based on data from Chinese patients [22]:

$$\mathrm{eGFR} = 175 \times \mathrm{Scr}\,(\mathrm{mg/dL})^{-1.234} \times \mathrm{age}\,(\mathrm{years})^{-0.179}\;(\times\,0.79\ \text{if female})$$
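
The equation above can be sketched in Python as follows. This is an illustration only, not the authors' code: the function name is hypothetical, and because serum creatinine is recorded in µmol/L in the data collection while the equation takes mg/dL, the sketch assumes the usual conversion factor of 88.4 µmol/L per mg/dL.

```python
def egfr_modified_mdrd(scr_umol_per_l: float, age_years: float, female: bool) -> float:
    """Return eGFR in mL/minute/1.73 m^2 from the modified MDRD equation.

    Hypothetical helper: the name and the unit conversion are assumptions,
    only the coefficients come from the equation in the text.
    """
    # Assumption: convert serum creatinine from umol/L to mg/dL (88.4 umol/L per mg/dL).
    scr_mg_per_dl = scr_umol_per_l / 88.4
    egfr = 175.0 * scr_mg_per_dl ** -1.234 * age_years ** -0.179
    if female:
        egfr *= 0.79  # sex correction factor from the equation
    return egfr


# Example: the same creatinine and age give a lower estimate for a woman.
male_egfr = egfr_modified_mdrd(100.0, 88.0, female=False)
female_egfr = egfr_modified_mdrd(100.0, 88.0, female=True)
```

Both exponents are negative, so the estimate falls as creatinine or age rises.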

Chronic kidney disease (CKD) was defined as eGFR < 60 mL/minute/1.73 m² on the basis of the Kidney Disease Outcomes Quality Initiative Working Group definition [23]. Tachycardia was defined as a resting heart rate (HR) of more than 100 beats per minute. Chinese PLA General Hospital was the patients’ designated hospital and held comprehensive treatment and death records, which made it easier to follow them up and to determine the endpoint accurately. The primary outcome of this study was one-year all-cause mortality, defined as death from any cause within one year after discharge. Mortality was determined by telephone interviews and medical records, including legal documents recording the time and place of death and other details.

Data collection and re-evaluation

The following variables were collected for this study: (1) demographics including age, gender, current smoker, BMI, HR, and left ventricular ejection fraction (LVEF); (2) tests including hemoglobin (g/L), albumin (g/dL), Scr (µmol/L), uric acid (µmol/L), high density lipoprotein cholesterol (HDL-C, mmol/L), and eGFR (mL/minute/1.73 m²); (3) comorbidities including hypertension, diabetes, AF, CHF, CKD, and tachycardia; and (4) drugs used including aspirin, clopidogrel, β receptor blockers, calcium channel blockers (CCBs), nitrates, angiotensin converting enzyme inhibitors (ACEIs)/angiotensin receptor blockers (ARBs), and statins. Tests were performed at admission in the Department of Biochemistry, Chinese PLA General Hospital. All information was obtained and preserved by trained researchers. To verify the accuracy of the results, other independent researchers performed logic checks and data re-evaluation. Missing values in the dataset were imputed with the median.

Model construction and validation

Machine learning models including Naïve Bayesian, XGBoosting Machine, Decision Tree, Ensemble Model, Support Vector Machine, and Logistic Regression were trained and optimized in the training cohort. Ensemble Model was proposed to combine the results from Naïve Bayesian, XGBoosting Machine, Decision Tree, and Support Vector Machine. To streamline preprocessing steps, we used a Pipeline from scikit-learn, which allowed us to chain together multiple steps into a single object. This ensured that the same steps were applied consistently during training and validation of models. Grid and random hyper-parameter searches were used to determine the optimal hyper-parameters, with AUC of the receiver operating characteristic (ROC) curve as the optimization metric [24]. Patients in the internal and external validation cohorts were used to assess the effectiveness of predictive models using AUC, Brier score, calibration curve, density curve, and discrimination slope. In addition, SHapley Additive exPlanations (SHAP) was used to quantify feature importance.
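
A minimal sketch of this kind of workflow is shown below: a scikit-learn Pipeline wrapping an ensemble of four base learners, tuned by grid search on ROC AUC. It is an illustration under stated assumptions, not the authors' code. The data are synthetic, soft voting is assumed as the way base models are combined (the text does not say how), scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example needs only scikit-learn, and the parameter grid is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: only the shape (397 rows, 14 features) mirrors the
# derivation cohort; the values are random and are not patient data.
X, y = make_classification(n_samples=397, n_features=14, n_informative=6, random_state=0)
X[::25, 0] = np.nan  # inject a few missing values for the imputer to fill
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        # Stand-in for XGBoost (assumption, keeps the sketch scikit-learn only).
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",  # assumption: average the predicted probabilities
)

# Chaining the steps means the validation data receive exactly the
# preprocessing that was fitted on the training data.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("ensemble", ensemble),
])

# Arbitrary illustrative grid; AUC is the optimization metric as in the text.
param_grid = {
    "ensemble__tree__max_depth": [2, 4],
    "ensemble__svm__C": [0.5, 1.0],
}
search = GridSearchCV(model, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Predicted probability of the positive class for each validation row.
risk = search.predict_proba(X_valid)[:, 1]
```

Putting imputation and scaling inside the pipeline means each cross-validation fold refits them on its own training portion, which avoids leaking validation information into preprocessing.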

Risk stratification system construction

Risk stratification was performed according to the ideal cut-off value determined by the average of the thresholds in the validation cohort [25, 26]. Patients who had predicted probability of one-year mortality that was less than the ideal cut-off value were categorized into the low-risk group, and those who had predicted probability of one-year mortality that was equal to or more than the ideal cut-off value were categorized into the high-risk group.
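
The grouping rule above can be sketched as follows. The probabilities, outcomes, and cut-off in this example are hypothetical values chosen for illustration; the study derives its own cut-off value, and the function names are not taken from the authors' code.

```python
from typing import Dict, List, Sequence


def stratify(probabilities: Sequence[float], cutoff: float) -> List[str]:
    """Label a patient 'high' if predicted risk is >= cutoff, otherwise 'low'."""
    return ["high" if p >= cutoff else "low" for p in probabilities]


def observed_mortality(groups: Sequence[str], died: Sequence[int]) -> Dict[str, float]:
    """Observed mortality (deaths / patients) within each risk group."""
    rates = {}
    for label in ("low", "high"):
        outcomes = [d for g, d in zip(groups, died) if g == label]
        rates[label] = sum(outcomes) / len(outcomes) if outcomes else float("nan")
    return rates


# Hypothetical predicted probabilities and outcomes (1 = died within one year).
probabilities = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90]
died = [0, 0, 0, 1, 0, 1]
cutoff = 0.40  # hypothetical cut-off, not the study's value

groups = stratify(probabilities, cutoff)
rates = observed_mortality(groups, died)
```

A patient whose predicted probability equals the cut-off falls in the high-risk group, matching the "equal to or more than" rule in the text.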

Statistical implementation and environment

Continuous variables that were not normally distributed were summarized as median with interquartile range [IQR], and categorical variables were presented as proportions. Continuous variables that were not normally distributed were compared using the Wilcoxon rank-sum test, while categorical variables were compared using the Chi-square test or the continuity-corrected Chi-square test. Kaplan-Meier analysis with the log-rank test was used to compare survival outcome between the low-risk and high-risk groups. Traditional statistics were performed using the R programming language (version 4.1, http://www.R-project.org). The full code of this study is available at https://github.com/Starxueshu/codefor1ymortality. All statistical tests were two-tailed, with a P value of less than 0.05 indicating statistical significance. Machine learning modeling and interpretation were performed in Python (version 3.9) using Jupyter Notebook, an open-source web application.

Results

Patient basic clinical characteristics

Basic characteristics of all patients in the model derivation cohort are shown in Table 1. The median age of these patients was 88.00 [85.00, 91.00] years. As the elderly tend to have more coexisting conditions, 81.6% of patients had hypertension, 42.6% had diabetes, 21.2% had AF, 39.0% had CHF, 49.9% had CKD, and 5.5% had tachycardia. Patients received a range of drugs, including aspirin (42.1%), clopidogrel (55.4%), β receptor blockers (68.3%), CCBs (63.7%), nitrates (85.1%), ACEIs/ARBs (47.9%), and statins (52.9%). The overall one-year mortality among these patients was 43.6% (173/397).

Table 1 Basic characteristics among elderly coronary heart disease patients with anemia in the model derivation cohort

Subgroup analysis of patients

Subgroup analysis was performed to compare basic characteristics between patients with and without one-year mortality (Table 2). Patients who died within one year had lower hemoglobin (P < 0.001), albumin (P < 0.001) and HDL-C (P < 0.001), higher HR (P < 0.001), and worse renal (P = 0.049) and cardiac function (P < 0.001). In addition, patients with one-year mortality tended to have a higher prevalence of CHF (P < 0.001) and tachycardia (P = 0.009). In terms of drugs, patients with one-year mortality less often received aspirin (P = 0.035), clopidogrel (P = 0.003), CCBs (P = 0.004), ACEIs/ARBs (P = 0.001), and statins (P < 0.001), and more often received β receptor blockers (P = 0.007). In total, HR, CHF, tachycardia and β receptor blockers were risk factors for one-year mortality, and hemoglobin, albumin, HDL-C, eGFR, LVEF, aspirin, clopidogrel, CCBs, ACEIs/ARBs, and statins were protective factors.

Table 2 Characteristic comparison among elderly coronary heart disease patients with anemia according to one-year mortality

Model and internal validation

Characteristic comparison among patients in the training and internal validation cohorts is shown in Table 3. The characteristics identified above, namely HR, CHF, tachycardia, hemoglobin, albumin, HDL-C, eGFR, LVEF, aspirin, clopidogrel, β receptor blockers, CCBs, ACEIs/ARBs, and statins, were included as input features to train and optimize the models. Five machine learning models were trained to predict one-year mortality. The optimal hyper-parameters of these models are summarized in Supplementary Table 1. All models have also been made available as a PKL file in the GitHub repository (https://github.com/Starxueshu/CHDmodel.git).

Table 3 Characteristic comparison among elderly coronary heart disease patients with anemia in the training and internal validation cohorts

As a result, Ensemble Model had the highest AUC and the lowest Brier score in the internal validation cohort, matched to three decimal places by Naïve Bayesian on AUC and by Support Vector Machine on Brier score, and was considered superior to the other machine learning models. Performances of each model are shown in Table 4. Brier score was 0.174 for Naïve Bayesian, 0.193 for XGBoosting Machine, 0.223 for Decision Tree, 0.170 for Ensemble Model, 0.170 for Support Vector Machine, and 0.228 for Logistic Regression. Their corresponding AUC (95% confidence interval) was 0.828 (0.793–0.863), 0.763 (0.724–0.791), 0.673 (0.634–0.715), 0.828 (0.805–0.870), 0.805 (0.771–0.839), and 0.795 (0.756–0.829), respectively (Fig. 2). Calibration curves showed that all models, in particular Ensemble Model, were well calibrated (Fig. 3). Density curves showed that the majority of models, in particular Ensemble Model, had favorable separation and small overlap between patients with and without one-year mortality (Fig. 4). Violin plots showed a significant difference in predicted risk between the two groups for all models (Fig. 5).
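
For readers less familiar with the two headline metrics, the sketch below computes them from scratch on a small hypothetical example. The Brier score is the mean squared difference between predicted probability and observed outcome (lower is better), and AUC is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The function names and data are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within one year) and predicted risks.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]

example_brier = brier_score(y_true, y_prob)  # (0.01 + 0.04 + 0.16 + 0.49) / 4
example_auc = auc(y_true, y_prob)            # 3 of 4 pairs ranked correctly
```

As a reference point, a model that predicts 0.5 for every patient has a Brier score of 0.25 regardless of the outcomes.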

Table 4 Performances of machine learning models among elderly coronary heart disease patients with anemia for predicting one-year mortality in the internal validation cohort
Fig. 2

Area under the curve (AUC) for machine learning models in the internal validation cohort. CI: confidence interval

Fig. 3

Calibration curves for machine learning models in the internal validation cohort

Fig. 4

Density curves for machine learning models in the internal validation cohort. (A) Logistic Regression; (B) Naïve Bayesian; (C) XGBoosting Machine; (D) Support Vector Machine; (E) Decision Tree; (F) Ensemble Model

Fig. 5

Violin plots for machine learning models in the internal validation cohort. (A) Logistic Regression; (B) Naïve Bayesian; (C) XGBoosting Machine; (D) Support Vector Machine; (E) Decision Tree; (F) Ensemble Model

Model and external validation

Basic characteristics of all patients in the external validation cohort are shown in Table 5. Ensemble Model also performed well, with an AUC (95% confidence interval) of 0.825 (0.734–0.916; Supplementary Fig. 1) and a Brier score of 0.185 (Table 6). Ensemble Model showed favorable separation in the density curve (Supplementary Fig. 2) and had a discrimination slope of 0.255 in the violin plot (Supplementary Fig. 3). These results indicated that Ensemble Model had favorable discrimination and calibration in the external validation cohort.
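
The discrimination slope is commonly defined as the mean predicted risk among patients who experienced the event minus the mean predicted risk among those who did not. The sketch below assumes that definition, which the text does not spell out, and uses hypothetical data and a hypothetical function name.

```python
from statistics import mean
from typing import Sequence


def discrimination_slope(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    return mean(events) - mean(non_events)


# Hypothetical outcomes (1 = died within one year) and predicted risks.
y_true = [1, 1, 0, 0]
y_prob = [0.8, 0.6, 0.3, 0.1]

slope = discrimination_slope(y_true, y_prob)  # 0.7 - 0.2
```

Under this definition, a slope of 0.255 would mean that patients who died were assigned, on average, a predicted risk about 25.5 percentage points higher than survivors.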

Table 5 Basic characteristics among elderly coronary heart disease patients with anemia in the external validation cohort
Table 6 Performances of machine learning models among elderly coronary heart disease patients with anemia for predicting one-year mortality in the external validation cohort

Risk stratification system development

A risk stratification system was developed, which classified patients in the internal validation cohort into low-risk and high-risk groups. Patients in the high-risk group had more than six times the probability of one-year mortality of those in the low-risk group (P < 0.001; Table 7). Kaplan-Meier analysis further confirmed these findings, showing that patients in the low-risk group had longer survival than those in the high-risk group (P < 0.001; Fig. 6). The hazard ratio was 2.512 (95% confidence interval: 1.733–3.642), suggesting that the hazard of death within one year in the high-risk group was 2.512 times that in the low-risk group (P < 0.001). Feature importance for clinical characteristics was further analyzed in the training cohort (Fig. 7A) and internal validation cohort (Fig. 7B). SHAP identified hemoglobin, albumin, LVEF, eGFR, and ACEIs/ARBs as the top five factors associated with one-year mortality. To further illustrate the clinical usefulness of the developed model, two patients were extracted from the external validation cohort, and their clinical characteristics, identified as risk (red bar) or protective (blue bar) factors, were used as input parameters to predict one-year mortality. The developed model predicted a one-year mortality of 84.40% for the true positive case (Fig. 8A) and 19.48% for the true negative case (Fig. 8B).
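
As a consistency check on the reported hazard ratio, the sketch below recovers the standard error and an approximate P value from the point estimate and its confidence interval. It assumes a Wald-type interval that is symmetric on the log scale and a normal approximation; the text does not state how the interval was computed, so this is an illustration rather than a reproduction of the authors' analysis.

```python
import math
from typing import Tuple


def wald_summary(estimate: float, lower: float, upper: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Return (standard error of log ratio, z statistic, two-sided P value).

    Assumes the interval was built as exp(log(estimate) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    # Two-sided P value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# Values reported in the text: hazard ratio 2.512 (95% CI 1.733 to 3.642).
se, z, p_value = wald_summary(2.512, 1.733, 3.642)

# Midpoint of the log interval; close to log(2.512) if the interval is log-symmetric.
log_midpoint = (math.log(1.733) + math.log(3.642)) / 2.0
```

Under these assumptions the standard error of the log hazard ratio is about 0.19 and z is about 4.9, which agrees with the reported P < 0.001.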

Table 7 Subgroup analysis of risk groups among elderly coronary heart disease patients with anemia in the internal validation cohort
Fig. 6

Kaplan-Meier analysis showed survival probability classified by a risk stratification system in the internal validation cohort (log-rank test: P < 0.0001)

Fig. 7

Feature importance for risk factors in the training cohort (A) and internal validation cohort (B). LVEF, left ventricular ejection fraction; HDL-C, high density lipoprotein cholesterol; eGFR, estimated glomerular filtration rate; ACEIs, angiotensin converting enzyme inhibitors; ARBs, angiotensin receptor blockers; HR, heart rate; CHF, chronic heart failure; CCBs, calcium channel blockers

Fig. 8

Model explanation with a true positive case (A) and a true negative case (B) in the external validation cohort. ACEIs, angiotensin converting enzyme inhibitors; ARBs, angiotensin receptor blockers; LVEF, left ventricular ejection fraction; eGFR, estimated glomerular filtration rate; HDL-C, high density lipoprotein cholesterol; HR, heart rate

Discussion

Anemia and CHD are frequently encountered in elderly populations and are both related to adverse outcome. Nevertheless, because older patients were often excluded from major clinical trials, few studies have explored mortality risk in elderly CHD populations with anemia [27]. There are many physiologic reasons for anemia in the elderly, including altered hormone levels, impaired inflammatory response, stem cell alteration, and reduced erythropoietin induction secondary to renal dysfunction [28]. Meanwhile, anemia is intricately associated with a multitude of ailments, leading to unfavorable consequences and introducing therapeutic challenges for elderly patients with CHD.

Model performances and explainability

Ensemble Model is a machine learning model that helps minimize error-causing factors and improve predictive performance by aggregating the predictions of multiple models [29]. Several studies have employed Ensemble Model among patients with cardiovascular diseases. For instance, Yang et al. [30] evaluated the potential of machine learning models including ElasticNet, Random Forest, XGBoost Machine, Deep Learning, Ensemble Model, Support Vector Machine, and Logistic Regression to predict cardiovascular risk in a hypertensive population. Ensemble Model, with an AUC of 0.760, outperformed Logistic Regression, with an AUC of 0.737, and the other tested models. Chen et al. [31] developed an Ensemble Model integrating two machine learning methods (random down-sampling and random forest) to improve the accuracy of disease prediction and risk stratification in CHD patients. This model achieved good performance, with an AUC of 0.895 in random testing and 0.905 in sequential testing.

Our study developed an accurate model to predict one-year mortality among elderly CHD patients with anemia. Ensemble Model and other machine learning models were analyzed in the study. Ensemble Model outperformed the other models, with an AUC of 0.828 and a Brier score of 0.170, indicating excellent predictive effectiveness. Subgroup analysis identified clinical characteristics that were significantly associated with one-year mortality, with HR, CHF, tachycardia and β receptor blockers being risk factors and hemoglobin, albumin, HDL-C, eGFR, LVEF, aspirin, clopidogrel, CCBs, ACEIs/ARBs, and statins being protective factors. Most of these variables can be modified through clinical interventions to enhance survival outcome and decrease mortality risk. Further analysis of potential confounding effects and a more in-depth examination of these variables could improve the interpretation of outcomes in elderly CHD patients with anemia.

The ability of machine learning models to manage complex interactions and confounding variables played a crucial role in deriving meaningful insights from the available data [32]. The machine learning model developed in our study effectively discriminated between low-risk and high-risk elderly CHD patients with anemia. This ability could help identify high-risk patients and support timely and effective interventions. Our risk stratification system clearly demonstrated that patients in the high-risk group had significantly higher one-year mortality than those in the low-risk group. This underscores the immediate applicability of our model in predicting survival outcome in this specific population.

The developed model could help establish personalized therapeutic plans. For elderly CHD patients with anemia who were at high risk for one-year mortality, clinical interventions should be more comprehensive and intensive. Medication management is crucial including anti-platelets (aspirin and clopidogrel) to prevent blood clots, CCBs and ACEIs/ARBs to manage blood pressures, and statins to lower blood lipids. Comorbidities including CHF and tachycardia should be prevented and treated as a significant aspect of personalized therapeutic plans. Regular monitoring, treatment and follow-up are also essential involving frequent medical check-ups and home monitoring to track vital health metrics including hemoglobin, albumin, HDL-C, eGFR, and LVEF. Patient education and support through cardiac rehabilitation programs could play a role in managing clinical condition and improving life quality. Patients who were at low risk for one-year mortality require less intensive but still proactive management. Medication management includes standard cardiovascular drugs, potentially at low doses. Patient education, metric self-monitoring, routine monitoring (typically annual or bi-annual visits), and ongoing motivation and advice from support groups are also essential to maintaining their health.

Model clinical value analysis

Elevated HR is an important compensatory mechanism for maintaining oxygen delivery during anemia [33]. John et al. [34] demonstrated that HR had an inverse linear relationship with hemoglobin, with a mean increase of 3.9 beats per minute per gram of hemoglobin decrease. Nevertheless, numerous studies have indicated that elevated HR is an independent risk factor for cardiovascular diseases [35, 36]. Our study confirmed that tachycardia had deleterious effects on CHD patients with anemia, and patients with one-year mortality had higher HR than those surviving over one year. β receptor blockers are the main drugs for CHD and can significantly reduce mortality risk in CHD patients [37,38,39]. The cardioprotective effects of β receptor blockers in CHD are largely based on their HR-lowering role [40]. However, John et al. [34, 41] found that elevated HR in response to anemia could not be eliminated with β-adrenergic blockade. In clinical practice, clinicians need to distinguish a pathologic increase in HR from a physiologic increase that compensates for anemia, to avoid excessive use of HR-lowering drugs.

Elderly patients frequently encounter challenges such as malnutrition, infection, renal dysfunction, and fluid overload, which result in hypoproteinemia and mortality. It has been shown that low albumin predicts poor outcome in CHD patients [42]. A meta-analysis showed that low albumin was associated with an increased cardiovascular risk in healthy individuals [43]. The reason for these findings may be that albumin plays protective roles through its anti-inflammatory, anti-oxidant, and anti-thrombotic effects, and that it is affected by inflammation, infection, nutritional status, and fluid load [44]. Thus, albumin could serve as a potential prognostic biomarker for cardiovascular diseases, helping discern disease progression promptly and accurately.

Compared with those surviving over one year, patients with one-year mortality had lower HDL-C and less statin use. Consistent with our findings, previous studies have demonstrated that HDL-C is inversely associated with adverse clinical outcome in CHD patients [45, 46]. HDL-C could regulate the atherosclerotic process and mediate cardioprotective effects through anti-inflammatory, anti-oxidant, and anti-thrombotic properties [47, 48]. Elevated HDL-C after treatment with statins could reduce cardiovascular diseases and consequent mortality [49]. Therefore, clinicians should actively monitor and manage HDL-C at a reasonable level to enhance survival outcome of CHD patients with anemia.

Previous studies have demonstrated that ACEIs/ARBs are effective therapies in the reduction of hemoglobin in patients with polycythemia [50, 51]. The mechanism through which ACEIs/ARBs cause a decline in hemoglobin is that both ACEIs and ARBs could inhibit erythroid precursors by reducing erythropoietin induction [52, 53]. Although ACEIs/ARBs may theoretically aggravate hemoglobin reduction, our study suggested that ACEIs/ARBs could decrease mortality risk in elderly CHD patients with anemia. ACEIs/ARBs, when combined with other therapeutic drugs, may provide more beneficial effects in the treatment of elderly CHD populations with anemia.

In our study, patients with one-year mortality had a higher proportion of CHF, lower LVEF and eGFR, and less use of aspirin, clopidogrel, and CCBs. In clinical practice, patients with anemia are inadequately prescribed anti-platelet drugs, presumably due to concern about bleeding [54]. For instance, Nikolsky et al. [12] found that up to 18% of CHD patients with anemia were no longer receiving anti-platelet drugs at one-year follow-up. It is widely acknowledged that anti-platelet drugs are effective in CHD prevention [55]. Consequently, reduced use of anti-platelet drugs could also contribute to the poor outcome of CHD patients with anemia. A meta-analysis performed by Sripal et al. [56] demonstrated that CCBs were associated with reduced cardiovascular risk in patients with CHD. In elderly CHD patients with anemia, our study found that CCBs were associated with lower mortality risk. Impaired renal function results in a decrease in absolute production of erythropoietin, which contributes to suppression of bone marrow and anemia in cardiovascular diseases [57, 58]. Impaired renal function could lead to both limited oxygen supply and increased cardiac workload [58]. CHF accounts for adverse outcome in CHD patients and is one of the primary causes of mortality. Our study showed that CHF was associated with increased mortality risk in elderly CHD patients with anemia. Therefore, effective prevention and management of CHF could reduce mortality risk in elderly CHD patients with anemia.

Limitations

Despite the promising performance of our ensemble machine learning model in predicting one-year mortality in elderly CHD patients with anemia, several limitations must be acknowledged. Firstly, the study population was derived from a cohort of 974 patients in a large tertiary hospital, with external validation performed on an additional 112 patients from another large tertiary hospital. While this study provided a solid foundation for initial validation, the sample size and gender imbalance may limit the generalizability of the findings. This demographic reality should be factored in when evaluating the model’s relevance and generalizability to broader populations. Thus, larger multicenter studies are necessary to confirm the robustness of our model across diverse study populations and clinical scenarios. Secondly, our dataset, though comprehensive, is still constrained by the available variables. Other unmeasured or unknown factors affecting mortality risk may not have been captured in our analysis, such as socioeconomic status, lifestyle factors, changes in medication, or disease progression. More diverse datasets with a broader range of variables could further enhance the model’s predictive ability, but the model would become more complex and its clinical practicality would be compromised. Thirdly, while machine learning models, including the Ensemble Model, provide high accuracy and robustness, they often function as black boxes, making it challenging to interpret the underlying mechanisms driving the prediction. Although SHAP elucidates the importance of variables, a fully transparent model that clinicians can easily interpret is still needed to gain wider acceptance in clinical practice. Lastly, this study concentrated on one-year mortality and did not address long-term mortality. Consequently, future research incorporating extended follow-up periods could provide valuable insights for predicting long-term mortality in these patients.

Conclusions

An ensemble machine learning model effectively predicts one-year mortality in elderly CHD patients with anemia. It outperforms other algorithms with superior ability in identifying these patients at high risk for one-year mortality. This model identifies key risk factors and protective factors, providing valuable insights for improving risk assessment, informing clinical decision-making and performing targeted interventions. Early intervention measures including preventing CHF and tachycardia, improving hemoglobin, albumin, HDL-C, eGFR, and LVEF, and using aspirin, clopidogrel, CCBs, ACEIs/ARBs, and statins are recommended for elderly CHD patients with anemia.

Data availability

No datasets were generated or analysed during the current study.

References

  1. Benjamin EJ, Blaha MJ, Chiuve SE, Cushman M, Das SR, Deo R, de Ferranti SD, Floyd J, Fornage M, Gillespie C, et al. Heart Disease and Stroke Statistics-2017 update: a Report from the American Heart Association. Circulation. 2017;135(10):e146–603.

  2. The Writing Committee of the Report on Cardiovascular Health and Diseases in China. Report on Cardiovascular Health and Diseases in China 2021: an updated summary. Biomed Environ Sci. 2022;35(7):573–603.

  3. Kaiafa G, Kanellos I, Savopoulos C, Kakaletsis N, Giannakoulas G, Hatzitolios AI. Is anemia a new cardiovascular risk factor? Int J Cardiol. 2015;186:117–24.

  4. Spence RK. The economic burden of anemia in heart failure. Heart Fail Clin. 2010;6(3):373–83.

  5. Kansagara D, Dyer E, Englander H, Fu R, Freeman M, Kagen D. Treatment of anemia in patients with heart disease: a systematic review. Ann Intern Med. 2013;159(11):746–57.

  6. Rymer JA, Rao SV. Anemia and coronary artery disease: pathophysiology, prognosis, and treatment. Coron Artery Dis. 2018;29(2):161–7.

  7. Goel H, Hirsch JR, Deswal A, Hassan SA. Anemia in Cardiovascular Disease: marker of Disease Severity or Disease-modifying therapeutic target? Curr Atheroscler Rep. 2021;23(10):61.

  8. Anand IS, Kuskowski MA, Rector TS, Florea VG, Glazer RD, Hester A, Chiang YT, Aknay N, Maggioni AP, Opasich C, et al. Anemia and change in hemoglobin over time related to mortality and morbidity in patients with chronic heart failure: results from Val-HeFT. Circulation. 2005;112(8):1121–7.

  9. Sabatine MS, Morrow DA, Giugliano RP, Burton PB, Murphy SA, McCabe CH, Gibson CM, Braunwald E. Association of hemoglobin levels with clinical outcomes in acute coronary syndromes. Circulation. 2005;111(16):2042–9.

  10. da Silveira AD, Ribeiro RA, Rossini AP, Stella SF, Ritta HA, Stein R, Polanczyk CA. Association of anemia with clinical outcomes in stable coronary artery disease. Coron Artery Dis. 2008;19(1):21–6.

  11. Sarnak MJ, Tighiouart H, Manjunath G, MacLeod B, Griffith J, Salem D, Levey AS. Anemia as a risk factor for cardiovascular disease in the atherosclerosis risk in communities (ARIC) study. J Am Coll Cardiol. 2002;40(1):27–33.

  12. Nikolsky E, Aymong ED, Halkin A, Grines CL, Cox DA, Garcia E, Mehran R, Tcheng JE, Griffin JJ, Guagliumi G, et al. Impact of anemia in patients with acute myocardial infarction undergoing primary percutaneous coronary intervention: analysis from the controlled Abciximab and device investigation to Lower Late Angioplasty complications (CADILLAC) trial. J Am Coll Cardiol. 2004;44(3):547–53.

  13. Ohana-Sarna-Cahan L, Atar S. Clinical outcomes of patients with acute coronary syndrome and moderate or severe chronic anaemia undergoing coronary angiography or intervention. Eur Heart J Acute Cardiovasc Care. 2018;7(7):646–51.

  14. Gaye B, Canonico M, Perier MC, Samieri C, Berr C, Dartigues JF, Tzourio C, Elbaz A, Empana JP. Ideal Cardiovascular Health, Mortality, and vascular events in Elderly subjects: the three-city study. J Am Coll Cardiol. 2017;69(25):3015–26.

  15. Stevens CA, Lyons AR, Dharmayat KI, Mahani A, Ray KK, Vallejo-Vaz AJ, Sharabiani MT. Ensemble machine learning methods in screening electronic health records: a scoping review. Digit Health. 2023;9:20552076231173225.

  16. Shi X, Cui Y, Wang S, Pan Y, Wang B, Lei M. Development and validation of a web-based artificial intelligence prediction model to assess massive intraoperative blood loss for metastatic spinal disease using machine learning techniques. Spine J. 2023.

  17. Fox K, Garcia MA, Ardissino D, Buszman P, Camici PG, Crea F, Daly C, De Backer G, Hjemdahl P, Lopez-Sendon J, et al. Guidelines on the management of stable angina pectoris: executive summary: the Task Force on the management of stable angina Pectoris of the European Society of Cardiology. Eur Heart J. 2006;27(11):1341–81.

  18. Thygesen K, Alpert JS, White HD, Joint ESC/ACCF/AHA/WHF Task Force for the Redefinition of Myocardial Infarction, Jaffe AS, Apple FS, Galvani M, Katus HA, Newby LK, Ravkilde J, et al. Universal definition of myocardial infarction. Circulation. 2007;116(22):2634–53.

  19. Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97(18):1837–47.

  20. Fuster V, Ryden LE, Cannom DS, Crijns HJ, Curtis AB, Ellenbogen KA, Halperin JL, Le Heuzey JY, Kay GN, Lowe JE, et al. ACC/AHA/ESC 2006 guidelines for the management of patients with Atrial Fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice guidelines and the European Society of Cardiology Committee for Practice Guidelines (Writing Committee to revise the 2001 guidelines for the management of patients with Atrial Fibrillation): developed in collaboration with the European Heart Rhythm Association and the Heart Rhythm Society. Circulation. 2006;114(7):e257–354.

  21. Dickstein K, Cohen-Solal A, Filippatos G, McMurray JJ, Ponikowski P, Poole-Wilson PA, Stromberg A, van Veldhuisen DJ, Atar D, Hoes AW, et al. ESC guidelines for the diagnosis and treatment of acute and chronic heart failure 2008: the Task Force for the diagnosis and treatment of Acute and Chronic Heart failure 2008 of the European Society of Cardiology. Developed in collaboration with the Heart Failure Association of the ESC (HFA) and endorsed by the European Society of Intensive Care Medicine (ESICM). Eur Heart J. 2008;29(19):2388–442.

  22. Ma YC, Zuo L, Chen JH, Luo Q, Yu XQ, Li Y, Xu JS, Huang SM, Wang LN, Huang W, et al. Modified glomerular filtration rate estimating equation for Chinese patients with chronic kidney disease. J Am Soc Nephrol. 2006;17(10):2937–44.

  23. National Kidney Foundation. K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Am J Kidney Dis. 2002;39(2 Suppl 1):S1–266.

  24. Gao L, Cao Y, Cao X, Shi X, Lei M, Su X, Liu Y. Machine learning-based algorithms to predict severe psychological distress among cancer patients with spinal metastatic disease. Spine J. 2023.

  25. Lei M, Han Z, Wang S, Guo C, Zhang X, Song Y, Lin F, Huang T. Biological signatures and prediction of an immunosuppressive status-persistent critical illness-among orthopedic trauma patients using machine learning techniques. Front Immunol. 2022;13:979877.

  26. Lei M, Han Z, Wang S, Han T, Fang S, Lin F, Huang T. A machine learning-based prediction model for in-hospital mortality among critically ill patients with hip fracture: an internal and external validated study. Injury. 2023;54(2):636–44.

  27. Madhavan MV, Gersh BJ, Alexander KP, Granger CB, Stone GW. Coronary artery disease in patients ≥80 years of age. J Am Coll Cardiol. 2018;71(18):2015–40.

  28. Patel KV. Variability and heritability of hemoglobin concentration: an opportunity to improve understanding of anemia in older adults. Haematologica. 2008;93(9):1281–3.

  29. Younis EMG, Zaki SM, Kanjo E, Houssein EH. Evaluating ensemble learning methods for multi-modal emotion recognition using sensor data fusion. Sensors (Basel). 2022;22(15).

  30. Xi Y, Wang H, Sun N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: a study involving 143,043 Chinese patients with hypertension. Front Cardiovasc Med. 2022;9:1025705.

  31. Chen B, Ruan L, Yang L, Zhang Y, Lu Y, Sang Y, Jin X, Bai Y, Zhang C, Li T. Machine learning improves risk stratification of coronary heart disease and stroke. Ann Transl Med. 2022;10(21):1156.

  32. Tomihama RT, Camara JR, Kiang SC. Machine learning analysis of confounding variables of a convolutional neural network specific for abdominal aortic aneurysms. JVS Vasc Sci. 2023;4:100096.

  33. Weiskopf RB, Viele MK, Feiner J, Kelley S, Lieberman J, Noorani M, Leung JM, Fisher DM, Murray WR, Toy P, et al. Human cardiovascular and metabolic response to acute, severe isovolemic anemia. JAMA. 1998;279(3):217–21.

  34. Feiner JR, Finlay-Morreale HE, Toy P, Lieberman JA, Viele MK, Hopf HW, Weiskopf RB. High oxygen partial pressure decreases anemia-induced heart rate increase equivalent to transfusion. Anesthesiology. 2011;115(3):492–8.

  35. Palatini P, Casiglia E, Julius S, Pessina AC. High heart rate: a risk factor for cardiovascular death in elderly men. Arch Intern Med. 1999;159(6):585–92.

  36. Jouven X, Empana JP, Schwartz PJ, Desnos M, Courbon D, Ducimetiere P. Heart-rate profile during exercise as a predictor of sudden death. N Engl J Med. 2005;352(19):1951–8.

  37. Hagsund T, Olsson SE, Smith JG, Madsen Hardig B, Wagner H. Beta-blockers after myocardial infarction and 1-year clinical outcome: a retrospective study. BMC Cardiovasc Disord. 2020;20(1):165.

  38. Freemantle N, Cleland J, Young P, Mason J, Harrison J. Beta blockade after myocardial infarction: systematic review and meta regression analysis. BMJ. 1999;318(7200):1730–7.

  39. Bangalore S, Makani H, Radford M, Thakur K, Toklu B, Katz SD, DiNicolantonio JJ, Devereaux PJ, Alexander KP, Wetterslev J, et al. Clinical outcomes with beta-blockers for myocardial infarction: a meta-analysis of randomized trials. Am J Med. 2014;127(10):939–53.

  40. Palatini P. Elevated heart rate in cardiovascular diseases: a target for treatment? Prog Cardiovasc Dis. 2009;52(1):46–60.

  41. Lieberman JA, Weiskopf RB, Kelley SD, Feiner J, Noorani M, Leung J, Toy P, Viele M. Critical oxygen delivery in conscious humans is less than 7.3 ml O2 x kg(-1) x min(-1). Anesthesiology. 2000;92(2):407–13.

  42. Danesh J, Collins R, Appleby P, Peto R. Association of fibrinogen, C-reactive protein, albumin, or leukocyte count with coronary heart disease: meta-analyses of prospective studies. JAMA. 1998;279(18):1477–82.

  43. Pignatelli P, Farcomeni A, Menichelli D, Pastori D, Violi F. Serum albumin and risk of cardiovascular events in primary and secondary prevention: a systematic review of observational studies and bayesian meta-regression analysis. Intern Emerg Med. 2020;15(1):135–43.

  44. Zhang Z, Pereira SL, Luo M, Matheson EM. Evaluation of blood biomarkers associated with risk of malnutrition in older adults: a systematic review and meta-analysis. Nutrients. 2017;9(8).

  45. Acharjee S, Roe MT, Amsterdam EA, Holmes DN, Boden WE. Relation of admission high-density lipoprotein cholesterol level and in-hospital mortality in patients with acute non-ST segment elevation myocardial infarction (from the National Cardiovascular Data Registry). Am J Cardiol. 2013;112(8):1057–62.

  46. Ishida M, Itoh T, Nakajima S, Ishikawa Y, Shimoda Y, Kimura T, Fusazaki T, Morino Y. A low early high-density lipoprotein cholesterol level is an independent predictor of In-hospital death in patients with Acute Coronary Syndrome. Intern Med. 2019;58(3):337–43.

  47. Choi BG, Vilahur G, Yadegar D, Viles-Gonzalez JF, Badimon JJ. The role of high-density lipoprotein cholesterol in the prevention and possible treatment of cardiovascular diseases. Curr Mol Med. 2006;6(5):571–87.

  48. Young CE, Karas RH, Kuvin JT. High-density lipoprotein cholesterol and coronary heart disease. Cardiol Rev. 2004;12(2):107–19.

  49. Brown BG, Zhao XQ, Chait A, Fisher LD, Cheung MC, Morse JS, Dowdy AA, Marino EK, Bolson EL, Alaupovic P, et al. Simvastatin and niacin, antioxidant vitamins, or the combination for the prevention of coronary disease. N Engl J Med. 2001;345(22):1583–92.

  50. Wang AY, Yu AW, Lam CW, Yu LM, Li PK, Goh J, Lui SF. Effects of losartan or enalapril on hemoglobin, circulating erythropoietin, and insulin-like growth factor-1 in patients with and without posttransplant erythrocytosis. Am J Kidney Dis. 2002;39(3):600–8.

  51. Plata R, Cornejo A, Arratia C, Anabaya A, Perna A, Dimitrov BD, Remuzzi G, Ruggenenti P, Commission on Global Advancement of Nephrology, Research Subcommittee of the International Society of Nephrology. Angiotensin-converting-enzyme inhibition therapy in altitude polycythaemia: a prospective randomised trial. Lancet. 2002;359(9307):663–6.

  52. Ishani A, Weinhandl E, Zhao Z, Gilbertson DT, Collins AJ, Yusuf S, Herzog CA. Angiotensin-converting enzyme inhibitor as a risk factor for the development of anemia, and the impact of incident anemia on mortality in patients with left ventricular dysfunction. J Am Coll Cardiol. 2005;45(3):391–9.

  53. Cheungpasitporn W, Thongprayoon C, Chiasakul T, Korpaisarn S, Erickson SB. Renin-angiotensin system inhibitors linked to anemia: a systematic review and meta-analysis. QJM. 2015;108(11):879–84.

  54. Lawler PR, Filion KB, Dourian T, Atallah R, Garfinkle M, Eisenberg MJ. Anemia and mortality in acute coronary syndromes: a systematic review and meta-analysis. Am Heart J. 2013;165(2):143–53. e145.

  55. Breddin K, Loew D, Lechner K, Oberla K, Walter E. The german-austrian aspirin trial: a comparison of acetylsalicylic acid, placebo and phenprocoumon in secondary prevention of myocardial infarction. On behalf of the German-Austrian Study Group. Circulation. 1980;62(6 Pt 2):V63–72.

  56. Bangalore S, Parkar S, Messerli FH. Long-acting calcium antagonists in patients with coronary artery disease: a meta-analysis. Am J Med. 2009;122(4):356–65.

  57. Mozos I. Mechanisms linking red blood cell disorders and cardiovascular diseases. Biomed Res Int. 2015;2015:682054.

  58. McCullough PA, Lepor NE. Piecing together the evidence on anemia: the link between chronic kidney disease and cardiovascular disease. Rev Cardiovasc Med. 2005;6(Suppl 3):S4–12.

Acknowledgements

We thank all staff for their continued cooperation and contribution to the development of the machine learning models.

Funding

No funding.

Author information

Contributions

L.C., Y.N., H.W., Y.L., Y.Z., Q.Z., M.L. and S.F. contributed to the study design, performed data collection and analysis, and drafted the manuscript. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Yali Zhao, Qian Zhang, Mingxing Lei or Shihui Fu.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of Chinese PLA General Hospital and was performed in accordance with the tenets and provisions of the Declaration of Helsinki (1975).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Table 1: Machine learning models and their optimal hyper-parameters

40537_2024_966_MOESM2_ESM.png

Supplementary Figure 1: Area under the curve (AUC) of receiver operating characteristics (ROC) for Ensemble Model in the external validation cohort

40537_2024_966_MOESM3_ESM.png

Supplementary Figure 2: Density curves for machine learning models in the external validation cohort. (A) Logistic Regression; (B) Naïve Bayesian; (C) XGBoosting Machine; (D) Support Vector Machine; (E) Decision Tree; (F) Ensemble Model

40537_2024_966_MOESM4_ESM.png

Supplementary Figure 3: Violin plots for machine learning models in the external validation cohort. (A) Logistic Regression; (B) Naïve Bayesian; (C) XGBoosting Machine; (D) Support Vector Machine; (E) Decision Tree; (F) Ensemble Model

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cheng, L., Nie, Y., Wen, H. et al. An ensemble machine learning model for predicting one-year mortality in elderly coronary heart disease patients with anemia. J Big Data 11, 99 (2024). https://doi.org/10.1186/s40537-024-00966-x

  • DOI: https://doi.org/10.1186/s40537-024-00966-x

Keywords