 Research
 Open Access
A MapReduce-based Adjoint method for preventing brain disease
 Manal Zettam^{1},
 Jalal Laassiri^{1} and
 Nourddine Enneya^{1}
 Received: 4 May 2018
 Accepted: 25 July 2018
 Published: 2 August 2018
Abstract
In this paper, we present a statistical model, built on a patient dataset, that efficiently predicts brain disease risk. The model was built with multiple regression. The parameters of a regression model are usually estimated by solving a least squares problem; here, this problem is solved with a parallelized algebraic Adjoint method. Because the parallelized algebraic Adjoint method is not the only MapReduce-based method for solving the least squares problem, experiments were carried out to rank the Adjoint method among the other methods. The calculated job completion times show that the MapReduce-based Adjoint method is competitive.
Keywords
 Brain disease
 Adjoint method
 Multiple regression
 MapReduce
Introduction
Further studies highlight the relevance of physical exercise and diet in preventing Alzheimer’s disease [8–11]. Moreover, the relationship between infectious burden and Alzheimer’s disease is examined by Bu et al. [12], the link between bacterial infection and Alzheimer’s disease is identified by Maheshwari and Eslick [13], and the link between Lyme disease and Alzheimer’s disease is reported by MacDonald [14].
To the best of our knowledge, existing studies on Alzheimer’s disease prediction do not rely on a software solution based on factors such as age, daily working hours, and the existence of a parent with Alzheimer’s disease. We therefore propose a solution that receives a dataset of patients with a variable number of attributes and then builds a statistical model to identify potential Alzheimer’s disease patients. We also parallelize the Adjoint method via MapReduce. The parallelized algebraic Adjoint method was first presented briefly in our previous work, Zettam et al. [15].
The proposed solution estimates Alzheimer’s disease risk with a statistical model. Statistical models for prediction fall into three main classes: regression, classification, and neural networks [16].
Regression analysis is one of the most widely used empirical tools. It predicts the unknown value of a variable from the known values of one or more other variables, called predictors [17]. Simple, multiple and logistic regression are the most common forms of regression in the literature [18]. The appropriate choice of regression model depends on the number of predictors and the type of the outcome variable. Hosmer and Lemeshow [19] give a detailed overview of logistic regression and its applications. For their part, references [20, 21] give detailed overviews of simple and multiple regression, with examples of real-life applications. In the medical field, several studies have used regression models, for example to predict early mortality in oesophageal cancer [22] and relative survival in cancer registries [23].
Classification has two distinct meanings. The first is known as unsupervised learning (or clustering), the second as supervised learning [17]. In the statistical literature, supervised learning is usually referred to as discrimination, meaning the establishment of a classification rule from correctly classified data [17]. Chatap and Shrivastava [24] presented a detailed survey of classification methods used in the medical field, such as CART [25], the C5.0 decision tree algorithm [26], chi-squared automatic interaction detection (CHAID) [27], Quick, Unbiased, Efficient, Statistical Tree (QUEST) [28], and discriminant analysis [29]. Further information can be found in Michie et al. [17].
The term neural network encompasses a large class of models and learning methods. A neural network is a nonlinear statistical model. Neural networks were developed decades ago by scientists attempting to model the learning process of the human brain [30]. The best-known neural network method is the single-hidden-layer back-propagation network. The 1986 work of Rumelhart et al. on back-propagation [31] was an impetus to the adoption of neural networks in several fields, including medicine. In this field, neural network methods have proven their efficiency as diagnostic tools. Indeed, since the study by Szolovits et al. [32], many studies have been published on, for example, colorectal cancer [33], multiple sclerosis lesions [34], colon cancer [35], pancreatic disease [36], gynecological diseases [37], and early diabetes [38]. Readers may refer to Amato et al. [39] for more details.
Other statistical models that do not fit into these three main classes are also used in the prediction literature, such as those presented in Cesa-Bianchi and Lugosi [40] and Chen et al. [41].
As stated before, the choice of a suitable statistical model depends on the type of the predictors and the nature of the outcome. Furthermore, using analysis of variance instead of regression to predict a quantitative outcome is a common pitfall pointed out by a number of statisticians, such as Anderson et al. [42] and Tribout [43]. These authors clearly report the main differences between regression and analysis of variance. In addition, Tribout [43] notes that some software packages designed to make these techniques easier to use combine regression and analysis of variance under the acronym ANOVA.
In this study, regression is used to build the prediction model because of the nature of the predictors and of the outcome variable. The rest of the paper is organized as follows. The second section addresses the case study. It is divided into subsections that present the variables used for modeling, detail the sampling stage, describe the application of multiple regression, give a brief overview of the Adjoint method used to solve the least squares estimation problem, and introduce the MRAM method. The third section then presents the technique used to evaluate the strength of the resulting model. Finally, the last section summarizes the work.
The Alzheimer’s disease prediction case study
Regression analysis is a statistical model that describes, through an equation, how variables are related. Formally, the variable we are trying to predict is called the dependent variable, and the variable or variables used to predict it are called independent variables (predictors). Simple regression involves a single independent variable, whereas multiple regression involves several. The procedures for simple and multiple regression are largely similar.
The simple regression
Consider the case where Alzheimer’s disease risk is predicted from a single predictor, for instance the patient’s age. The population considered in this study is the population of patients recorded in a dataset. The aim is to predict the Alzheimer’s disease risk, expressed as a percentage and denoted \(y\), from the patient’s age, denoted \(x_{1}\).
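As an illustration of this single-predictor case, the sketch below fits \(y = \beta_{0} + \beta_{1}x_{1}\) by ordinary least squares with the textbook closed-form estimates \(\hat\beta_{1} = S_{xy}/S_{xx}\) and \(\hat\beta_{0} = \bar{y} - \hat\beta_{1}\bar{x}\). It is a minimal sketch, not the authors’ implementation; the function name simple_regression and the example ages and risk values are hypothetical.

```python
# Minimal sketch (not the authors' code): ordinary least squares for the
# single-predictor model y = b0 + b1 * x1, using the closed-form estimates.


def simple_regression(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered cross-product S_xy and centered sum of squares S_xx
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical ages (x1, years) and Alzheimer's risk values (y, %)
ages = [50, 60, 70, 80]
risk = [10.0, 15.0, 20.0, 25.0]
print(simple_regression(ages, risk))  # (-15.0, 0.5)
```

Because these hypothetical points lie exactly on a line, the fit recovers it: a risk of \(-15 + 0.5x_{1}\) percent.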
The multiple regression
Now consider the case where Alzheimer’s disease risk is predicted from several predictors, for instance the patient’s age, the geographical area, the number of working hours, the hours of physical exercise, the existence of a parent with Alzheimer’s disease, the diet, and the existence of Lyme disease. As before, the population consists of the patients recorded in a dataset, and the aim is to predict the Alzheimer’s disease risk, expressed as a percentage and denoted \(y\), from the predictors listed above.
The Alzheimer’s disease prediction statistical model
As pointed out earlier, we believe that seven predictors have a major impact on Alzheimer’s disease risk. The first predictor, denoted \(x_{1}\), is the individual’s age. The second, \(x_{2}\), is the geographical area. The third, \(x_{3}\), is the number of working hours per day. The fourth, \(x_{4}\), is the number of hours of physical exercise. The fifth, \(x_{5}\), is the existence of a parent with Alzheimer’s disease. The sixth, \(x_{6}\), is the quality of diet. The seventh and last predictor, \(x_{7}\), is the existence of Lyme disease. In conducting a statistical study, we would like to answer the following questions: Do these variables really affect Alzheimer’s disease risk? Is there a relationship between the variables? If so, what is this relationship? Can the values of these parameters be adjusted to predict Alzheimer’s disease risk efficiently?
Throughout this paper, the steps undertaken to estimate the unknown parameters \(\beta_{i}\), \(i \in \{0, \dots, 7\}\), are explained in detail. The next section explains the sampling stage.
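With the seven predictors above, the multiple regression model and its least squares estimate can be written in the standard textbook form below. This is a sketch in our own notation: the design matrix \(X\) (one row per patient, with a leading column of ones for \(\beta_{0}\)) and the error term \(\varepsilon\) are assumptions, and the closed form requires \(X^{\top}X\) to be invertible.

```latex
\begin{aligned}
y &= \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \beta_{4} x_{4}
     + \beta_{5} x_{5} + \beta_{6} x_{6} + \beta_{7} x_{7} + \varepsilon, \\
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_{2}^{2}
     = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y},
     \qquad X \in \mathbb{R}^{n \times 8}.
\end{aligned}
```

Equivalently, \(\hat{\boldsymbol{\beta}}\) solves the normal equations \(X^{\top}X\boldsymbol{\beta} = X^{\top}\mathbf{y}\), which is the system the least squares solvers discussed below address.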
The Alzheimer’s disease prediction sampling stage
The sampling stage is fundamental and has a great impact on the accuracy of the prediction model. Indeed, a small sample, or a sample of similar individuals, can lead to an inaccurate model [42]. Efficient sampling is therefore a prerequisite for efficient prediction. To tackle the problem of small samples, a number of statistical methods suited to small samples have been proposed, such as the small-sample model selection approach of Hurvich and Tsai [47]. In addition, the central limit theorem can be applied when the sample is large. This theorem states that, for a large sample, the sampling distribution of the sample mean can be approximated by a normal probability distribution. In practice, this approximation is considered adequate when the sample size is greater than or equal to 30 [42].
The function notsimilar takes a patient as input and returns a Boolean. It compares each attribute of the input patient with the attributes of the patients already in the sample: if it finds any similarity, it returns false; otherwise, it returns true.
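The check described above, and a sampling loop that uses it, can be sketched as follows. This is an illustration under an assumption we introduce: two patients count as similar when every attribute differs by at most a tolerance tol (a hypothetical parameter), since the exact similarity criterion is not given here. The helper build_sample and the patient tuples are also hypothetical.

```python
# Minimal sketch of the notsimilar check and of a sampling loop that uses it.
# Assumption (hypothetical): two patients are "similar" when every attribute
# differs by at most `tol`; the paper's exact similarity criterion is not given.


def notsimilar(patient, sample, tol=0.0):
    """Return True if `patient` is not similar to any patient already in `sample`."""
    for other in sample:
        if all(abs(a - b) <= tol for a, b in zip(patient, other)):
            return False  # a similar patient is already in the sample
    return True


def build_sample(patients, tol=0.0):
    """Greedily keep each patient that is not similar to those already kept."""
    sample = []
    for p in patients:
        if notsimilar(p, sample, tol):
            sample.append(p)
    return sample


# Hypothetical patients: (age, working hours per day, exercise hours)
patients = [(70, 8, 2), (70, 8, 2), (65, 6, 4), (71, 8, 2)]
print(build_sample(patients, tol=1))  # [(70, 8, 2), (65, 6, 4)]
```

With tol=1, both the exact duplicate (70, 8, 2) and the near-duplicate (71, 8, 2) are rejected; with the default tol=0 only the exact duplicate is.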
The Adjoint method for the least squares estimation problem
This implementation was tested on large-scale data to discover its limits. Unfortunately, the method suffers from shortcomings when the patient sample is large and when the number of predictors is very high. To overcome these shortcomings, a new computational approach is presented hereafter. This approach is massively parallel, so that it can absorb the heavy computations and improve the method’s performance.
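To make the method concrete, here is a sketch of one reading of the algebraic Adjoint method: the normal equations \((X^{\top}X)\beta = X^{\top}y\) are solved through the adjugate (classical adjoint) identity \(A^{-1} = \operatorname{adj}(A)/\det(A)\), with determinants computed by cofactor expansion. That this matches the authors’ formulation exactly is our assumption, and all helper names are hypothetical. In this naive form the cofactor expansion grows factorially with the number of parameters, which is consistent with the scaling limits noted above.

```python
# Minimal sketch of an adjugate-based ("Adjoint") least-squares solve.
# Assumption: the algebraic Adjoint method inverts the normal-equations matrix
# A = X^T X as adj(A) / det(A); the authors' own code is not reproduced here.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjugate(m):
    """Transpose of the cofactor matrix: adj[i][j] = (-1)**(i+j) * det(minor(m, j, i))."""
    n = len(m)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)] for i in range(n)]


def adjoint_least_squares(X, y):
    """Solve (X^T X) beta = X^T y as beta = adj(X^T X) X^T y / det(X^T X)."""
    xt = transpose(X)
    a = matmul(xt, X)                 # normal-equations matrix X^T X
    b = matmul(xt, [[v] for v in y])  # right-hand side X^T y as a column
    d = det(a)
    if d == 0:
        raise ValueError("X^T X is singular")
    adj_b = matmul(adjugate(a), b)
    return [row[0] / d for row in adj_b]


# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2 (first column = intercept)
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]
print(adjoint_least_squares(X, y))  # [1.0, 2.0, 3.0]
```

With integer data the adjugate and determinant are computed exactly, so the only rounding happens in the final division.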
MRAM: MapReduce with Adjoint method
MapReduce is a programming model for data processing [48]. It enables distributed algorithms to run in parallel on clusters of machines with varied characteristics. MapReduce also handles the issues of parallel computation, so users can focus their effort on the program logic. Since its advent, MapReduce has gained popularity in both the scientific community and industry because of its effectiveness in parallel processing [49]. The parallelization of QR factorization and SVD matrix decomposition is a relevant example of the scientific community’s interest in MapReduce: Benson et al. [50] reported matrix decomposition methods implemented with MapReduce. As pointed out earlier, QR factorization is the most common method for solving the least squares estimation problem. To the best of our knowledge, the Adjoint method has not yet been implemented on the MapReduce framework. This paper therefore details an implementation of the Adjoint method on MapReduce to solve the least squares estimation problem.
Working within MapReduce requires redesigning traditional algorithms. The computation is expressed as two phases, map and reduce. Each phase takes key-value pairs as input and produces key-value pairs as output, and the programmer specifies two functions: the map function and the reduce function. The types of the key-value pairs may be chosen by the programmer.
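To make the two phases concrete, the sketch below simulates a single MapReduce job in plain Python: the map function emits partial products keyed by matrix position, the shuffle groups them by key, and the reduce function sums them, yielding the entries of \(X^{\top}X\) (keys (i, j)) and \(X^{\top}y\) (keys ("y", i)) that the normal equations need. It illustrates the generic key-value pattern only; it is not the authors’ MRAM, it uses no Hadoop API, and all function names are hypothetical.

```python
# Minimal in-memory simulation of a MapReduce job (map, shuffle, reduce) that
# accumulates the normal-equations terms X^T X and X^T y from row splits.
# Illustrative only: a generic pattern, not the authors' MRAM; no Hadoop API used.
from collections import defaultdict


def map_rows(rows):
    """Map: for each record (x, y), emit ((i, j), x_i * x_j) and (("y", i), x_i * y)."""
    for x, y in rows:
        for i, xi in enumerate(x):
            yield ("y", i), xi * y
            for j, xj in enumerate(x):
                yield (i, j), xi * xj


def reduce_sum(key, values):
    """Reduce: sum all partial products that share a key."""
    return key, sum(values)


def run_job(splits):
    groups = defaultdict(list)  # shuffle: group map output by key
    for split in splits:        # each split would be handled by one mapper
        for key, value in map_rows(split):
            groups[key].append(value)
    return dict(reduce_sum(k, v) for k, v in groups.items())


# Hypothetical records x = (1, x1) with an intercept column, spread over two splits
splits = [[((1, 0), 1), ((1, 1), 3)], [((1, 2), 5)]]
result = run_job(splits)
print(result[(0, 0)], result[(1, 1)], result[("y", 1)])  # 3 5 13
```

In a real Hadoop job, a combiner performing the same summation on each mapper’s output would shrink the shuffle volume, since each split would then send only one partial sum per key.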
This paper proposes a MapReduce-based Adjoint method (MRAM) that makes the conventional Adjoint method work effectively in a distributed environment. The method has two steps, which the following part describes in detail.
Evaluation and experimental results
In this section, we evaluate the accuracy and performance of the proposed model on simulated data based on actual data from the Riskalz dataset and from previous studies. To validate the resulting model and evaluate its strength, the proposed solution involves additional steps, which are detailed hereafter.
Prediction accuracy measures
Fisher’s test, Student’s test and correlation coefficient
Fisher’s F test, also called the global significance test, is used to determine whether there is a significant relationship between the dependent variable and the set of independent variables. Student’s t test, called the individual significance test, is used to determine whether each independent variable is significant. A Student test is performed for each independent variable of the model.
A correlation test is performed between the independent variables of the model. If the absolute value of the correlation coefficient between two independent variables exceeds 0.70, it is not possible to reliably determine the individual effect of either variable on the dependent variable.
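The correlation check can be sketched as below: it computes Pearson’s coefficient for every pair of independent variables and flags the pairs above the 0.70 threshold. We compare the absolute value |r| so that strong negative correlations are flagged too; the function names, column names and values are hypothetical.

```python
# Minimal sketch of the pairwise correlation check: flag every pair of
# independent variables whose |r| exceeds 0.70. The data are hypothetical.
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def collinear_pairs(columns, threshold=0.70):
    """Return (name1, name2, r) for each pair of columns with |r| > threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson(c1, c2)
        if abs(r) > threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged


columns = {
    "x1_age": [50, 60, 70, 80],
    "x3_work_hours": [10, 9, 7, 6],   # hypothetical, falls as age rises
    "x4_exercise_hours": [2, 5, 1, 4],
}
print(collinear_pairs(columns))  # [('x1_age', 'x3_work_hours', -0.99)]
```

Here only the age and working-hours columns are flagged, with r ≈ −0.99; a strictly positive threshold test would have missed this pair.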
A Fisher test, based on Fisher’s distribution, can be used to test whether a relationship is significant. With a single independent variable, the Fisher test leads to the same conclusion as the Student test. With more than one independent variable, however, only the F test can be used to test the overall significance of a relationship.
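The equivalence stated above for a single independent variable can be checked numerically: with \(p = 1\), \(F = \mathrm{MSR}/\mathrm{MSE}\) equals the square of the slope’s t statistic, \(t = \hat\beta_{1}/s_{\hat\beta_{1}}\) with \(s_{\hat\beta_{1}} = \sqrt{\mathrm{MSE}/S_{xx}}\). This is a minimal sketch on hypothetical data; f_and_t is an illustrative name.

```python
# Minimal sketch: with one independent variable, the global F statistic equals
# the square of the slope's t statistic, so both tests agree. Data are hypothetical.
from math import isclose, sqrt


def f_and_t(x, y):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    b1 = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sxx
    b0 = ym - b1 * xm
    fitted = [b0 + b1 * a for a in x]
    ssr = sum((f - ym) ** 2 for f in fitted)            # regression sum of squares
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # error sum of squares
    p = 1                                               # one independent variable
    msr, mse = ssr / p, sse / (n - p - 1)
    f_stat = msr / mse
    t_stat = b1 / sqrt(mse / sxx)                       # t = b1 / s_b1
    return f_stat, t_stat


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
f_stat, t_stat = f_and_t(x, y)
print(isclose(f_stat, t_stat ** 2))  # True
```

Because \(F = t^{2}\) and the critical values satisfy the same relation, both tests reject or accept the slope together in this one-predictor case.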
The logic underlying the use of the Fisher test to determine whether the relationship is statistically significant rests on the construction of two independent estimates of \(\sigma^{2}\), the variance of the error term.
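In the usual textbook notation, with \(n\) observations and \(p\) independent variables, the two estimates and the resulting statistic can be written as below. MSE estimates \(\sigma^{2}\) whether or not the relationship holds, whereas MSR estimates \(\sigma^{2}\) only under the null hypothesis and tends to exceed it otherwise; large values of \(F\) therefore lead to rejecting \(H_{0}\). For the seven-predictor model of this paper, \(p = 7\).

```latex
\begin{aligned}
\mathrm{SSR} &= \sum_{i=1}^{n} (\hat{y}_{i} - \bar{y})^{2}, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}, \\
\mathrm{MSR} &= \frac{\mathrm{SSR}}{p}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - p - 1}, \\
F &= \frac{\mathrm{MSR}}{\mathrm{MSE}} \sim F_{p,\, n - p - 1}
   \quad \text{under } H_{0}\colon \beta_{1} = \cdots = \beta_{p} = 0.
\end{aligned}
```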
Experiments
In this section, we test the proposed method on three datasets to confirm its robustness. A brief description is given for each case study. At the end of this section, we report experiments comparing the actual and predicted values for each case study.
a. Student performance case study
The dataset was collected from school reports and questionnaires. The data describe student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social and school-related features.
Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por).
In the current study, we apply our approach to predict the G3 attribute (the final grade) from the remaining ones. An exhaustive list of attributes and their descriptions can be found at http://archive.ics.uci.edu/ml/datasets/Student+Performance.
b. Parkinsons telemonitoring case study
The dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson’s disease recruited to a 6-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients’ homes.
The main aim of the data is to predict the motor and total UPDRS scores (‘motor_UPDRS’ and ‘total_UPDRS’) from the 16 voice measures. For more details, readers can refer to https://archive.ics.uci.edu/ml/datasets/Parkinsons+Telemonitoring.
c. The Levenson Self-Report Psychopathy Scale value case study
The dataset attributes are coded as follows:
Participant = identification number assigned to the participant
Eye tracker = method of eye tracking (1 = head mounted; 2 = tower)
Primary = primary subscale of the Levenson Self-Report Psychopathy Scale
Secondary = secondary subscale of the Levenson Self-Report Psychopathy Scale
Emotion: ANG = angry expression, DIS = disgust expression, FEAR = fear expression, HAP = happy expression, SAD = sad expression, SUR = surprise expression
Intensity: 5 = 55%, 9 = 90%
Sex: F = female, M = male
Region: Eyes = eye region, Mouth = mouth region
Thus, ANG 5 F refers to an angry expression at 55% intensity expressed by a female face, and ANG 5 F Eyes refers to the eye region of the same face.
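The coding scheme above can be decoded mechanically; the sketch below turns a label such as "ANG 5 F Eyes" into named fields, treating a label without a region as the whole face, as in the example just given. It assumes labels are space-separated exactly as written here; decode_label and the field names are hypothetical.

```python
# Minimal sketch: decode a stimulus label such as "ANG 5 F Eyes" using the coding
# scheme above. Assumes space-separated labels; names are hypothetical.
EMOTIONS = {"ANG": "angry", "DIS": "disgust", "FEAR": "fear",
            "HAP": "happy", "SAD": "sad", "SUR": "surprise"}
INTENSITY = {"5": 55, "9": 90}  # percent
SEX = {"F": "female", "M": "male"}
REGIONS = {"Eyes", "Mouth"}


def decode_label(label):
    parts = label.split()
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected label: {label!r}")
    emotion, intensity, sex = parts[:3]
    region = parts[3] if len(parts) == 4 else None  # None means the whole face
    if region is not None and region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return {
        "emotion": EMOTIONS[emotion],
        "intensity_pct": INTENSITY[intensity],
        "sex": SEX[sex],
        "region": region,
    }


print(decode_label("ANG 5 F"))
print(decode_label("ANG 5 F Eyes"))
```

Unknown codes raise a KeyError or ValueError rather than being silently accepted, which helps catch mislabeled rows before modeling.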
d. The comparative study
Discussion
In this section, we conduct a comparative study to position the proposed method among the methods that solve the least squares problem. To this end, we use the Hadoop job performance model of Khan et al. [52] to estimate the job completion time.
The algorithms compared below (Cholesky, indirect TSQR, direct TSQR and Householder QR) are MapReduce-based parallel methods for solving the least squares problem.
Table 1 The estimated \(\beta_{r}\) and \(\beta_{w}\) values for different HDFS file sizes
HDFS size (GB)  Write (MB/s)  Read (MB/s)  \(\beta_{w}\) (s/MB)  \(\beta_{r}\) (s/MB)
1  67.72  60.25  0.015  0.017
32  61.39  85.91  0.016  0.012
64  81.22  83.91  0.012  0.012
128  79.56  76.15  0.013  0.013
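The two β columns appear to be the reciprocals of the measured throughputs (s/MB = 1/(MB/s)); the short check below confirms this reading for every row of the table, up to rounding to three decimals. The reciprocal interpretation is our inference from the units, not a statement from the source.

```python
# Check (our reading of the table): each beta value in s/MB is the reciprocal of
# the measured throughput in MB/s, rounded to three decimals.
table1 = {  # HDFS size (GB): (write MB/s, read MB/s, beta_w, beta_r)
    1: (67.72, 60.25, 0.015, 0.017),
    32: (61.39, 85.91, 0.016, 0.012),
    64: (81.22, 83.91, 0.012, 0.012),
    128: (79.56, 76.15, 0.013, 0.013),
}

for size, (write, read, beta_w, beta_r) in table1.items():
    assert round(1 / write, 3) == beta_w, size
    assert round(1 / read, 3) == beta_r, size
print("beta = 1 / throughput holds for every row")
```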
Table 2 Number of bytes read (R) and written (W) at each step; superscripts m and r denote the map and reduce phases, subscripts the step
Step  Cholesky (bytes)  Indirect TSQR (bytes)  Direct TSQR (bytes)  Householder QR (bytes)  MRAM (bytes)
\(R_{1}^{m}\)  \(8mn + Km\)  \(8mn + Km\)  \(8mn + Km\)  \(8mn + Km\)  \(Kmn + (m - 1)(n - 1)\)
\(W_{1}^{m}\)  \(8m_{1}n^{2} + 8m_{1}n\)  \(8m_{1}n^{2} + 8m_{1}n\)  \(8mn + 8m_{1}n^{2} + Km + 64m_{1}\)  \(8mn + Km\)  \(Kmn + 8mn\)
\(R_{1}^{r}\)  \(8m_{1}n^{2} + 8m_{1}n\)  \(8m_{1}n^{2} + 8m_{1}n\)  0  0  \(kmn + 8mn\)
\(W_{1}^{r}\)  \(8n^{2} + 8n\)  \(8r_{1}n^{2} + 8r_{1}n\)  0  0  \(8k + 8k\)
\(R_{2}^{m}\)  \(8n^{2} + 8n\)  \(8r_{1}n^{2} + 8r_{1}n\)  \(8m_{1}n^{2} + Km_{1}\)  \(8mn + Km\)  –
\(W_{2}^{m}\)  \(8n^{2} + 8n\)  \(8r_{1}n^{2} + 8r_{1}n\)  \(8m_{1}n^{2} + Km_{1}\)  \(16m_{1}\)  –
\(R_{2}^{r}\)  \(8n^{2} + 8n\)  \(8r_{1}n^{2} + 8r_{1}n\)  \(8m_{1}n^{2} + Km_{1}\)  0  –
\(W_{2}^{r}\)  \(8n^{2} + 8n\)  \(8n^{2} + 8n\)  \(8m_{1}n^{2} + 32m_{1} + 8n^{2} + 8n\)  0  –
\(R_{3}^{m}\)  \(8mn + Km + m_{3}(8n^{2} + 8n)\)  \(8mn + Km + m_{3}(8n^{2} + 8n)\)  \(8mn + Km + m_{3}(8m_{1}n^{2} + 64m_{1})\)  –  –
\(W_{3}^{m}\)  \(8mn + Km\)  \(8mn + Km\)  \(8mn + Km\)  –  –
\(R_{3}^{r}\)  0  0  0  –  –
\(W_{3}^{r}\)  0  0  0  –  –
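How byte counts such as those in this table translate into times can be sketched with a simple I/O-only lower bound: each map or reduce phase is assumed to take at least (bytes read × \(\beta_{r}\) + bytes written × \(\beta_{w}\)) divided by the number of tasks running in parallel, and the phases of all steps are summed. This is a simplified model we assume for illustration; it is not claimed to be the exact model of Khan et al. [52] or to reproduce the lower bounds reported below, since task counts and cluster parameters are not given here. The task counts, the 1 GB job and MB = 2^20 bytes are hypothetical choices.

```python
# Sketch of an I/O-only lower bound on job time, assuming each phase takes at
# least (bytes read * beta_r + bytes written * beta_w) / (number of parallel tasks).
# The formula, task counts, and MB = 2**20 bytes are assumptions for illustration.
MB = 2 ** 20


def phase_time(read_bytes, write_bytes, beta_r, beta_w, tasks):
    """Lower bound (s) for one map or reduce phase spread over `tasks` tasks."""
    return (read_bytes * beta_r + write_bytes * beta_w) / MB / tasks


def lower_bound(steps, beta_r, beta_w, map_tasks, reduce_tasks):
    """steps: list of (R_m, W_m, R_r, W_r) byte counts, one tuple per MapReduce step."""
    total = 0.0
    for r_m, w_m, r_r, w_r in steps:
        total += phase_time(r_m, w_m, beta_r, beta_w, map_tasks)
        total += phase_time(r_r, w_r, beta_r, beta_w, reduce_tasks)
    return total


# Hypothetical one-step job: read and write 1 GB in the map phase only, with the
# 128 GB-row betas (0.013 s/MB) and 40 concurrent map and reduce tasks.
gb = 2 ** 30
t = lower_bound([(gb, gb, 0, 0)], beta_r=0.013, beta_w=0.013,
                map_tasks=40, reduce_tasks=40)
print(round(t, 3))
```

Under this model, a method's bound grows with the bytes it moves in each step and shrinks with the parallelism available, which is why steps with large \(8mn\)-type terms dominate.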
Table 3 The computed lower bounds \(T_{lb}\) (s)
HDFS size (GB)  Cholesky (s)  Indirect TSQR (s)  Direct TSQR (s)  Householder QR (s)  MRAM (s)
32  802  802  1,232  8,224  625
64  536  536  618  10,055  467
128  366  366  10,475  30,994  450
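Tables 2 and 3 show that the proposed solution is competitive with existing methods in terms of the number of bytes read and written and of computational time: MRAM has the lowest lower bound at 32 GB and 64 GB, while Cholesky and indirect TSQR are faster at 128 GB.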
Conclusion
In this paper, we carried out a comparative study between our proposal and the parallel methods that solve the least squares estimation problem. The results support the use of the proposed method, as they confirm its efficiency and speed. Moreover, we presented a detailed description of the parallel MapReduce-based Adjoint method. The application of the method to predicting Alzheimer’s disease risk confirms its robustness.
Declarations
Authors’ contributions
The authors propose a solution that receives a dataset of patients with a variable number of attributes and then builds a statistical model to identify potential Alzheimer’s disease patients. To this end, the authors parallelize the Adjoint method via MapReduce. All authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Li L, Ge RL, Zhou SM, Valerdi R. Guest editorial integrated healthcare information systems. IEEE Trans Inf Technol Biomed. 2012;16(4):515–7. https://doi.org/10.1109/TITB.2012.2198317.
 Kumar A, Hancke GP. A Zigbee-based animal health monitoring system. IEEE Sens J. 2015;15(1):610–7. https://doi.org/10.1109/JSEN.2014.2349073.
 Luke DA, Stamatakis KA. Systems science methods in public health: dynamics, networks, and agents. Annu Rev Public Health. 2012;33(1):357–76. https://doi.org/10.1146/annurev-publhealth-031210-101222.
 Ferreira LK, Busatto GF. Neuroimaging in Alzheimer’s disease: current role in clinical practice and potential future applications. Clinics. 2011;66(Suppl 1):19–24. https://doi.org/10.1590/S1807-59322011001300003.
 Soucy JP, Bartha R, Bocti C, Borrie M, Burhan AM, Laforce R, Rosa-Neto P. Clinical applications of neuroimaging in patients with Alzheimer’s disease: a review from the Fourth Canadian consensus conference on the diagnosis and treatment of Dementia 2012. Alzheimer’s Res Ther. 2013;5(1):S3. https://doi.org/10.1186/alzrt199.
 Thambisetty M, Lovestone S. Blood-based biomarkers of Alzheimer’s disease: challenging but feasible. Biomarkers Med. 2010;4(1):65–79.
 Lutz MW, Sundseth SS, Burns DK, Saunders AM, Hayden KM, Burke JR, Roses AD. A genetics-based biomarker risk algorithm for predicting risk of Alzheimer’s disease. Alzheimer’s Dementia Transl Res Clin Intervent. 2016;2(1):30–44. https://doi.org/10.1016/j.trci.2015.12.002.
 Liu-Ambrose T, Eng JJ, Boyd LA, Jacova C, Davis JC, Bryan S, Hsiung GYR. Promotion of the mind through exercise (PROMoTE): a proof-of-concept randomized controlled trial of aerobic exercise training in older adults with vascular cognitive impairment. BMC Neurol. 2010;10(1):14. https://doi.org/10.1186/1471-2377-10-14.
 Scarmeas N, Luchsinger JA, Schupf N, et al. Physical activity, diet, and risk of Alzheimer disease. JAMA. 2009;302(6):627–37. https://doi.org/10.1001/jama.2009.1144.
 Nemati Karimooy H, Hosseini M, Nemati M, Esmaily HO. Lifelong physical activity affects mini mental state exam scores in individuals over 55 years of age. J Bodyw Mov Ther. 2012;16(2):230–5. https://doi.org/10.1016/j.jbmt.2011.08.003.
 Winchester J, Dick MB, Gillen D, Reed B, Miller B, Tinklenberg J, Cotman CW. Walking stabilizes cognitive functioning in Alzheimer’s disease (AD) across 1 year. Arch Gerontol Geriatr. 2013;56(1):96–103. https://doi.org/10.1016/j.archger.2012.06.016.
 Bu XL, Yao XQ, Jiao SS, Zeng F, Liu YH, Xiang Y, Wang YJ. A study on the association between infectious burden and Alzheimer’s disease. Eur J Neurol. 2015;22(12):1519–25. https://doi.org/10.1111/ene.12477.
 Maheshwari P, Eslick GD. Bacterial infection and Alzheimer’s disease: a meta-analysis. J Alzheimer’s Dis. 2015;43(3):957–66. https://doi.org/10.3233/JAD-140621.
 MacDonald AB. Plaques of Alzheimer’s disease originate from cysts of Borrelia burgdorferi, the Lyme disease spirochete. Med Hypotheses. 2006;67(3):592–600. https://doi.org/10.1016/j.mehy.2006.02.035.
 Zettam M, Laassiri J, Enneya N. A software solution for preventing Alzheimer’s disease based on MapReduce framework. In: 2017 IEEE international conference on information reuse and integration (IRI). San Diego, CA; 2017. p. 192–7. https://doi.org/10.1109/iri.2017.77.
 Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2009. https://doi.org/10.1007/978-0-387-84858-7_1.
 Michie D, Spiegelhalter DJ, Taylor CC, Campbell J, editors. Machine learning, neural and statistical classification. Upper Saddle River: Ellis Horwood; 1994.
 Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996;49(11):1225–31. https://doi.org/10.1016/S0895-4356(96)00002-9.
 Hosmer DW, Lemeshow S. Applied logistic regression. 2nd ed. Hoboken: John Wiley & Sons Inc.; 2005. https://doi.org/10.1002/0471722146.fmatter.
 Rencher AC, Christensen WF. Methods of multivariate analysis. 3rd ed. Hoboken: Wiley; 2012.
 Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. New York: Springer; 2008.
 Lecleire S, Di Fiore F, Antonietti M, Ben Soussan E, Hellot MF, Grigioni S, Ducrotté P. Undernutrition is predictive of early mortality after palliative self-expanding metal stent insertion in patients with inoperable or recurrent esophageal cancer. Gastrointest Endosc. 2006;64(4):479–84. https://doi.org/10.1016/j.gie.2006.03.930.
 Janssen-Heijnen MLG, Houterman S, Lemmens V, Brenner H, Steyerberg EW, Coebergh JWW. Prognosis for long-term survivors of cancer. Ann Oncol. 2007;18(8):1408–13. https://doi.org/10.1093/annonc/mdm127.
 Chatap NJ, Shrivastava AK. A survey on various classification techniques for medical image data. Int J Comput Appl. 2014;97(15):1–5.
 Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. New ed. Boca Raton: Taylor & Francis Ltd.; 1984.
 Quinlan JR. Comparing connectionist and symbolic learning methods. In: Hanson SJ, Rivest RL, Drastal GA, editors. Proceedings of a workshop on computational learning theory and natural learning systems: constraints and prospects, vol. 1. Cambridge: MIT Press; 1994. p. 445–56.
 Kass GV. An exploratory technique for investigating large quantities of categorical data. J Roy Stat Soc Ser C (Appl Stat). 1980;29(2):119–27.
 Lim TS, Loh WY, Shih YS. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn. 2000;40(3):203–28. https://doi.org/10.1023/A:1007608224229.
 Klecka WR. Discriminant analysis. 1st ed. Beverly Hills: SAGE Publications Inc.; 1980.
 Hinton GE. How neural networks learn from experience. Sci Am. 1992;267(3):144–51.
 Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–6. https://doi.org/10.1038/323533a0.
 Szolovits P, Patil RS, Schwartz WB. Artificial intelligence in medical diagnosis. Ann Intern Med. 1988;108(1):80–7. https://doi.org/10.7326/0003-4819-108-1-80.
 Spelt L, Andersson B, Nilsson J, Andersson R. Prognostic models for outcome following liver resection for colorectal cancer metastases: a systematic review. Eur J Surg Oncol. 2012;38(1):16–24. https://doi.org/10.1016/j.ejso.2011.10.013.
 Mortazavi D, Kouzani AZ, Soltanian-Zadeh H. Segmentation of multiple sclerosis lesions in MR images: a review. Neuroradiology. 2012;54(4):299–320. https://doi.org/10.1007/s00234-011-0886-7.
 Ahmed FE. Artificial neural networks for diagnosis and survival prediction in colon cancer. Mol Cancer. 2005;4(1):29. https://doi.org/10.1186/1476-4598-4-29.
 Bartosch-Härlid A, Andersson B, Aho U, Nilsson J, Andersson R. Artificial neural networks in pancreatic disease. Br J Surg. 2008;95(7):817–26. https://doi.org/10.1002/bjs.6239.
 Siristatidis CS, Chrelias C, Pouliakis A, Katsimanis E, Kassanos D. Artificial neural networks in gynaecological diseases: current and potential future applications. Med Sci Monit Int Med J Exp Clin Res. 2010;16(10):RA231–6.
 Shankaracharya DO, Samanta S, Vidyarthi AS. Computational intelligence in early diabetes diagnosis: a review. Rev Diabet Stud RDS. 2010;7(4):252–62. https://doi.org/10.1900/RDS.2010.7.252.
 Amato F, López A, Peña-Méndez EM, Vaňhara P, Hampl A, Havel J. Artificial neural networks in medical diagnosis. J Appl Biomed. 2013;11(2):47–58. https://doi.org/10.2478/v10136-012-0031-x.
 Cesa-Bianchi N, Lugosi G. Prediction, learning, and games. Cambridge: Cambridge University Press; 2006.
 Chen Y, Crespi N, Ortiz AM, Shu L. Reality mining: a prediction algorithm for disease dynamics based on mobile big data. Inf Sci. 2017;379:82–93. https://doi.org/10.1016/j.ins.2016.07.075.
 Anderson DR, Sweeney DJ, Williams TA, Camm JD, Cochran JJ. Statistiques pour l’économie et la gestion, 5e édition. De Boeck Université; 2015.
 Tribout B. Statistiques pour économistes et gestionnaires. London: Pearson Education; 2008.
 Tresch MC, Cheung VCK, d’Avella A. Matrix factorization algorithms for the identification of muscle synergies: evaluation on simulated and experimental data sets. J Neurophysiol. 2006;95(4):2199–212. https://doi.org/10.1152/jn.00222.2005.
 Giglio L, Kendall JD, Justice CO. Evaluation of global fire detection algorithms using simulated AVHRR infrared data. Int J Remote Sens. 1999;20(10):1947–85. https://doi.org/10.1080/014311699212290.
 Murray RE, Ryan PB, Reisinger SJ. Design and validation of a data simulation model for longitudinal healthcare data. AMIA Ann Symp Proc. 2011;2011:1176–85.
 Hurvich CM, Tsai CL. Regression and time series model selection in small samples. Biometrika. 1989;76(2):297–307. https://doi.org/10.2307/2336663.
 White T. Hadoop: the definitive guide. Farnham: O’Reilly Media Inc; 2009.
 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–13. https://doi.org/10.1145/1327452.1327492.
 Benson AR, Gleich DF, Demmel J. Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. In: 2013 IEEE international conference on big data; 2013. p. 264–72. https://doi.org/10.1109/BigData.2013.6691583.
 Lu P, Pei S, Tolliver D. Regression model evaluation for highway bridge component deterioration using national bridge inventory data. J Transp Res Forum. 2016;55(1):5–16.
 Khan M, Jin Y, Li M, Xiang Y, Jiang C. Hadoop performance modeling for job estimation and resource provisioning. IEEE Trans Parallel Distrib Syst. 2016;27(2):441–54. https://doi.org/10.1109/TPDS.2015.2405552.