The best statistical model to estimate predictors of under-five mortality in Ethiopia
Journal of Big Data volume 7, Article number: 63 (2020)
Abstract
The under-five mortality rate is one of the most important indicators of the socioeconomic wellbeing and public health conditions of a country. Under-five mortality in Ethiopia has declined, but the rate is still higher than the sustainable development goal target of 25 deaths per 1000 live births. This study aimed to identify the best statistical model to estimate predictors of under-five mortality in Ethiopia. Data from the 2016 Ethiopian Demographic and Health Survey were used for the analysis. A total of 14,370 women were included. Six count models (Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial) were considered to identify risk factors associated with under-five deaths in Ethiopia. The mean number of under-five deaths was 0.9 and its variance was 1.697. The hurdle negative binomial model had the smallest AIC, deviance, and BIC, suggesting the best goodness of fit. Besides, the predicted values and probabilities for many counts in the hurdle negative binomial model fitted the observed counts best. The results of the hurdle negative binomial model showed that region, mother’s age, educational level of the father, educational level of the mother, father’s occupation, family size, age of mother at first birth, vaccination of the child, contraceptive use, birth order, preceding birth interval, twin children, place of delivery, and antenatal visits predict under-five deaths in Ethiopia. The rate of under-five death remains high. Concerned governmental organizations should work to reduce under-five mortality by encouraging child vaccination and antenatal care visits. Attention should also be given to multiple births and to the spacing between births. The hurdle negative binomial model provided a better fit for the data. It is argued that the hurdle negative binomial model should be fitted to count data with excess zeros of unknown sources, such as the number of under-five deaths.
Introduction
Child mortality is defined as the probability that a child born alive in a specific year dies before reaching the age of five. Child mortality has received international attention through the Sustainable Development Goals (SDGs) [1]. Every year, millions of under-five children die [2]. In 2016, for instance, 5.6 million children died, and globally about 15,000 children still die every single day. The burden of under-five deaths remains unevenly distributed and is particularly high in South Asia and sub-Saharan Africa: about 80 percent of under-five deaths occur in these two regions. Six countries, including Ethiopia, account for half of global under-five deaths [1, 3, 4].
In Ethiopia, the under-five mortality rate is 67 per 1000 live births, with large disparities across regions. One in every 15 children dies before reaching their fifth birthday. Every year, more than 257,000 children under the age of five die, of whom 120,000 die in the neonatal period. If present trends continue, more than 3,084,000 children will die by 2030 [5, 6].
To identify the risk factors of under-five mortality, various studies have been conducted in different countries [7,8,9,10,11,12]. Many were small-scale surveys covering a specific set of variables. These studies investigated the risk factors of under-five mortality through binary logistic regression and survival analysis [13,14,15]. However, binary logistic regression undercounts the total number of deaths, since multiple deaths per mother are collapsed into a single unit to meet the requirements of the binary model, and it therefore provides insufficient information for studying the pattern of multiple child deaths. In this study, a count regression model is the preferred model of analysis. The response variable is the number of under-five deaths (a count), and the main aim is to see how this count changes as the explanatory variables change. Classical Poisson regression is the best-known method for modeling count data. Yet its underlying assumption of equidispersion (i.e., an equal mean and variance) limits its use in many real-world applications with over- or underdispersed data (i.e., the variance is larger or smaller than the mean). Excess variation may result in incorrect inference about parameter estimates, standard errors, tests, and confidence intervals. Overdispersion arises for various reasons, one of which is excessive zero counts or censoring. Overdispersed count data are common in many areas, which in turn has led to the development of statistical methodology for modeling overdispersed data [16]. The negative binomial distribution resembles the Poisson distribution, except that it has a longer, fatter tail to the extent that the variance exceeds the mean. Depending on the degree of overdispersion, the negative binomial model can capture more zeros than the Poisson model [17]. Nevertheless, the model may still be insufficient in empirical applications with many zero values in the data.
Zero-inflated models provide a way of modeling the excessive proportion of zero values while allowing for overdispersion. When the number of zeros is large, they provide a better fit than the Poisson or negative binomial model [18]. However, these models are not suitable for underdispersion. A flexible alternative that captures both over- and underdispersion is the hurdle model [19]. On the other hand, most studies of child mortality considered prevalence alone, chi-square tests of association, or logistic and survival models to analyze such outcomes [11, 12, 20,21,22]. In logistic regression, the dependent variable is dichotomized as either dead or alive, which undercounts the total number of under-five deaths, because multiple child deaths per mother are collapsed into a single unit. Therefore, this study aimed to identify the best statistical model to estimate predictors of under-five mortality in Ethiopia. A main strength of the modeling approach used here is that it accounts for survey design features such as weighting, clustering, and stratification; failing to account for them leads to invalid statistical inference, such as incorrect standard errors and over- or underestimation.
Contribution
To our knowledge, this is the first paper assessing the effects of different covariates on the number of under-five deaths using count regression models such as the hurdle negative binomial at a multiregional level with a large dataset. Besides, identifying the specific predictors associated with under-five deaths is important for prioritizing interventions and revealing patterns for better results. Further, the paper offers guidance to researchers on how to use regression models for overdispersed, underdispersed, and zero-inflated count data.
Methods
Data source
The data used for this study were taken from the 2016 Ethiopian Demographic and Health Survey (EDHS), a nationally representative survey of women aged 15–49 years from the Central Statistical Agency (CSA), Ethiopia. The survey is the fourth comprehensive survey designed to provide estimates of the targeted health and demographic variables for all urban and rural areas of Ethiopia.
Dependent variable
The response variable of this study is the number of under-five deaths per mother.
Independent variables
Possible predictors of under-five death, namely region, mother’s age, educational level of the father, educational level of the mother, father’s occupation, mother’s occupation, family size, age of mother at first birth, religion, vaccination of the child, contraceptive use, birth order, preceding birth interval, twin birth, place of delivery, antenatal visits, breastfeeding, and residence, were included in the analysis.
Statistical software
The secondary data were recorded in SPSS version 21, then exported to R version 3.5.3, in which the analysis was carried out.
Statistical models
In some epidemiological studies, the response of interest is a count, such as the number of under-five deaths. In this case, a count regression model is better suited to analyzing the response. Among count regression models, this study applied the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial models.
Poisson regression model
The most popular model for count data is the Poisson model, which assumes that the mean and variance of the dependent variable are equal [23]. The probability function for \({\text{Y}}\) is given by
\[P(Y_{i} = y_{i} ) = \frac{{e^{{ - \mu_{i} }} \mu_{i}^{{y_{i} }} }}{{y_{i} !}},\quad y_{i} = 0,1,2, \ldots\]
where \(y_{i}\) is the number of under-five deaths for the \(i{\text{th}}\) mother in a given time, with mean parameter \(\mu_{i}\). The mean and variance of the Poisson distribution are given as
\[E(Y_{i} ) = Var(Y_{i} ) = \mu_{i}\]
Negative binomial regression model (NB)
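Sometimes the variance exceeds the mean, which is referred to as overdispersion [24]. Various models exist that account for overdispersion; in this study it is modelled using a negative binomial (NB) regression model. The probability function is given by
\[P(Y_{i} = y_{i} ) = \frac{{\varGamma (y_{i} + 1/\alpha )}}{{y_{i} !\,\varGamma (1/\alpha )}}\left( {\frac{1}{{1 + \alpha \mu_{i} }}} \right)^{1/\alpha } \left( {\frac{{\alpha \mu_{i} }}{{1 + \alpha \mu_{i} }}} \right)^{{y_{i} }} ,\quad y_{i} = 0,1,2, \ldots\]
where \(\alpha\) is the overdispersion parameter and \(\varGamma (.)\) is the gamma function. As \(\alpha \to 0\), the negative binomial distribution reduces to the Poisson distribution. The mean and variance are expressed as:
\[E(Y_{i} ) = \mu_{i} ,\quad Var(Y_{i} ) = \mu_{i} (1 + \alpha \mu_{i} )\]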
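The relationship between the two distributions can be checked numerically. The following is a minimal sketch, not the authors' R code: it assumes the usual mean-dispersion (NB2) parameterization in which the variance is \(\mu (1 + \alpha \mu )\), and the function names and parameter values are illustrative only.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # shape parameter 1/alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             - r * math.log1p(alpha * mu)
             + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))
    return math.exp(log_p)

def nb_moments(mu, alpha, upper=500):
    """Mean and variance of the NB pmf by direct summation (truncated at `upper`)."""
    probs = [nb_pmf(y, mu, alpha) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, alpha = 0.9, 1.0  # illustrative values only
mean, var = nb_moments(mu, alpha)
print(mean, var, mu * (1 + alpha * mu))          # variance matches mu * (1 + alpha * mu)
print(nb_pmf(2, mu, 1e-6), poisson_pmf(2, mu))   # nearly equal when alpha is tiny
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma function for large counts or very small \(\alpha\).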
Zero-inflated Poisson (ZIP) regression model
The zero-inflated Poisson (ZIP) regression model is a modification of the familiar Poisson regression model that allows for an overabundance of zero counts in the data [18]. Specifically, if the numbers of under-five deaths per mother, \(Y_{i}\), are independent random variables having a zero-inflated Poisson distribution, the zeros are assumed to arise in two ways corresponding to distinct underlying states. The first state occurs with probability \(\pi_{i}\) and produces only zeros (mothers who have never given birth), while the other state occurs with probability \(1 - \pi_{i}\) and leads to a standard Poisson count with mean \(\mu_{i}\) and hence a chance of further zeros (mothers who have given birth but have not lost a child). In general, the zeros from the first state are called structural zeros and those from the Poisson distribution are called sampling zeros [25]. This two-state process gives a simple two-component mixture distribution with probability mass function
\[P(Y_{i} = y_{i} ) = \begin{cases} \pi_{i} + (1 - \pi_{i} )e^{{ - \mu_{i} }} , & y_{i} = 0 \\ (1 - \pi_{i} )\dfrac{{e^{{ - \mu_{i} }} \mu_{i}^{{y_{i} }} }}{{y_{i} !}}, & y_{i} = 1,2, \ldots \end{cases}\]
The parameters \(\mu_{i}\) and \(\pi_{i}\) depend on the covariates \(x_{i}\) and \(z_{i}\), respectively. The mean and the variance of the ZIP regression model, respectively, are:
\[E(Y_{i} ) = (1 - \pi_{i} )\mu_{i} ,\quad Var(Y_{i} ) = (1 - \pi_{i} )\mu_{i} (1 + \pi_{i} \mu_{i} )\]
To apply the zero-inflated Poisson model in practical modeling situations, [18, 26, 27] suggested the following joint models for \(\mu\) and \(\pi\)
\[\ln (\mu ) = X\beta ,\quad {\text{logit}}(\pi ) = \ln \left( {\frac{\pi }{1 - \pi }} \right) = Z\gamma\]
where \(X\) and \(Z\) are covariate matrices and \(\beta ,\gamma\) are \((p + 1) \times 1\) and \((q + 1) \times 1\) vectors of unknown parameters, respectively. The two sets of covariates may or may not coincide. For a random sample of observations \(y_{1} ,y_{2} , \ldots ,y_{n}\), the log-likelihood function is given by
\[\ell (\beta ,\gamma ) = \sum\limits_{i = 1}^{n} {\left\{ {{\rm I}(y_{i} = 0)\ln \left[ {\pi_{i} + (1 - \pi_{i} )e^{{ - \mu_{i} }} } \right] + {\rm I}(y_{i} > 0)\left[ {\ln (1 - \pi_{i} ) - \mu_{i} + y_{i} \ln \mu_{i} - \ln (y_{i} !)} \right]} \right\}}\]
where \({\rm I}(.)\) is the indicator function for the specified event, i.e. equal to 1 if the event is true and 0 otherwise.
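The two-state mixture can be sketched in a few lines of code. This is an illustrative implementation under the standard ZIP definition (probability \(\pi\) of the always-zero state, Poisson mean \(\mu\) otherwise); the function names and the parameter values are hypothetical and are not estimates from this study.

```python
import math

def zip_pmf(y, mu, pi):
    """Zero-inflated Poisson probability P(Y = y).

    pi is the probability of the always-zero state; mu is the Poisson mean
    in the other state.
    """
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return pi + (1.0 - pi) * poisson  # structural zeros + sampling zeros
    return (1.0 - pi) * poisson

def zip_loglik(ys, mus, pis):
    """Log-likelihood of observed counts, one (mu, pi) pair per observation."""
    return sum(math.log(zip_pmf(y, m, p)) for y, m, p in zip(ys, mus, pis))

mu, pi = 1.5, 0.3  # hypothetical parameter values
probs = [zip_pmf(y, mu, pi) for y in range(100)]
zip_mean = sum(y * p for y, p in enumerate(probs))
zip_var = sum((y - zip_mean) ** 2 * p for y, p in enumerate(probs))
print(sum(probs))                                # probabilities sum to 1
print(zip_mean, (1 - pi) * mu)                   # mean equals (1 - pi) * mu
print(zip_var, (1 - pi) * mu * (1 + pi * mu))    # variance equals (1 - pi) * mu * (1 + pi * mu)
```

In a regression setting, `mus` and `pis` would come from the log and logit links applied to each mother's covariates; here they are passed in directly to keep the sketch short.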
Zero-inflated negative binomial (ZINB) regression model
The ZIP model may sometimes fail to fit such data because of overdispersion relative to the Poisson distribution. We therefore extend the ZIP mixture regression model to the ZINB mixture regression model. ZINB regression is used for count data that exhibit both overdispersion and excess zeros [28, 29].
Suppose \(Y_{i}\) is the number of under-five deaths per mother; then the probability mass function of the ZINB is given by:
\[P(Y_{i} = y_{i} ) = \begin{cases} \pi_{i} + (1 - \pi_{i} )(1 + \alpha \mu_{i} )^{ - 1/\alpha } , & y_{i} = 0 \\ (1 - \pi_{i} )\dfrac{{\varGamma (y_{i} + 1/\alpha )}}{{y_{i} !\,\varGamma (1/\alpha )}}(1 + \alpha \mu_{i} )^{ - 1/\alpha } \left( {\dfrac{{\alpha \mu_{i} }}{{1 + \alpha \mu_{i} }}} \right)^{{y_{i} }} , & y_{i} = 1,2, \ldots \end{cases}\]
where \(\mu_{i}\) is the mean of the underlying negative binomial distribution and \(\alpha\) is the overdispersion parameter. The ZINB distribution reduces to the ZIP distribution as \(\alpha \to 0\). The mean and variance are \(E(Y_{i} ) = (1 - \pi_{i} )\mu_{i}\) and \(Var(Y_{i} ) = (1 - \pi_{i} )\mu_{i} (1 + \pi_{i} \mu_{i} + \alpha \mu_{i} )\), respectively [29].
In the terminology of generalized linear models (GLMs), \(\ln (\mu_{i} )\) and \({\text{logit}}(\pi_{i} )\) are the natural links for the negative binomial mean and the Bernoulli probability of success [18, 30]:
\[\ln (\mu_{i} ) = x_{i}^{T} \beta ,\quad {\text{logit}}(\pi_{i} ) = \ln \left( {\frac{{\pi_{i} }}{{1 - \pi_{i} }}} \right) = z_{i}^{T} \gamma\]
where \(x_{i}\) and \(z_{i}\) are, respectively, the vectors of covariates for the negative binomial and the logistic components, and \(\beta\) and \(\gamma\) are the corresponding vectors of regression coefficients.
The hurdle Poisson model
The hurdle regression handles the excess zeros by relaxing the assumption that zeros and positives come from a single data-generating process [18]. The hurdle model was introduced by [31] for the analysis of overdispersed or underdispersed count data. The hurdle Poisson (HP) model is a two-component model: a hurdle component models zero versus non-zero counts, and a truncated Poisson count component is employed for the non-zero counts [19, 31]. Its probability mass function is given as:
\[P(Y_{i} = y_{i} ) = \begin{cases} \pi_{0} , & y_{i} = 0 \\ (1 - \pi_{0} )\dfrac{{e^{{ - \mu_{i} }} \mu_{i}^{{y_{i} }} }}{{(1 - e^{{ - \mu_{i} }} )\,y_{i} !}}, & y_{i} = 1,2, \ldots \end{cases}\]
where \(\pi_{0} = P(y_{i} = 0)\) and \(\mu_{i} = \exp (x_{i} \beta )\).
In the Poisson logit hurdle (PLH) model, the most natural choice to model the probability of zeros is a logistic regression model:
\[{\text{logit}}(\pi_{0} ) = \ln \left( {\frac{{\pi_{0} }}{{1 - \pi_{0} }}} \right) = z_{i} \gamma\]
where \(z_{i} = (1,z_{i1} ,z_{i2} , \ldots ,z_{iq} )\) is the \(i{\text{th}}\) row of covariate matrix \(Z\) and \(\gamma = (\gamma_{0} ,\gamma_{1} , \ldots ,\gamma_{q} )^{T}\) is an unknown \((q + 1)\)-dimensional column vector of parameters. The effect of the covariates \(x_{i}\) on the strictly positive (that is, zero-truncated) count data is modeled through Poisson regression:
\[\ln (\mu_{i} ) = x_{i} \beta\]
where \(x_{i} = (1,x_{i1} ,x_{i2} , \ldots ,x_{ip} )\) is the \(i{\text{th}}\) row of covariate matrix \(X\) and \(\beta = (\beta_{0} ,\beta_{1} , \ldots ,\beta_{p} )^{T}\) is an unknown \((p + 1)\)-dimensional column vector of parameters. This model was originally proposed by [31].
The log-likelihood function of a logit-Poisson hurdle regression can, therefore, be expressed as the sum of the log-likelihood functions of the two components:
\[\ell (\gamma ,\beta ) = \sum\limits_{i = 1}^{n} {\left[ {{\rm I}(y_{i} = 0)\ln \pi_{0} + {\rm I}(y_{i} > 0)\ln (1 - \pi_{0} )} \right]} + \sum\limits_{i = 1}^{n} {{\rm I}(y_{i} > 0)\left[ {y_{i} \ln \mu_{i} - \mu_{i} - \ln (1 - e^{{ - \mu_{i} }} ) - \ln (y_{i} !)} \right]}\]
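The hurdle negative binomial model
Similarly, in a hurdle model the negative binomial distribution can be used instead of the Poisson distribution to handle overdispersion [19]. We consider a hurdle negative binomial (HNB) regression model in which the response variable \(Y_{i} \;(i = 1,2,3, \ldots ,n)\) has the distribution
\[P(Y_{i} = y_{i} ) = \begin{cases} \pi_{0} , & y_{i} = 0 \\ (1 - \pi_{0} )\dfrac{{\varGamma (y_{i} + 1/\alpha )}}{{y_{i} !\,\varGamma (1/\alpha )}}\dfrac{{(1 + \alpha \mu_{i} )^{ - 1/\alpha } }}{{1 - (1 + \alpha \mu_{i} )^{ - 1/\alpha } }}\left( {\dfrac{{\alpha \mu_{i} }}{{1 + \alpha \mu_{i} }}} \right)^{{y_{i} }} , & y_{i} = 1,2, \ldots \end{cases}\]
where \(\alpha > 0\) is a dispersion parameter that is assumed not to depend on covariates. In addition, we suppose \(0 < \pi_{0} < 1\;{\text{and}}\;\pi_{0} = \pi_{0} (z_{i} )\).
The most natural choice to model the probability of zeros is a logistic regression model:
\[{\text{logit}}(\pi_{0} ) = \ln \left( {\frac{{\pi_{0} }}{{1 - \pi_{0} }}} \right) = z_{i} \gamma\]
where \(z_{i} = (1,z_{i1} ,z_{i2} , \ldots ,z_{iq} )\) is the \(i{\text{th}}\) row of covariate matrix \(Z\) and \(\gamma = (\gamma_{0} ,\gamma_{1} , \ldots ,\gamma_{q} )^{T}\) is an unknown \((q + 1)\)-dimensional column vector of parameters. The impact of covariates on the positive counts is modeled through NB regression:
\[\ln (\mu_{i} ) = \beta_{0} + \sum\limits_{j = 1}^{p} {x_{ij} \beta_{j} }\]
where \(x_{ij}\) are the covariates, \(\beta\) is the vector of coefficients of the independent variables in the regression model, and \(p\) is the number of these independent variables.
We can obtain the log-likelihood function for the hurdle negative binomial regression model as
\[\ell (\gamma ,\beta ,\alpha ) = \sum\limits_{i = 1}^{n} {\left[ {{\rm I}(y_{i} = 0)\ln \pi_{0} + {\rm I}(y_{i} > 0)\ln (1 - \pi_{0} )} \right]} + \sum\limits_{i = 1}^{n} {{\rm I}(y_{i} > 0)\left[ {\ln \frac{{\varGamma (y_{i} + 1/\alpha )}}{{y_{i} !\,\varGamma (1/\alpha )}} - \frac{1}{\alpha }\ln (1 + \alpha \mu_{i} ) + y_{i} \ln \frac{{\alpha \mu_{i} }}{{1 + \alpha \mu_{i} }} - \ln \left( {1 - (1 + \alpha \mu_{i} )^{ - 1/\alpha } } \right)} \right]}\]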
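To make the structure of the hurdle negative binomial model concrete, the sketch below implements its probability mass function and an intercept-only log-likelihood. It assumes the standard form of the model (a point mass \(\pi_{0}\) at zero and a zero-truncated negative binomial for positive counts); the function names and all parameter values are hypothetical and are not estimates from this study.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             - r * math.log1p(alpha * mu)
             + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))
    return math.exp(log_p)

def hnb_pmf(y, mu, alpha, pi0):
    """Hurdle negative binomial probability P(Y = y)."""
    if y == 0:
        return pi0  # every zero comes from the hurdle (binary) part
    nb_zero = (1.0 + alpha * mu) ** (-1.0 / alpha)  # P(0) under the untruncated NB
    return (1.0 - pi0) * nb_pmf(y, mu, alpha) / (1.0 - nb_zero)

def hnb_loglik(ys, mu, alpha, pi0):
    """Log-likelihood for counts ys with common (intercept-only) parameters."""
    return sum(math.log(hnb_pmf(y, mu, alpha, pi0)) for y in ys)

mu, alpha, pi0 = 1.2, 0.8, 0.54  # hypothetical values
probs = [hnb_pmf(y, mu, alpha, pi0) for y in range(400)]
total = sum(probs)
hnb_mean = sum(y * p for y, p in enumerate(probs))
nb_zero = (1.0 + alpha * mu) ** (-1.0 / alpha)
print(total)                                        # probabilities sum to 1
print(probs[0])                                     # zero probability is pi0, whatever mu is
print(hnb_mean, (1 - pi0) * mu / (1 - nb_zero))     # mean of the hurdle model
```

Because the zero probability is modeled separately from the count part, the fitted proportion of zeros does not depend on \(\mu\) or \(\alpha\), which is why hurdle models can reproduce the observed number of zeros.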
Assessing model adequacy
We use the likelihood ratio test (LRT) to assess goodness of fit between two nested models. The LRT is valid only for comparing hierarchically nested models. Non-nested models were compared using the Akaike information criterion (AIC) [32].
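The comparison criteria can be computed directly from a fitted model's maximized log-likelihood. The sketch below uses the standard definitions AIC = 2k - 2 ln L, BIC = k ln(n) - 2 ln L, and LRT = 2(ln L_full - ln L_restricted); the log-likelihood values and parameter counts in `fits` are invented for illustration and are not the values reported in this study.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

def lrt_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2 * (loglik_full - loglik_restricted)

n = 14370  # number of women in the sample
# (log-likelihood, number of parameters): made-up numbers for illustration
fits = {"Poisson": (-17500.0, 20), "NB": (-16900.0, 21), "HNB": (-16400.0, 41)}

table = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in fits.items()}
best_by_aic = min(table, key=lambda name: table[name][0])
best_by_bic = min(table, key=lambda name: table[name][1])
lrt_nb_vs_poisson = lrt_statistic(fits["Poisson"][0], fits["NB"][0])
print(best_by_aic, best_by_bic, lrt_nb_vs_poisson)
```

Smaller AIC and BIC indicate a better trade-off between fit and complexity. The LRT applies here only to the Poisson versus NB pair, since the Poisson model is the limiting case of the NB model; the hurdle and zero-inflated models are not nested in each other, so they are ranked by the information criteria instead.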
Result
A total of 14,370 women were included, of whom 7720 (53.7%) had not experienced any under-five death, while only 78 (0.5%) had lost 7 children. Further screening of the number of under-five deaths showed that the variance (1.697) is greater than the mean (0.9), indicating overdispersion (Table 1).
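A quick back-of-the-envelope check shows why a plain Poisson model struggles with these data. The sketch below uses only the summary figures reported above (sample size, number of mothers with zero deaths, mean, and variance) and ignores covariates and survey weights, so it is a rough marginal illustration rather than a reproduction of the fitted models; the method-of-moments estimate of \(\alpha\) is an assumption made for this illustration.

```python
import math

n_women, n_zero = 14370, 7720   # reported sample size and mothers with no under-five death
mean, variance = 0.9, 1.697     # reported mean and variance of deaths per mother

dispersion_ratio = variance / mean                   # equals 1 under a Poisson model
alpha_mom = (variance - mean) / mean ** 2            # method-of-moments NB dispersion
p0_observed = n_zero / n_women                       # observed share of zeros
p0_poisson = math.exp(-mean)                         # zero probability implied by Poisson
p0_nb = (1 + alpha_mom * mean) ** (-1 / alpha_mom)   # zero probability implied by NB

print(round(dispersion_ratio, 3))
print(round(p0_observed, 3), round(p0_poisson, 3), round(p0_nb, 3))
```

With these inputs the Poisson distribution implies noticeably fewer zeros than observed, while the negative binomial comes closer but still falls short, which is the pattern that motivates the zero-inflated and hurdle models.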
Sociodemographic and economic characteristics of under-five mortality in Ethiopia
Some of the socioeconomic, demographic, health, and environmental factors related to child deaths per mother are summarized in Table 1. The majority (80.9%) of child deaths occurred among uneducated mothers, and 3.7% occurred among mothers with secondary education or above. About two-thirds (66.2%) of child deaths occurred among uneducated fathers, and 7.7% among fathers who attained secondary education or above. The highest percentage of child deaths (38.5%) was observed among children of birth order four and above, and the lowest (25.8%) among first-born children. It is also noted that 58.6% of child deaths were attributed to poor women. Besides, the lowest percentage of child deaths occurred among mothers who visited a health care institution at least 4 times during pregnancy (7.3%), while the highest percentage occurred among mothers who did not receive any antenatal check during pregnancy (85.0%). Working fathers had a lower percentage of child deaths (46.6%) than non-working fathers (53.4%) (Additional file 1: Table S1).
Model selection criteria
AIC, BIC, and deviance statistics were used to identify the most appropriate of the six count models considered. From the results in Table 2, we can observe that the hurdle negative binomial model has the smallest AIC, BIC, and deviance. This indicates that the hurdle negative binomial model fits the data better than the other count regression models (Table 2).
Table 3 shows the zero counts captured by the six count regression models. The Poisson and NB models underestimated the zero counts, the zero-inflated models overestimated them, and the hurdle models captured all zero values. The hurdle-based models therefore estimated the zero counts better than the other count models (Table 3).
Figure 1 and Table 4 show the predicted probability distributions of the six count regression models together with the observed proportions. The hurdle negative binomial model is a better choice than the other count models because its predicted probabilities are closest to the observed proportions. Therefore, it is possible to conclude that the HNB model is more appropriate than the other count models for fitting the number of under-five deaths per mother.
Factors associated with under-five mortality in Ethiopia
Table 5 presents a summary of the hurdle negative binomial model. The negative binomial (count) component shows the magnitude of under-five mortality among mothers who experienced at least one death. Compared with non-vaccinated children, the rate of non-zero under-five deaths for vaccinated children decreased by 28.5% (IRR = 0.715, 95% CI 0.630, 0.811). Compared to children born in the Tigray region, the rate of under-five mortality was 1.475 times as high (IRR = 1.475, 95% CI 1.286, 1.678) among children born to mothers from the Benishangul-Gumuz region. Relative to children whose mothers did not receive any antenatal follow-up during pregnancy, the rate of non-zero under-five deaths among children whose mothers made at least 4 ANC visits during pregnancy decreased by 22.3% (IRR = 0.777, 95% CI 0.671, 0.901). As birth order increases, under-five mortality also increases: the rate of non-zero under-five deaths for children of birth order four and above was 62.1% higher (IRR = 1.621, 95% CI 1.496, 1.756) than for first-born children. The results also indicated that the educational levels of the mother and the father were significant factors affecting the number of under-five deaths. Compared with non-educated mothers, among mothers with a primary level of education the rate of non-zero under-five deaths decreased by 23.4% (IRR = 0.766, 95% CI 0.700, 0.837). Similarly, compared to non-educated fathers, among fathers with secondary education or above the incidence rate of non-zero under-five deaths decreased by 32% (IRR = 0.680, 95% CI 0.586, 0.789). The results also showed that the incidence rate of non-zero under-five deaths in multiple births was 1.471 times (IRR = 1.471, 95% CI 1.359, 1.593) that of single births. Other significant variables are interpreted in a similar way (Additional file 2: Table S2).
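The percentage statements above follow from the incidence rate ratios through the identity percent change = (IRR - 1) × 100. The short sketch below applies it to IRR values quoted in this section; the helper name and labels are illustrative.

```python
def irr_percent_change(irr):
    """Percent change in the expected count implied by an incidence rate ratio."""
    return (irr - 1.0) * 100.0

# IRR values quoted in the text
reported_irr = {
    "vaccinated vs. non-vaccinated": 0.715,
    "at least 4 ANC visits vs. none": 0.777,
    "birth order 4+ vs. first": 1.621,
    "mother primary education vs. none": 0.766,
    "father secondary+ education vs. none": 0.680,
}

changes = {label: round(irr_percent_change(irr), 1) for label, irr in reported_irr.items()}
for label, change in changes.items():
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: {abs(change)}% {direction}")
```

An IRR below 1 therefore reads as a percentage decrease (0.715 gives 28.5% lower) and an IRR above 1 as a percentage increase (1.621 gives 62.1% higher).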
The zero-hurdle (binary) part of the hurdle negative binomial model also indicated that the estimated odds of having zero under-five deaths were 78.1% higher for vaccinated children (AOR = 1.781; 95% CI 1.587, 1.999) than for non-vaccinated children. For each one-unit increase in family size, the estimated odds of zero under-five deaths increased by 22.6% (AOR = 1.226; 95% CI 1.200, 1.253). The results also revealed that the estimated odds of zero under-five deaths among mothers who made at least 4 ANC visits during pregnancy were 2.316 times (AOR = 2.316; 95% CI 2.019, 2.657) those of mothers who had no antenatal follow-up. The estimated odds of zero under-five deaths among mothers using contraceptives were 1.291 times (AOR = 1.291; 95% CI 1.161, 1.437) those of mothers who did not use contraceptives. The analysis further indicated that the estimated odds of zero under-five deaths for children born in private-sector facilities were 2.083 times (AOR = 2.083; 95% CI 1.546, 2.807) those of children born at home. The probability of under-five deaths decreased with increasing maternal education: the estimated odds of zero under-five deaths among mothers with secondary education or above were 1.500 times (AOR = 1.500; 95% CI 1.212, 1.855) those of non-educated mothers (Table 5).
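Reported ratios and their intervals can be sanity-checked against each other. The sketch below assumes the intervals are Wald intervals computed on the log scale, exp(b ± 1.96·SE), which is the usual default but is not stated in the text; under that assumption the point estimate is the geometric mean of the two limits. The function names are illustrative.

```python
import math

def log_scale_midpoint(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)

def implied_standard_error(lower, upper, z=1.96):
    """Standard error of the log ratio implied by the CI width."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Intervals quoted in the text for the zero-hurdle part
anc_mid = log_scale_midpoint(2.019, 2.657)       # at least 4 ANC visits
vacc_mid = log_scale_midpoint(1.587, 1.999)      # vaccinated children
anc_se = implied_standard_error(2.019, 2.657)
print(round(anc_mid, 3), round(vacc_mid, 3), round(anc_se, 3))
```

For the ANC interval (2.019, 2.657) the midpoint on the log scale is about 2.316, matching the "2.316 times" stated in the text, and for vaccination it is about 1.781.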
Discussion
The objective of this study was to investigate the important risk factors of under-five child mortality in Ethiopia using the latest (EDHS 2016) dataset. Identifying these factors helps policymakers, expedites efforts toward improving lives, and supports the evaluation of progress toward achieving MDG4. The outcomes of this study shed more light on the risk factors of under-five death in Ethiopia. For instance, the study determined that parental education is an important socioeconomic predictor of under-five mortality; that is, the mortality rate decreased as the parents’ level of education increased. This suggests that an educated father takes better care of the under-five child and of his wife during the antenatal and postnatal periods. This is in line with previous studies which found that higher levels of maternal and paternal education lower child mortality [21, 33,34,35,36]. Relative to single births, the risk of under-five death in multiple births is very high; this is similar to previous studies that found a link between birth type and a higher risk of child mortality [12, 21, 35,36,37]. The study found that under-five mortality decreased as the length of the preceding birth interval increased, which is consistent with other findings [11, 12, 21].
Moreover, the study revealed that deaths of under-five children among mothers who use contraceptives were significantly fewer than among mothers who do not, which agrees with prior findings [11, 15]. Similar to the findings of [37], vaccinated children had a lower risk of mortality than non-vaccinated children.
The estimated results showed that as the mother’s age at first birth increases, the risk of under-five mortality decreases; similar to previous studies [11, 12, 15, 34, 35], mothers who had their first child at a younger age have a higher under-five mortality risk. In addition, similar to the findings reported by [34, 35], this study found that for every unit increase in the age of the mother, the risk of under-five mortality increased. The results also revealed that children of working fathers had a higher risk of mortality than children of non-working fathers. In a developing country like Ethiopia, a father bears more responsibility for household income and may not have enough time to care for his children. More importantly, more than 80% of the population of Ethiopia is rural, and farming activities leave little time for child care, a finding that is consistent with [11]. Further, increasing the number of antenatal visits during pregnancy reduces the risk of under-five mortality, a finding that is also confirmed by previous research [11]. Children born in a healthcare facility, whether in the public or private sector, were at lower risk than those born at home. This might be due to the proper health care and attention these facilities provide during and after delivery, a finding which is confirmed by other studies as well [8, 11, 12].
The study also revealed that household size is an important variable affecting the number of under-five deaths. As household size increased, the risk of under-five mortality decreased significantly, and this finding is consistent with [15, 35, 37, 38]. Similar to the studies [12, 37, 38], which found that under-five mortality increases with birth order, under-five child mortality in this study was significantly and positively associated with birth order.
In this study, the deviance, AIC, and BIC statistics and the predicted probability curve indicated that the hurdle negative binomial model was the best model for the number of under-five deaths, with about 53.7% zero counts. Several studies reported similar results, finding that the hurdle negative binomial model was the best model for count outcomes [39,40,41,42].
Conclusions
This study aimed to identify the factors associated with the number of under-five child deaths in Ethiopia using the latest DHS data, which is essential for formulating appropriate health programs and policies. In general, the study showed that despite a concerted effort by the government of Ethiopia and several stakeholders in the health sector to improve under-five child survival, the child death rate remains high. The mean number of under-five deaths was 0.9 with a variance of 1.697, indicating that the data are overdispersed. The hurdle negative binomial model had the smallest AIC, deviance, and BIC, indicating the best goodness of fit. Besides, the predicted values and probabilities for many counts in the hurdle negative binomial model fitted the observed counts best. This study revealed that region, mother’s age, educational level of the father, educational level of the mother, father’s occupation, family size, age of mother at first birth, vaccination of the child, contraceptive use, birth order, preceding birth interval, twin birth, place of delivery, and antenatal visits were important determinants of the number of under-five deaths in Ethiopia. The hurdle negative binomial model provided a better fit for the data. We applied the hurdle negative binomial model to count data with excess zeros of unknown sources, such as the number of under-five deaths. The government, policymakers, and the health care sector should focus on parental education and awareness, contraceptive use, and breastfeeding practices, which are important for reducing under-five mortality in line with MDG4. Moreover, the study found that early marriage is associated with child mortality, and the government and the Ministry of Health should raise awareness of and educate the public about the impact of early marriage on child mortality.
Limitations
Some variables, such as the weight of the child at birth, anemia, and the size of the child at birth, were not included because of the large number of missing values. Interaction terms were not considered in this study because of convergence issues.
Availability of data and materials
The data set was accessed from the Measure DHS website (http://www.measuredhs.com).
Abbreviations
ANC: Antenatal care
CSA: Central Statistical Agency
EDHS: Ethiopian Demographic and Health Survey
HNB: Hurdle negative binomial
HP: Hurdle Poisson
LRT: Likelihood ratio test
NB: Negative binomial
SDG: Sustainable development goal
ZINB: Zero-inflated negative binomial
ZIP: Zero-inflated Poisson
MDG4: Millennium development goal 4
References
 1.
You D, et al. Global, regional, and national levels and trends in under5 mortality between 1990 and 2015, with scenariobased projections to 2030: a systematic analysis by the UN Interagency Group for Child Mortality Estimation. Lancet. 2015;386(10010):2275–86.
 2.
WHO. Under five mortality rates. Global Health Observatory (GHO) Data. 2016.
 3.
Hug L, Sharrow D, You D. Levels & trends in child mortality: report 2017. Estimates developed by the UN Interagency Group for Child Mortality Estimation. 2017.
 4.
WHO, Children: reducing mortality. 2017.
 5.
USAID, Maternal, Neonatal and Child Health In Ethiopia 2018.
 6.
EDHS, Ethiopian Demographic and Health Survey 2016.
 7.
Nasejje JB, Mwambi HG, Achia TN. Understanding the determinants of underfive child mortality in Uganda including the estimation of unobserved household and community effects using both frequentist and Bayesian survival analysis approaches. BMC public health. 2015;15(1):1003.
 8.
Muriithi DM, Muriithi DK. Determination of infant and child mortality in Kenya using coxproportional hazard model. Am J Theoret Appl Stat. 2015;4(5):404–13.
 9.
Mekonnen D. Infant and child mortality in Ethiopia. The role of socioeconomic, demographic and biological factors in the previous 5 years period of 2000 and 2005, 2011.
 10.
Madise NJ, Banda EM, Benaya KW. Infant mortality in Zambia: socioeconomic and demographic correlates. Soc Biol. 2003;50(1–2):148–66.
 11.
Getachew Y, Bekele S. Survival analysis of underfive mortality of children and its associated risk factors in Ethiopia. J Biosens Bioelectron. 2016;7(213):2.
 12.
Bereka SG, Habtewold FG. Underfive mortality of children and its determinants in Ethiopian Somali Regional State, Eastern Ethiopia. Health Sci J. 2017;11(3):1.
 13.
Adeolu M, et al. Environmental and socioeconomic determinants of child mortality: evidence from the 2013 Nigerian demographic health survey. Am J Public Health Res. 2016;4(4):134–41.
 14.
Kanmiki EW, et al. Socioeconomic and demographic determinants of underfive mortality in rural northern Ghana. BMC Int Health Human Rights. 2014;14(1):24.
 15.
Bedada D. Determinant of underfive child mortality in Ethiopia. Am J Stati Probabil. 2017;2(2):12–8.
 16.
Sellers KF, Shmueli G. Data dispersion: now you see it… now you don’t. Commun StatTheory Methods. 2013;42(17):3134–47.
 17.
Hilbe JM. Negative binomial regression. Cambridge: Cambridge University Press; 2011.
 18.
Lambert D. Zeroinflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34(1):1–14.
 19.
Gurmu S. Generalized hurdle count data regression models. Econ Lett. 1998;58(3):263–8.
 20.
Ayele DG, Zewotir TT, Mwambi H. Survival analysis of underfive mortality using Cox and frailty models in Ethiopia. J Health Popul Nutrit. 2017;36(1):25.
 21.
Gebretsadik S, Gabreyohannes E. Determinants of underfive mortality in high mortality regions of Ethiopia: an analysis of the 2011 Ethiopia Demographic and Health Survey data. Int J Popul Res. 2016. 2016.
 22.
Aheto JMK. Predictive model and determinants of underfive child mortality: evidence from the 2014 Ghana demographic and health survey. BMC Public Health. 2019;19(1):64.
 23.
Hoffman JP. Generalized linear models: an applied approach. Harlow: Pearson Education; 2004.
 24.
Molla, D.T. and B. Muniswamy. Power of tests for negative binomial regression coefficients in count data. Int J Mathe Archive. ISSN 2229–5046 [A UGC Approved Journal], 2012. 3(8).
 25.
Jansakul N, Hinde J. Score tests for zeroinflated Poisson models. Comput Stat Data Anal. 2002;40(1):75–96.
 26.
Afifi AA, et al. Methods for improving regression analysis for skewed continuous or counted responses. Annu Rev Public Health. 2007;28:95–111.
 27.
Agarwal DK, Gelfand AE, CitronPousty S. Zeroinflated models with application to spatial count data. Environ Ecol Stat. 2002;9(4):341–55.
 28.
Long JS, Freese J. Regression models for categorical dependent variables using Stata. 2006: Stata press.
 29.
Zuur AF, et al. Zero-truncated and zero-inflated models for count data. In: Mixed effects models and extensions in ecology with R. Cham: Springer; 2009. p. 261–93.
 30.
Lawless JF. Negative binomial and mixed Poisson regression. Can J Stat. 1987;15(3):209–25.
 31.
Mullahy J. Specification and testing of some modified count data models. J Econom. 1986;33(3):341–65.
 32.
Ismail N, Zamani H. Estimation of claim count data using negative binomial, generalized Poisson, zero-inflated negative binomial and zero-inflated generalized Poisson regression models. In: Casualty Actuarial Society E-Forum; 2013.
 33.
Khan JR, Awan N. A comprehensive analysis on child mortality and its determinants in Bangladesh using frailty models. Arch Public Health. 2017;75(1):58.
 34.
Yaya S, et al. Prevalence and determinants of childhood mortality in Nigeria. BMC Public Health. 2017;17(1):485.
 35.
Alam M, et al. Statistical modeling of the number of deaths of children in Bangladesh. Biom Biostat Int J. 2014;1(3):00014.
 36.
Rahman MS, Rahman MS, Rahman MA. Determinants of death among under-5 children in Bangladesh. J Res Opinion. 2019;6(3):2294–302.
 37.
Berhie KA, Yirtaw TG. Statistical analysis on the determinants of under five mortality in Ethiopia. Am J Theoret Appl Stat. 2017;6(1):10–21.
 38.
Ahmed Z, Kamal A, Kamal A. Statistical analysis of factors affecting child mortality in Pakistan. JCPSP. 2016;26(6):543.
 39.
Yusuf Olushola K. Statistical modeling of fertility experience among women of reproductive age in Nigeria. Int J Stat Appl. 2018;8(1):23–33.
 40.
Hidayat B, Pokhrel S. The selection of an appropriate count data model for modelling health insurance and health care demand: case of Indonesia. Int J Environ Res Public Health. 2010;7(1):9–27.
 41.
Kazembe LN. A Bayesian two part model applied to analyze risk factors of adult mortality with application to data from Namibia. PLoS ONE. 2013;8(9):e73500.
 42.
Gebremedhin TA, Mohanty I. Child schooling in Ethiopia: the role of maternal autonomy. PLoS ONE. 2016;11(12):e0167639.
Acknowledgements
We would like to thank the Ministry of Health and the Central Statistical Agency, Government of Ethiopia, for making the data freely available for research purposes. In addition, the authors would like to acknowledge Berhanu Engidaw (Assist. Professor) from the English department of Bahir Dar University for providing language review and editing services for the manuscript.
Future research implications
Although several studies have examined the determinants of mortality among underfive children using different statistical models and techniques, little attention has been given to a comprehensive study of the patterns and associated risk factors of underfive mortality, as well as its change over time and space. Therefore, we plan to conduct a further study to determine the factors and trends of the number of underfive deaths using spatiotemporal multivariate decomposition analysis. Such an analysis would help locate the hot and cold spots of underfive mortality in Ethiopia so that appropriate health programs and policies can be formulated.
Funding
None.
Author information
Contributions
SMF drafted the proposal, did the analysis, wrote the results, and prepared the manuscript. HMF participated in the editing, analysis, and write-up of the results. GMA critically revised the manuscript for its scientific content. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The study used available secondary data accessed under the National Data Sharing and Accessibility Policy (NDSAP) of the Government of Ethiopia. The data set had no identifiable information on the survey participants; therefore, no ethical approval is required for this work.
Consent for publication
Not applicable.
Competing interests
The authors declare that no competing interests exist.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Table S1. Sociodemographic and economic characteristics of underfive mortality in Ethiopia.
Additional file 2: Table S2. Hurdle negative binomial regression coefficients for the number of underfive mortality in Ethiopia.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fenta, S.M., Fenta, H.M. & Ayenew, G.M. The best statistical model to estimate predictors of underfive mortality in Ethiopia. J Big Data 7, 63 (2020). https://doi.org/10.1186/s40537-020-00339-0
Keywords
 Under five mortality
 Ethiopia
 Hurdle negative binomial
 Count models