Poisson logit hurdle model with associated factors of perinatal mortality in Ethiopia

Introduction
Perinatal mortality is the total number of deaths within the perinatal period. This includes stillbirth (fetal death) and early neonatal death (ENND), i.e., death of a live newborn before the age of 7 completed days. The perinatal mortality rate (PMR) is calculated as the number of perinatal deaths per total number of births [1–3]. The perinatal period is the most critical in an individual's life, and the rate of death during this period is higher than in any other period of life. Perinatal mortality is especially associated with maternal conditions during late pregnancy and with intrapartum conditions. It reflects the quality of delivery care and is also a key indicator of the socioeconomic development and overall health status of a country [4].

Globally, over 5 million perinatal deaths occur every year, and ending preventable perinatal deaths will continue to be a significant part of the global public health agenda beyond 2015. In developed countries, perinatal mortality is a rare event. Ninety-seven percent of globally reported stillbirths and 98% of neonatal deaths occurred in developing countries [5]. Perinatal mortality is a major public health problem, particularly in developing countries, and has huge economic, social, and health implications for families and nations. As a result, reducing stillbirths and early neonatal deaths remains an important part of the third Sustainable Development Goal (SDG-3), which aims to end preventable child deaths by 2030 [2,6]. The highest mortality rate is found in sub-Saharan Africa. PMR and intrapartum stillbirths are 5 and 14 times higher, respectively, in developing regions than in developed regions [7].
Count data with many zeros arise in various fields, including medicine and public health. Several models have been developed in recent decades to analyze such count data, including perinatal mortality data [8]. The usual Poisson regression model and the negative binomial model do not adequately handle over-dispersion due to excess zeros [9]. Therefore, hurdle and zero-inflated count models are the most applicable methods for count data with excessive zero counts [10]. Both provide a way of modeling the excessive proportion of zero values and allow for over-dispersion [11–13]. The choice between the zero-inflated model and the hurdle model often depends on the nature of the data. Although the two models are similar in many respects, they differ conceptually in how they treat the zeros, depending on the application and the data collection. Both are generally used in the setting of excess zeros. Zero-inflated models are typically used if the data contain both excess structural and sampling zeros, whereas hurdle models are generally used when all excess zeros are assumed to come from a single source.
The hurdle model combines a mass at zero with a truncated count distribution, whereas the zero-inflated model combines a mass at zero with a regular (untruncated) count distribution; the inferential results are nevertheless often very similar. Hurdle models are more general in the sense that they can handle both zero-deflated and zero-inflated data [14,15]. The hurdle does not necessarily have to be set at zero. Zero-inflated models are less general because they assume that two different types of zeros (structural or true zeros vs. sampling zeros) may exist in the data, and they handle only zero-inflated data [14]. Ideally, hurdle models are more appropriate where a real separation of the mechanisms producing the zeros and the positive counts is justified [16].
In many cases, because of the many zeros in the dependent variable, its mean is not equal to its variance, and the Poisson model is no longer suitable for this kind of data. The hurdle model is preferable to zero-inflated models for data with excess zeros because zero deflation could still occur at specific levels of the covariates [14,15,17]. Reducing mortality requires identifying the determinants of perinatal mortality in the country context, to support initiatives that focus on evidence-based advocacy and effective interventions targeting perinatal mortality reduction through local decision making.
In Ethiopia, a few studies have investigated the factors associated with perinatal mortality using a simple logistic regression model. However, as far as our search shows, this is the first study in Ethiopia to apply an appropriate statistical method, the Poisson logit hurdle model, to measure the associated factors using the latest round of demographic health survey data (EDHS 2016). Therefore, this study aimed to apply the Poisson logit hurdle model to identify the main factors associated with perinatal mortality in Ethiopia.

Study design and source of data
The dataset used for this study was obtained from the 2016 Ethiopian Demographic Health Survey (EDHS), conducted from January 18 to June 27, 2016, across the country. The survey was a population-based cross-sectional study. The 2016 EDHS sample was stratified and selected in two stages. In the first stage, a total of 645 clusters were randomly selected from the sampling strata with probability proportional to household size, and in the second stage, 28 households per cluster were selected using systematic random sampling. A total of 7230 mothers from the 645 clusters were included in this study.

Dependent variable
Number of perinatal deaths per mother in the 5 years preceding the survey.

Independent variables
Place of residence, age of mother (years), educational status of mother, educational status of husband, mode of delivery, place of delivery, ANC visit, parity, type of birth, and history of abortion.

Operational definition
Perinatal mortality is death after 28 completed gestational weeks (stillbirth) or within the first 7 days after birth (early neonatal death), counted per mother in the 5 years preceding the survey.
Antenatal care (ANC) visit is the number of ANC clinic visits during pregnancy, categorized as at least four visits (≥ 4) or at most three visits (< 4).
Parity is the number of live-born babies of the mother. The categories are: mothers who had at least four live births (≥ 4) and mothers who had at most three live births (< 4).
Mode of delivery is the mode of the last birth in the 5 years preceding the survey. Categories are: caesarean section (CS) and normal (vaginal).
History of abortion is the history of terminated pregnancy of the mother. Categories: Yes/No.
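The operational definitions above can be read as simple coding rules. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical (they are not EDHS variable names), and "after 28 completed weeks" is taken to mean a gestational age of at least 28 weeks.

```python
from typing import Optional


def is_perinatal_death(live_born: bool, gestational_weeks: int,
                       age_at_death_days: Optional[int] = None) -> bool:
    """Stillbirth at >= 28 completed weeks, or death of a live newborn
    before 7 completed days (hypothetical record layout)."""
    if not live_born:
        return gestational_weeks >= 28
    return age_at_death_days is not None and age_at_death_days < 7


def anc_category(visits: int) -> str:
    """ANC visits: at least four (>=4) versus at most three (<4)."""
    return ">=4" if visits >= 4 else "<4"


def parity_category(live_births: int) -> str:
    """Parity: at least four live births (>=4) versus at most three (<4)."""
    return ">=4" if live_births >= 4 else "<4"
```

The dependent variable for a mother would then be the number of her recorded pregnancy outcomes in the 5-year window for which `is_perinatal_death` returns true.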

Statistical method
In this study, the variable of interest is a count variable. When the dependent variable is a count, it is appropriate to use non-linear models based on non-normal distributions to describe the relationship between the response variable and a set of predictor variables. For count data, the standard framework for explaining this relationship includes the Poisson, ZIP, and Poisson hurdle models. The advanced model used for the count data in this study is the Poisson hurdle model.

Poisson regression model
Poisson regression has been widely used for fitting count data. It is traditionally conceived as the basic count model upon which a variety of other count models are based [10]. The probability mass function for Poisson regression is:

$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

where $\mu_i$ is the parameter of the Poisson distribution. It can be shown that the mean of the Poisson distribution equals its variance, $E(Y_i) = \mathrm{Var}(Y_i) = \mu_i$. The Poisson regression model is:

$$\log(\mu_i) = X_i^T \beta$$

where $\beta$ is the vector of coefficients and $X_i^T$ is the transposed covariate vector of the $i$-th mother. Unfortunately, in many cases perinatal mortality count data have a variance greater than the mean, which is known as over-dispersion. Over-dispersion results from extra variation in the number of perinatal deaths, which can be caused by various factors such as model misspecification, omission of important covariates, and excess zero counts [18]. In this case, applying a Poisson regression model to the number of perinatal deaths would result in underestimation of the standard errors of the regression parameters. The negative binomial (NB) model is therefore commonly introduced.
In some cases, excess zeros exist in the number of perinatal deaths and are a source of overdispersion. In this case, the NB model cannot handle the overdispersion, which is due to the high number of zeros. Instead, zero-inflated (ZI) models, including zero-inflated Poisson (ZIP) models, can be used. ZIP models assume that zero counts come from two different processes: a binary process that generates the excess zeros, and a count process that generates non-negative counts of perinatal deaths, including zeros. For a response variable with many zero outcomes, the zero-inflated Poisson regression model is more effective than Poisson regression.
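As an illustration of the mean-variance equality of the Poisson distribution and of how over-dispersion shows up in a sample, the sketch below computes both quantities numerically. The value of `mu` and the toy sample are hypothetical and are not taken from the EDHS data.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability mass function P(Y = y) for mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def mean_and_variance(pmf, upper: int = 100):
    """Mean and variance of a count distribution, truncating the sum at `upper`."""
    mean = sum(y * pmf(y) for y in range(upper))
    second_moment = sum(y * y * pmf(y) for y in range(upper))
    return mean, second_moment - mean ** 2


def dispersion_index(counts):
    """Sample variance-to-mean ratio; values above 1 suggest over-dispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((y - mean) ** 2 for y in counts) / n
    return variance / mean


mu = 0.5  # hypothetical Poisson mean
mean, var = mean_and_variance(lambda y: poisson_pmf(y, mu))

# Hypothetical sample with many zeros (not the EDHS data).
toy_counts = [0] * 90 + [1] * 5 + [2] * 3 + [3] * 2
toy_index = dispersion_index(toy_counts)
```

For the Poisson distribution, `mean` and `var` agree up to floating-point error, whereas the toy sample has a variance-to-mean ratio above 1.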

Zero inflated Poisson regression model
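In ZIP regression, the counts $Y_i$ equal 0 with probability $p_i$ and follow a Poisson distribution with mean $\mu_i$ with probability $1 - p_i$, where $i = 1, 2, \ldots, n$. The ZIP model can thus be seen as a mixture of two component distributions, a zero part and a count part, given by [10]:

$$P(Y_i = y_i) = \begin{cases} p_i + (1 - p_i)\,e^{-\mu_i}, & y_i = 0 \\ (1 - p_i)\,\dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, & y_i = 1, 2, \ldots \end{cases}$$

Assume that the same predictors enter the logistic regression function and the Poisson regression function. Hence, the ZIP regression model can be written as follows:

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = X_i^T \gamma, \qquad \log(\mu_i) = X_i^T \beta$$

where $\gamma$ and $\beta$ are the coefficient vectors of the zero part and the count part, respectively.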
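As a numerical illustration of the ZIP mixture described in this section, the sketch below evaluates the ZIP probabilities for fixed values of the zero-inflation probability and the Poisson mean. The values of `p` and `mu` are hypothetical, and no covariates are used.

```python
import math


def zip_pmf(y: int, p: float, mu: float) -> float:
    """Zero-inflated Poisson: extra zeros with probability p,
    otherwise a regular Poisson count with mean mu."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        return p + (1 - p) * poisson
    return (1 - p) * poisson


p, mu = 0.6, 0.5  # hypothetical parameter values
zip_total = sum(zip_pmf(y, p, mu) for y in range(50))
zip_mean = sum(y * zip_pmf(y, p, mu) for y in range(50))
```

The probabilities sum to 1, the zero probability exceeds the plain Poisson value `exp(-mu)`, and the mean is `(1 - p) * mu`.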

Poisson hurdle regression model
A hurdle model consists of two components: a point mass at zero and a distribution that generates non-zero counts. The first component is a binary component that generates zeros and ones (here "ones" correspond to non-zero values in the data), and the second component generates non-zero values from a zero-truncated distribution. The most widely used hurdle models are those with the hurdle value at zero [4]. All zeros in the hurdle model are assumed to be "structural" zeros, i.e., they are generated from a single process and are observed because the condition is absent. We explore two zero-truncated count distributions for the hurdle model specification [19]. The hurdle model of count data can be expressed as follows for the Poisson distribution. We consider a hurdle Poisson regression model in which the response variable $Y_i$ has the distribution:

$$P(Y_i = y_i) = \begin{cases} p_i, & y_i = 0 \\ (1 - p_i)\,\dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{(1 - e^{-\mu_i})\,y_i!}, & y_i = 1, 2, \ldots \end{cases}$$

where $p_i$ is the probability of a zero count and $\mu_i$ is the mean of the untruncated Poisson distribution.
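To make the two-part structure of the Poisson hurdle model concrete, the sketch below evaluates its probabilities for fixed values of the zero probability and the untruncated Poisson mean. The parameter values are hypothetical, and no covariates are used.

```python
import math


def hurdle_poisson_pmf(y: int, p0: float, mu: float) -> float:
    """Poisson hurdle: zero with probability p0, otherwise a
    zero-truncated Poisson count with untruncated mean mu."""
    if y == 0:
        return p0
    truncated = math.exp(-mu) * mu ** y / (math.factorial(y) * (1 - math.exp(-mu)))
    return (1 - p0) * truncated


mu = 0.5  # hypothetical untruncated Poisson mean

# Zero inflation: more zeros than a Poisson with mean mu would give.
inflated_total = sum(hurdle_poisson_pmf(y, 0.95, mu) for y in range(50))

# Zero deflation: fewer zeros than exp(-mu), which the hurdle model also allows.
deflated_total = sum(hurdle_poisson_pmf(y, 0.20, mu) for y in range(50))
```

Because the zero probability is a free parameter, the same formula covers both more and fewer zeros than the Poisson distribution implies, and the probabilities sum to 1 in both cases.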
Zero part and truncated count part of the hurdle model: the zero part is modeled with a logit link and the truncated Poisson count part with a log link. The Maximum Likelihood Estimation (MLE) method is used to estimate the parameters of the count models. This study used a Poisson logit hurdle model to accommodate the excess zeros in the perinatal death count data. Akaike's information criterion (AIC) and log-likelihood values, which are basic methods of assessing model performance, are used as model selection measures [10]. A dispersion parameter is used to test for overdispersion. The generalized Pearson χ2 statistic, which is the standard measure of goodness of fit, is used to evaluate the adequacy of the fitted models.
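The estimation and model comparison steps can be sketched for the simplest case without covariates (intercept-only models). This is an illustration under stated assumptions, not the authors' analysis: the toy counts are hypothetical, the helper names are invented for this example, and the truncated-Poisson mean is found by bisection rather than by a regression routine.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of an intercept-only Poisson model."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def solve_truncated_mean(target, tol=1e-12):
    """Bisection for mu with mu / (1 - exp(-mu)) == target, the mean of the
    positive counts. Requires target > 1 (some positive count above 1)."""
    lo, hi = 1e-9, target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (-math.expm1(-mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hurdle_loglik(counts, p0, mu):
    """Log-likelihood of an intercept-only Poisson hurdle model."""
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(p0)
        else:
            ll += (math.log(1 - p0) + y * math.log(mu) - mu
                   - math.lgamma(y + 1) - math.log(-math.expm1(-mu)))
    return ll


def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


# Hypothetical toy counts with many zeros (not the EDHS data).
counts = [0] * 90 + [1] * 5 + [2] * 3 + [3] * 2
positives = [y for y in counts if y > 0]

mu_poisson = sum(counts) / len(counts)           # Poisson MLE
p0_hat = 1 - len(positives) / len(counts)        # MLE of the zero probability
mu_truncated = solve_truncated_mean(sum(positives) / len(positives))

aic_poisson = aic(poisson_loglik(counts, mu_poisson), 1)
aic_hurdle = aic(hurdle_loglik(counts, p0_hat, mu_truncated), 2)
```

For this toy sample the hurdle model has the lower AIC despite its extra parameter. With covariates, the two parts would be fitted by a logistic regression and a zero-truncated Poisson regression instead of these closed-form and bisection steps.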

Result
A total of 7230 mothers were obtained from the EDHS 2016 survey. The frequency and percentage distribution of the number of perinatal deaths per mother in Ethiopia is presented in Fig. 1, based on information from 12,645 respondents. Of the 7230 mothers, 95.27% never experienced perinatal mortality in the 5 years preceding the survey, 4.47% experienced it once, 0.26% twice, and 0.04% three times. This indicates that zero outcomes were numerous, while larger numbers of perinatal deaths per mother were observed less frequently, which leads to a positively skewed distribution. These excess zeros also mean that the data are better fitted by a zero-inflated or hurdle model, which takes excess zeros into account. Because of the large number of zero outcomes, the histogram is highly peaked at zero, so we expected over-dispersion of the response variable arising from the excess zeros.
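The reported percentage distribution can be compared with what a Poisson distribution of the same mean would imply. The sketch below uses only the four percentages stated above; since they sum to slightly more than 100 (presumably because of rounding), they are rescaled to shares first, which is an assumption of this example.

```python
import math

# Percentages of mothers by number of perinatal deaths, as reported in the text.
reported_percent = {0: 95.27, 1: 4.47, 2: 0.26, 3: 0.04}

# Rescale to shares that sum to 1 (assumption: the excess is rounding error).
total_percent = sum(reported_percent.values())
share = {y: pct / total_percent for y, pct in reported_percent.items()}

# Mean, variance, and variance-to-mean ratio of the reported distribution.
mean_count = sum(y * s for y, s in share.items())
variance = sum(y * y * s for y, s in share.items()) - mean_count ** 2
dispersion_index = variance / mean_count

# Zero share that a Poisson distribution with the same mean would imply.
poisson_zero_share = math.exp(-mean_count)
observed_zero_share = share[0]
```

Comparing `observed_zero_share` with `poisson_zero_share`, and `dispersion_index` with 1, gives a quick descriptive check of excess zeros and over-dispersion before any model is fitted.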

Test of over-dispersion
The over-dispersion parameter test of H0: φ = 0 is statistically significant (p-value = 0.044), which provides evidence for preferring the Poisson logit hurdle (PLH) model over the Poisson (P) model. Due to this over-dispersion, we applied the Poisson logit hurdle model.

Count model coefficients
The expected number of perinatal deaths for mothers aged 30-39 years was 64% lower than for mothers aged less than 30 years, keeping the other variables constant. For mothers aged 40-49 years, it was 81% lower than for their counterparts. Mothers with parity of more than 4 had a higher expected number of perinatal deaths than mothers with parity of 4 or less. Mothers with multiple pregnancies had a higher expected number of perinatal deaths than mothers with single pregnancies.
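Percentage statements of this kind follow from the log link of the count part: a coefficient β multiplies the Poisson mean by exp(β). The sketch below converts between the two; the coefficients are back-calculated from the reported 64% and 81% reductions and are not taken from the paper's tables, and the function names are hypothetical.

```python
import math


def percent_change_in_expected_count(beta: float) -> float:
    """Percent change in the Poisson mean for a one-unit change in a covariate."""
    return (math.exp(beta) - 1) * 100


def coefficient_from_percent_change(percent: float) -> float:
    """Log-scale coefficient implied by a given percent change."""
    return math.log(1 + percent / 100)


# Coefficients implied by the reported reductions (back-calculated, illustrative).
beta_age_30_39 = coefficient_from_percent_change(-64)
beta_age_40_49 = coefficient_from_percent_change(-81)
```

A 64% reduction corresponds to a rate ratio of 0.36, and an 81% reduction to a rate ratio of 0.19, so both implied coefficients are negative.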

Zero hurdle model coefficients
The estimated zero hurdle regression coefficients are reported as odds ratios relative to the reference group, with the other variables in the model held constant. The odds of non-zero perinatal mortality for mothers from rural areas were 1.52 times those of mothers from urban areas. The zero hurdle model indicated that the estimated odds of non-zero perinatal mortality for a mother who delivered by cesarean section were 2.29 times those of a mother who had a normal delivery. Likewise, non-zero perinatal mortality was significantly associated with preceding birth interval, multiple pregnancies, secondary + husband education, delivery at an institution, and history of abortion (Table 1).
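An odds ratio acts on the odds, not directly on the probability, of a non-zero count. The sketch below applies the reported odds ratio of 1.52 for rural residence to a baseline probability; the baseline value of 0.04 is hypothetical and chosen only for illustration.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


urban_prob = 0.04                                  # hypothetical baseline probability
rural_prob = apply_odds_ratio(urban_prob, 1.52)    # reported odds ratio for rural residence
```

The resulting probability is higher than the baseline but lower than 1.52 times the baseline, which is why an odds ratio should not be read as a ratio of probabilities, although the two are close when the outcome is rare.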

Discussion
Perinatal mortality comprises early neonatal deaths and stillbirths and indicates the quality of delivery care and the overall health status of a country [1]. The Poisson regression model is used to identify factors associated with count outcomes such as perinatal deaths per mother. For zero-excess and over-dispersed data, the Poisson logit hurdle model is better suited to count data than the standard Poisson model. This study aimed to apply the Poisson logit hurdle model to identify factors associated with perinatal mortality in Ethiopia using 2016 EDHS data. The expected count of perinatal deaths was lower for mothers aged 30-39 than for those aged less than 30 years. Similarly, mothers aged 40-49 were at lower risk of perinatal mortality than their counterparts. This finding is inconsistent with a study conducted in Jima zone, which reported that maternal age has no association with perinatal mortality [20], and with a study conducted in Uganda, which reported that mothers older than 30 years were at higher risk of perinatal mortality than their counterparts [21]. A possible reason for this difference is methodological: those studies used binary logistic regression, which may lose information when the outcome is reduced to two categories.
Moreover, mothers with parity of four and above were at higher risk of perinatal mortality than their counterparts. This finding is in line with studies conducted in Sudan and southwest Ethiopia [22,23], but is inconsistent with a study conducted in Uganda [21]. Possible reasons for this variation are differences in participants, such as rural residence, and methodological differences. The expected count of perinatal deaths for mothers who had multiple pregnancies was 6.29 times that for mothers with single pregnancies. This result agrees with a study conducted in southwest Ethiopia, which stated that twin births increase perinatal mortality [20]. It is, however, inconsistent with a study conducted in Iran, which found that twin pregnancy has no association with perinatal mortality [24]. The reason might be the design of that study, which was a retrospective case-control study. In addition to the study design, there were differences in statistical analysis methods, sociodemographic characteristics, and education.
This study revealed that mothers who lived in rural areas had 1.52 times the odds of experiencing non-zero perinatal mortality compared with urban ones. This finding disagrees with a study conducted in Sudan [22]. The main reasons for this variation might be the method of data analysis, which was binary logistic regression in that study, and the difference in sample size.
In this study, we assessed the association of mode of delivery with the count of perinatal deaths per mother. The odds of non-zero perinatal mortality for a mother who delivered by cesarean section were 2.26 times those for vaginal delivery. This finding is in line with studies conducted in Iran, Sudan, and Gamo zone, Ethiopia [22,24,25]. The odds of experiencing perinatal mortality decreased by 2% for each 1-month increase in the mother's preceding birth interval. This finding is consistent with studies conducted in Ilu Ababura, Oromia zone, and Addis Ababa in Ethiopia [26,27]. A possible reason is the difference in the mother's psychological and nutritional preparedness. The odds of non-zero perinatal mortality were 6.22 times as high in multiple pregnancies as in single pregnancies. This finding is supported by [20]. A mother whose husband had secondary or higher education was less likely to experience perinatal mortality than one whose husband was not educated. The reason might be that educated husbands have nutritional knowledge and provide better care for the mother and the fetus during delivery and lactation.
Surprisingly, in this study, the odds of non-zero perinatal mortality were higher for mothers who delivered at an institution than for those who delivered at home. This might be related to the mode of delivery at the institution. This finding disagrees with a study conducted in Sudan. The reason might be the method of analysis and the sample size: that study used binary logistic regression with a sample of 808 participants. Mothers who had a history of abortion were 8.19 times as likely to experience perinatal mortality as those with no history of abortion. This finding is supported by a study conducted in Iran [24]. A possible reason is the effect of abortion on the organs of the mother and the fetus. This study contributes to the literature by introducing an advanced statistical model, the Poisson logit hurdle model, for count data to investigate the factors associated with perinatal mortality. Several studies used a binary logistic regression model, which loses some important information through categorization (yes/no) and does not identify the type of zero (true zero/false zero) [20,23,25].
The strengths of this study were the use of a recent nationally representative data set and of the Poisson logit hurdle model to identify the factors associated with perinatal mortality in Ethiopia. The limitations were that important clinical factors were not assessed and that the cross-sectional design cannot establish any causal relationship.

Conclusion
This study contributes to the literature by applying the advanced Poisson logit hurdle model, which is better suited to the perinatal mortality count data set. Based on the findings of this study, the main protective factors were maternal age of 40-49 years, a long preceding birth interval, and secondary + husband education. Parity of four and above, rural residence, caesarean section, multiple pregnancies, institutional delivery, and a history of abortion were associated with increased perinatal mortality. The implication of this study is that targeted interventions focusing on family planning and mode of delivery can be designed to minimize perinatal mortality in the country.