Analysis of significant factors for dengue fever incidence prediction

Background: Many popular dengue forecasting techniques have been used to extrapolate dengue incidence rates, including the K-H model, support vector machines (SVM), and artificial neural networks (ANN). Time series analysis, particularly ARIMA and SARIMA, has been increasingly applied in epidemiological research on dengue fever, dengue hemorrhagic fever, and other infectious diseases. The main drawback of these methods is that they do not consider other variables that are associated with the dependent variable. Additionally, new factors correlated with the disease are needed to enhance the prediction accuracy of the model when it is applied to areas of similar climates, where weather factors such as temperature, total rainfall, and humidity are not substantially different. Such drawbacks may consequently lower the predictive power for the outbreak.

Results: The predictive power of the forecasting model, assessed by Akaike's information criterion (AIC), Bayesian information criterion (BIC), and the mean absolute percentage error (MAPE), is improved by including the new parameters for dengue outbreak prediction. This study's selected model outperforms all three other competing models with the lowest AIC, the lowest BIC, and a small MAPE value. The exclusive use of climate factors from similar locations decreases a model's prediction power. The multivariate Poisson regression, however, forecasts effectively even when climate variables are only slightly different. Female mosquitoes and seasons were strongly correlated with dengue cases. Therefore, the dengue incidence trends provided by this model will assist the optimization of dengue prevention.

Conclusions: The present work demonstrates the important roles of female mosquito infection rates from the previous season and climate factors (represented as seasons) in dengue outbreaks. Incorporating these two factors in the model significantly improves the predictive power of dengue hemorrhagic fever forecasting models, as confirmed by AIC, BIC, and MAPE.


Background
Incidences of dengue hemorrhagic fever (DHF) and dengue fever (DF) have increased dramatically in past decades and have become a global threat. According to the World Health Organization (WHO), an estimated 500 million cases of DF and 250,000-500,000 cases of DHF occur annually [1,2]. The number of people residing in areas at risk of DF outbreaks totals 3.6 billion, or 55 % of the world's population [3]. Thailand recorded its first case of DF in 1958 [4]. Since then, the disease has become a major public health problem as the number of cases has continued to grow. Dengue virus infection can cause dengue diseases including classical DF and its severe forms, namely DHF and/or dengue shock syndrome (DSS) [5]. Approximately 10-15 % of infected patients are symptomatic, with approximately 500,000 hospitalizations annually involving the severe form of the disease [6]. Annual hospitalization and death rates from the severe form are highest in tropical and subtropical regions, especially in Southeast Asia, South and Central America, the Caribbean, and the South Pacific [5]. Huge efforts to control and monitor the dengue epidemic are currently underway in many countries. The major vector of dengue is the mosquito Aedes aegypti [4]. Females of this species transmit the virus to humans when taking a blood meal. An early warning system for dengue outbreaks, which advises the relevant departments to deploy mosquito control before an outbreak occurs, is therefore essential. Barbazan et al. [7] demonstrated that the seasonal transmission of DENV serotypes in an endemic area was significantly related to the prevalence and virulent strains and was also associated with high pathogenesis. Some studies have suggested that active school-based dengue detection could be used as an indicator for reducing the longitudinal risk of viral transmission in rural areas [8].
Although the disease is transmitted to humans via female mosquitoes, entomologic surveillance to determine dengue transmission has been based on different larval indices [9,10], including the house index (percentage of houses positive for larvae) and the Breteau index (number of positive containers per 100 houses). Even though these indices have become widely used in dengue control programs, the prevalence of dengue infection is still high, especially in the rainy season. With the advent of molecular biology techniques, it became possible to detect dengue viruses in mosquito vectors [11]. Virus infection in mosquitoes was then considered as an index for determining dengue epidemics. Several reports have demonstrated the relationship between dengue outbreaks and virus infection in Ae. aegypti mosquitoes. This correlation seems to be a more practical and effective tool for planning dengue control [12][13][14][15][16]. Nonetheless, dengue incidence is difficult to predict because it varies widely over time [17]. Many DF prediction models are based on statistical and data mining techniques such as ARIMA [18], SARIMA [19][20][21], the K-H model [17], support vector machines (SVMs) [22], and artificial neural networks (ANNs) [23]. All of these approaches adopt a similar basic set of predictors, such as temperature and rainfall level. To enhance the predictive power of DF models, we incorporated two novel predictors, female mosquito infection rate and season.

Fig. 1 Map of morbidity rate of dengue in Thailand reported by Health Info in Thailand (http://www.healthinfo.in.th/). The study areas were the three provinces of Nakhon Pathom, Ratchaburi, and Samut Sakhon in the central region of Thailand. The high morbidity rate (per 100,000 population) of DHF between 2007 and 2012 is indicated by the red color

Study site for mosquito collection
From 2007 to 2012, Ae. aegypti mosquitoes were collected from three provinces in the central region of Thailand: Nakhon Pathom, Ratchaburi, and Samut Sakhon. These areas were selected primarily for three reasons: high mosquito density, minor differences in climatic factors, and a high DHF morbidity rate as reported in the Thailand health information system (http://www.hiso.or.th) and as illustrated in Fig. 1.

Ethics statement
The study was approved by the Ethics Committee of Research Affairs Unit, Faculty of Medicine, Chulalongkorn University (COA No. 328/2014).

Dengue mosquito collection
Ae. aegypti larvae and adult mosquitoes were collected from three provinces in the central region of Thailand. The collections were performed in three districts of each province (two sub-districts per district; two villages per sub-district; 40 dwellings per village). Twice per season, from January 2007 to December 2012, mosquito larvae were collected from water-filled containers indoors and around the houses; adult mosquitoes were collected by highly experienced officers from Thailand's National Institute of Health using human bait. Larvae and adults were visually identified as members of Ae. aegypti, pooled, and then maintained in cryogenic vials. Each vial contained five larvae or mosquitoes and was stored in liquid nitrogen for subsequent dengue virus detection. Dengue virus infection rates in mosquitoes were obtained from a previous report by Chompoosri et al. [14].

Dengue virus detection in Ae. aegypti mosquitoes
Detection of the four dengue virus serotypes in Ae. aegypti larvae and adults was modified from the method described by Tuksinvaracharn et al. [11]. The genomic viral RNA was extracted from pooled larvae and mosquitoes using the Invisorb® Spin Virus RNA Mini Kit (Invitex GmbH, Germany) according to the manufacturer's protocols. One-step RT-PCR was performed with five oligonucleotide primers (D1 and four type-specific primers: TS1, TS2, TS3, and TS4) that were designed by Lanciotti et al. [24]. Amplification was carried out in a 25 μl total mixture using the Superscript III one-step RT-PCR kit (Invitrogen, USA) with 10 μM of each primer and 6 μl of RNA. The RT-PCRs were performed in a PCR Mastercycler® Pro (Eppendorf, Germany) under the conditions of 50°C for 30 min and 94°C for 2 min, followed by 40 cycles of 94°C for 30 s, 50°C for 30 s, and 72°C for 30 s, with a final extension at 72°C for 7 min followed by a final hold at 4°C. Aliquots of the PCR amplicons were analyzed by electrophoresis on 2 % agarose gels, stained with ethidium bromide, and visualized with Quantity One Quantification Analysis Software version 4.5.2 (Gel Doc EQ System; Bio-Rad, Hercules, CA).

Incidence of DHF in the study areas and dengue virus detection in blood samples
Incidences of DHF in the study areas were obtained from the Bureau of Epidemiology, Department of Disease Control, Ministry of Public Health, Thailand. The data were expressed as the morbidity rate of DHF per 100,000 individuals. Blood specimens were taken from suspected dengue infection patients, with 3 ml of blood collected into EDTA collecting tubes from each patient. Identification of dengue serotypes was performed by one-step RT-PCR [25]. Viral RNA was extracted from 100 μl of plasma from each patient, and RT-PCR for type-specific primers was carried out using a one-step RT-PCR kit (Qiagen Gmbh, Hilden, Germany). Each amplification was validated with positive and negative controls. PCR products were electrophoresed in 2 % agarose gel, stained with ethidium bromide (0.5 μg/ml), and visualized on a UV transilluminator (Gel Doc EQ System; Bio-Rad, Hercules, CA). The study was approved by the Ethics Committee of Research Affairs Unit, Faculty of Medicine, Chulalongkorn University (COA No. 328/2014).

Independent and dependent variables for a forecasting model
Besides the abovementioned mosquito infection rate parameters, data for all other factors relevant to DHF outbreaks were collected from various sources. Table 1 lists the independent and dependent variables considered in the proposed forecasting model. Values of all variables were collected between 2007 and 2012. Mosquito and blood sample collections were performed only until 2012 owing to budget limitations.
All collected data were cleaned before analysis. Data cleansing transformed the data and removed records with missing values. After data cleansing, observations in each district were pooled seasonally; 144 samples remained and were used for model construction. Seasonal temperature, rainfall, humidity, and wind showed significantly high correlation coefficients (p < 0.0001) among themselves, as shown in Table 2, resulting in a multicollinearity problem in model fitting and decreasing the reliability of the model. Therefore, we used the season variable as a proxy for meteorological conditions. Dengue rates in each season of the studied regions were explored and showed a right-skewed distribution, as illustrated in Fig. 2. Multivariate Poisson regression (MPR) [26], which is frequently applied to the analysis of count data [27] because of their non-normal distribution, was adopted to find variables associated with the number of dengue cases; the main significant variables were initially selected for the model using the backward elimination scheme. Subsequently, two-variable interactions were added, and their effects were tested hierarchically. However, count data in a Poisson model often display a variance larger than the mean, referred to as "overdispersion." Here, we accommodated overdispersion by adjusting the parameter covariance matrix and likelihood function, yielding more appropriate standard error estimates and likelihood ratio tests.
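As a rough illustration of the overdispersion adjustment described above, the sketch below estimates a dispersion factor from Pearson residuals and inflates Poisson standard errors by its square root (the common quasi-Poisson correction). This is an assumption about the form of the adjustment, since the text does not give the exact procedure; the counts, fitted means, standard errors, and function names are all hypothetical.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to a Poisson model.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / dof


def adjust_standard_errors(standard_errors, dispersion):
    """Scale Poisson standard errors by sqrt(dispersion) (quasi-Poisson style)."""
    scale = math.sqrt(dispersion)
    return [se * scale for se in standard_errors]


# Hypothetical seasonal counts and fitted means, for illustration only.
observed = [12, 30, 7, 55, 21, 3]
fitted = [15.0, 25.0, 10.0, 40.0, 20.0, 6.0]

phi = pearson_dispersion(observed, fitted, n_params=2)
adjusted = adjust_standard_errors([0.10, 0.05], phi)
print(f"dispersion = {phi:.3f}, adjusted standard errors = {adjusted}")
```

With a dispersion factor above 1, the adjusted standard errors are wider than the naive Poisson ones, which makes the Wald and likelihood ratio tests more conservative.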
A previous study [12] revealed that dengue infection rates in female mosquitoes of three provinces were highest in summer, while morbidity rates of DHF tended to be highest in the rainy season. Consequently, female mosquito infection rate in the previous season (one lag season) is used in predicting the number of dengue infections. As depicted in Table 3, four main variables are first considered in the model fitting process.

Model construction
Multivariate Poisson regression
In our previous study [15], we showed the significance of the infected female mosquito but did not study the correlation among the climate factors. In this paper, we use the season variable instead of climate factors. Additionally, we propose to exploit the MPR technique, which accounts for multiple predictors. Retrospective data are collected on a seasonal basis, and the model temporally extrapolates the dependent variable by several seasons. Typically, the regression model expresses the natural logarithm of the outcome as a linear function of a set of predictors, as shown in Eq. 1.

$$\ln(\mu_i) = \ln(\mathrm{pop}_i) + \beta_0 + \beta_1\,\mathrm{Season1}_i + \beta_2\,\mathrm{Season2}_i + \sum_{j=3}^{5} \beta_j X_{ij} \tag{1}$$

where ln(μ_i) is the natural logarithm of the predicted seasonal dengue incidence of the i-th observation; ln(pop_i) is the natural logarithm of the population, used as an offset accounting for variation of population among regions; β_0 is the constant, denoting the baseline number of dengue incidences; β_1 and β_2 are regression parameters, denoting the effect of Season1 (Rainy) and Season2 (Summer) compared with Season3 (Winter); and the β_j denote the effect of the independent variables X_j (with value X_ij for the i-th observation) on dengue incidence, representing Fmosquito, Mmosquito, and AegRate, where j = 3, 4, and 5, respectively. Initially, four main variables were considered in the model fitting; variables were then removed one by one based on the backward elimination procedure. Two-factor interactions of the remaining variables were then added. The final model was ultimately selected based on three measures: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the mean absolute percentage error (MAPE). All competing models were also compared in nested order for model selection.
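To make the role of the population offset and the season dummy variables concrete, the following minimal sketch evaluates a log-linear Poisson mean for one observation. The season and Fmosquito coefficients are the values quoted later in the text (0.55, 0.24, and 0.02); the intercept, the population, the infection rate, and the function name are hypothetical, and Mmosquito and AegRate are left out as a simplification.

```python
import math

# Season and Fmosquito coefficients are taken from the text; the intercept
# is a made-up value used only so that the example produces a number.
COEFFS = {"intercept": -9.0, "rainy": 0.55, "summer": 0.24, "fmosquito": 0.02}


def expected_cases(population, season, fmosquito_rate, coeffs=COEFFS):
    """Expected seasonal dengue count under a log-link model with a population offset.

    `season` is "rainy", "summer", or "winter" (winter is the baseline).
    `fmosquito_rate` is the female mosquito infection rate (%) of the previous season.
    """
    if season not in ("rainy", "summer", "winter"):
        raise ValueError(f"unknown season: {season}")
    rainy = 1.0 if season == "rainy" else 0.0
    summer = 1.0 if season == "summer" else 0.0
    linear_predictor = (
        math.log(population)  # offset: its coefficient is fixed at 1
        + coeffs["intercept"]
        + coeffs["rainy"] * rainy
        + coeffs["summer"] * summer
        + coeffs["fmosquito"] * fmosquito_rate
    )
    return math.exp(linear_predictor)


# Hypothetical district of 100,000 people with a 5 % lagged infection rate.
for name in ("winter", "summer", "rainy"):
    print(name, round(expected_cases(100_000, name, 5.0), 1))
```

Because the offset enters with a fixed coefficient of 1, doubling the population doubles the expected count, so the remaining coefficients describe effects on the incidence rate rather than on raw case numbers.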

Multivariate Poisson regression model validation
The constructed model was evaluated by three performance measures: MAPE, AIC, and BIC. The MAPE is given by Eq. (2).

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{X_i - F_i}{X_i} \right| \tag{2}$$

where X_i and F_i are the observed and predicted values, respectively, and n is the total number of observations. The AIC [28] and BIC [29], given in Eqs. (3) and (4), were considered in model selection to assess the goodness-of-fit of the model.

$$\mathrm{AIC} = 2k - 2\ln(L) \tag{3}$$

$$\mathrm{BIC} = k\ln(n) - 2\ln(L) \tag{4}$$

where k is the number of model parameters, and L is the maximized value of the likelihood function for the model. Lower MAPE, AIC, and BIC values indicate increased predictive power.
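The three measures can be computed directly from their standard definitions. The sketch below is illustrative only: the function names, the example counts, and the log-likelihood value are hypothetical and are not taken from the study.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Observed values must be non-zero, because each error is divided by
    the corresponding observed value.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    return 100.0 / n * sum(abs((x - f) / x) for x, f in zip(observed, predicted))


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical values, for illustration only.
observed = [100, 200, 50]
predicted = [110, 180, 50]
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"AIC  = {aic(-120.0, k=4):.2f}")
print(f"BIC  = {bic(-120.0, k=4, n=144):.2f}")
```

For n larger than about 7, ln(n) exceeds 2, so BIC penalizes each extra parameter more heavily than AIC; this is one reason the two criteria can rank models differently.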

Results and discussion
The collected data from 2007 to 2012 were used for model construction. The number of dengue cases over time was then predicted based on the chosen model. Finally, the forecasted cases were compared with the actual dengue cases reported by NTCAESI. The dataset in this experiment includes all variables listed in Table 3 from the three provinces.

Model selection
The competing models are compared in Table 4. When one model is a special case of another, the models can be compared in hierarchical order, with the simpler model under the null hypothesis tested against the more complex model under the alternative hypothesis. When the hypothesis test is not significant, the simpler model is adequate and the model under the null hypothesis is supported. As shown in Table 4, Model-3 yielded the lowest AIC and BIC. Although Model-4 and Model-1 gave smaller values of MAPE than Model-3, the extra terms did not significantly affect dengue case prediction. This finding was not surprising because models with more attributes usually provide greater prediction power. Additionally, models are traditionally compared under the null hypothesis that the simpler model with fewer terms is better, in line with the principle of parsimony [26]. As all of these assessments revealed Model-3 to be the best model, Model-3 was adopted as the representative model for predicting dengue incidence throughout the remainder of this study.

Multivariate Poisson regression model analysis
Having selected a model, we quantitatively associated each variable with dengue cases. Table 5 lists the estimation of regression coefficients, standard errors, Wald statistics, and p-values of the selected model.
The regression coefficients in Table 5 indicate that a 1 % increase in the number of infected female mosquitoes from the previous season will generate a 1.02-fold (e^0.02) increase in the number of dengue incidences. The spread of dengue may be explained by several factors. In addition to the transmission of dengue virus to humans through mosquito bites, viral transmission among mosquitoes may also occur through transovarian transmission [14]. When increasing numbers of mosquitoes are infected, there is naturally an increased risk that people living in such mosquito-infested areas will contract the disease. Outbreak risk is highest during the rainy season (Season1), being 1.73 (e^0.55) times higher than that of the winter season (baseline). The severity of the outbreak is raised by a factor of e^0.24 = 1.27 during the seasonal changeover from winter to summer (Season2) and by a factor of e^(0.55-0.24) = 1.36 during the changeover from summer to the rainy season. This result is attributed to the large volumes of standing water that accumulate on private properties during the rainy season. Standing water revives mosquito eggs that have lain dormant over the previous seasons, with subsequent surges in mosquito emergence.
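The multiplicative effects quoted above follow from exponentiating the regression coefficients. The short sketch below reproduces that arithmetic using the coefficient values stated in the text (0.02, 0.55, and 0.24); the dictionary keys are hypothetical labels chosen for this example.

```python
import math

# Coefficient values as quoted in the text; key names are illustrative labels.
coefficients = {
    "fmosquito_lag1": 0.02,  # infected female mosquitoes, previous season
    "season_rainy": 0.55,    # rainy season versus winter (baseline)
    "season_summer": 0.24,   # summer versus winter (baseline)
}

# In a log-link model, exp(beta) is the rate ratio for a one-unit increase.
rate_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# Rainy versus summer: the difference of the two coefficients, exponentiated.
summer_to_rainy = math.exp(
    coefficients["season_rainy"] - coefficients["season_summer"]
)

for name, ratio in rate_ratios.items():
    print(f"{name}: {ratio:.2f}")
print(f"summer -> rainy: {summer_to_rainy:.2f}")
```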

Prediction performance
Owing to the high morbidity rate in the rainy season, this study demonstrates the prediction performance for this season only. According to the results illustrated in Fig. 3, the actual and predicted values tended to show similar trends across the year, reflecting good performance of the adopted MPR model. In addition, the Kolmogorov-Smirnov test [30] was used to verify the prediction performance by testing whether the actual and predicted values are consistent. The null hypothesis of consistency was not rejected (D = 0.17, p-value = 0.9639), indicating that the actual and predicted values are consistent. Because our model accounts for overdispersion, the covariance estimation is adjusted to improve reliability. As a result, the prediction performance of the model is significantly improved.
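For readers unfamiliar with the test statistic D, the sketch below computes it as the largest gap between the empirical distribution functions of two samples. It assumes the two-sample form of the Kolmogorov-Smirnov test, which the text does not state explicitly; the case counts and the function name are hypothetical. A p-value would normally come from a statistics library, for example `scipy.stats.ks_2samp` in SciPy.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the two empirical
    cumulative distribution functions (ECDFs).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs are step functions that change only at observed values,
    # so checking every pooled data point is enough to find the maximum.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical actual and predicted rainy-season case counts.
actual = [120, 95, 210, 180, 60, 150]
predicted = [110, 100, 190, 170, 75, 160]
print(f"D = {ks_statistic(actual, predicted):.2f}")
```

A small D means the two empirical distributions are close, which is why a large p-value is read here as agreement between actual and predicted values.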

Conclusions
As mentioned previously, no specific treatment exists for dengue infection, and effective vaccines remain at the developmental stage. Therefore, interrupting pathogen transmission by mosquito control is the most effective means of controlling dengue infection. In Thailand, although mosquito surveillance has been in regular operation for many years, it has not fully prevented dengue outbreaks. The seasonal factor has previously been studied by Wongkoon et al. [31], whose work is similar to this report. Nonetheless, the main  [12,32,33] and male mosquitoes can transmit the viruses to females via sexual transmission [12,33]. We found that the infected female mosquito together with season is directly correlated with the number of dengue cases and is significantly useful for the forecasting model, as confirmed by the results shown in Fig. 3. Secondly, Wongkoon et al. [31] used the container, house, and Breteau indices to determine dengue transmission, as in several other previous reports [34][35][36]. However, those indices may not correlate with dengue virus transmission, given the increasing number of dengue cases in Thailand. Therefore, female mosquito infection and season could be used as novel variables for effective determination of dengue outbreaks in Thailand.
The infected female mosquito was also used to predict dengue cases in our previous work [15]. However, the prediction techniques of the two works differ: data mining-based techniques (SVM, neural network, decision tree, and K-nearest neighbor) were used in the previous work to construct the forecasting model, whereas a statistics-based technique (multivariate Poisson regression) was used in this work. Statistics is a well-established scientific methodology and is useful for verifying relationships among parameters when the relationships are linear, while data mining techniques are useful for finding knowledge hidden in the data. In this paper, we focus on the analysis of the linear correlation between dengue cases and mosquito infection data. As such, the methodologies for model analysis and selection are different, and they are the major contribution of this paper.
The present work demonstrates the important roles of female mosquito infection rate and season in dengue outbreak prediction. Statistics-based analysis showed a positive relation between these variables and the number of dengue cases. Hence, integrating these two factors in the forecast model significantly improves the model's DHF predictive power, as confirmed by AIC, BIC, and MAPE. The proposed model efficiently estimated the dengue incidence trends in the trial experiments reported here and could assist in dengue outbreak surveillance and control at the early stages, before outbreaks spread. Although the dengue virus infection rate in mosquitoes is effective for predicting dengue outbreaks, the detection technique is costly and time consuming; therefore, it has never been used to determine dengue outbreaks in previous reports. Techniques for rapid detection of dengue virus, such as loop-mediated isothermal amplification (LAMP), have now been developed [37]. LAMP reactions can be observed by the naked eye [38], and the technique is low in cost; therefore, it could be used to determine the dengue virus infection rate in mosquitoes in field surveys. The dengue infection rate in mosquitoes could be incorporated into dengue control measures in the near future. Currently, we are extending the model to other factors that could potentially enhance model performance. Landscape, dengue serotypes, and demographic transitions in the target areas are some of the additional factors now undergoing further investigation.