Attention-based recurrent neural network for influenza epidemic prediction
BMC Bioinformatics volume 20, Article number: 575 (2019)
Abstract
Background
Influenza is an infectious respiratory disease that can cause serious public health hazards. Because of this threat to society, precise real-time forecasting of influenza outbreaks is of great value to the public.
Results
In this paper, we propose a new deep neural network structure that forecasts the real-time influenza-like illness rate (ILI%) in Guangzhou, China. Because influenza epidemic data are long-term and diverse, we apply long short-term memory (LSTM) neural networks to forecast them accurately. We devise a multichannel LSTM neural network that draws information from different types of inputs, and we add an attention mechanism to improve forecasting accuracy. With this structure, we can handle the relationships between multiple inputs more appropriately. Our model fully considers the information in the data set and is tailored to the practical problem of influenza epidemic forecasting in Guangzhou.
Conclusion
We assess the performance of our model by comparing it with different neural network structures and other state-of-the-art methods. The experimental results indicate that our model is highly competitive and can provide effective real-time influenza epidemic forecasting.
Background
Influenza is an infectious respiratory disease that can cause serious public health hazards. Infection can aggravate existing underlying diseases, causing secondary bacterial pneumonia and acute exacerbation of chronic heart and lung disease. Furthermore, the 2009 H1N1 pandemic caused between 151,700 and 575,400 deaths worldwide during the first year the virus circulated [1]. Therefore, precise online monitoring and forecasting of influenza epidemic outbreaks is of great value to public health departments. Influenza detection and surveillance systems provide epidemiologic information that can help public health sectors develop preventive measures and assist local medical institutions in deployment planning [2].
Influenza-like illness (ILI) is a measure of respiratory infection defined by the World Health Organization (WHO): an acute respiratory infection with a measured fever of ≥38 °C and cough, with onset within the previous 10 days [3]. Our prediction target, ILI%, is the number of influenza-like illness cases divided by the total number of patient visits, expressed as a percentage. In influenza surveillance, ILI% is often used as an indicator of a possible influenza epidemic: when ILI% exceeds its baseline, the influenza season has arrived, prompting health administrations to take timely preventive measures.
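As a minimal sketch of the ILI% definition above, assuming weekly counts of ILI cases and total patient visits are available, the Python below computes ILI% and compares it with a baseline. The function names and the 2.5% baseline are hypothetical illustrations; real baselines are set by the surveillance authority.

```python
def ili_percent(ili_cases: int, total_visits: int) -> float:
    """ILI% = ILI cases / total patient visits, expressed as a percentage."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * ili_cases / total_visits


def exceeds_baseline(ili_pct: float, baseline_pct: float) -> bool:
    """True when a week's ILI% is above the (hypothetical) baseline."""
    return ili_pct > baseline_pct


# Hypothetical week: 150 ILI cases out of 5,000 patient visits.
week_ili = ili_percent(150, 5000)
print(week_ili)                         # 3.0
print(exceeds_baseline(week_ili, 2.5))  # True
```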
In recent years, more and more researchers have concentrated on precise online monitoring, early detection, and forecasting of influenza epidemic outbreaks, and outbreak forecasting has become a very active research direction. Information from web search and social network applications, such as Twitter and Google Correlate [4–6], provides ample data for this research area. Previous methods are commonly built on linear models, such as the least absolute shrinkage and selection operator (LASSO) or penalized regression [4, 6, 7]. Some researchers have also applied deep learning models to influenza epidemic forecasting [8, 9]. However, these methods cannot efficiently provide precise forecasts of ILI% one week in advance. First, online data are not accurate enough and lack necessary features, so they cannot fully reflect the trend of an influenza epidemic. Second, influenza epidemic data are usually complex, nonstationary, and noisy, and traditional linear models cannot handle multivariable inputs appropriately. Third, previously proposed deep learning methods did not consider the time-sequence property of influenza epidemic data.
In this paper, we use influenza surveillance data provided by the Guangzhou Center for Disease Control and Prevention as our data set. The data set includes multiple features and is recorded separately for each district in Guangzhou, and our approach takes advantage of both characteristics. We also take the time-sequence property into account, so that our approach is tailored to the influenza epidemic forecasting problem in Guangzhou. Because the data are collected according to standard surveillance specifications, our method is also applicable to other regions.
We concentrate on deep learning models for the influenza outbreak forecasting problem. Recently, deep learning methods have achieved remarkable performance in research areas ranging from computer vision and speech recognition to climate forecasting [10–12]. We use long short-term memory (LSTM) neural networks [13] as the fundamental forecasting method, because influenza epidemic data are naturally time series. Since different types of input data have different characteristics, a single LSTM with one specific filter may not capture the time series information comprehensively. A multichannel architecture captures the time series attributes of each input type better: it integrates the various relevant descriptors in the high-level network while keeping the input streams from interfering with each other in the underlying network. Several papers have demonstrated the robust fitting ability of structured LSTMs [14, 15]. We further enhance our method with an attention mechanism. In the attention layer, the probability of each value in the output sequence depends on the values in the input sequence, which lets us handle the relationships among the input streams from multiple regions more appropriately. We name our model AttMCLSTM, which stands for attention-based multichannel LSTM.
Our main contributions can be summarized as follows: (1) We test our model on the Guangzhou influenza surveillance data set, which is authentic and reliable and contains multiple attributes and time series features. (2) We propose an attention-based multichannel LSTM structure that combines several well-established approaches. The structure takes both the forecasting problem and the attributes of influenza epidemic data into account, and the model can serve as an alternative for forecasting influenza epidemic outbreaks in other regions. (3) The proposed model makes full use of the information in the data set and is tailored to the practical problem of influenza epidemic forecasting in Guangzhou. The experimental results demonstrate the validity of our method. To the best of our knowledge, this is the first study that applies LSTM neural networks to the influenza outbreak forecasting problem.
The rest of this paper is organized as follows. The second section describes our method in detail. The third section evaluates its performance by comparing it with different neural network structures and other existing methods. The fourth section presents conclusions and prospects for future work.
Methods
The accuracy of forecasting can be enhanced by combining multiple models [16–26]. In this paper, we devise a novel LSTM neural network structure to address the influenza epidemic forecasting problem in Guangzhou, China. Our model extracts characteristics from time series data more effectively and accounts for the different impacts of different parts of the data. To present the model clearly, we first describe the data set. The following sections describe the data set, the overall design of our model, LSTM neural networks, the attention mechanism, the attention-based multichannel LSTM, data normalization, and the evaluation method.
Data set description
The influenza surveillance data we use cover 9 years. For each year, influenza epidemic statistics are recorded for 9 regions. The data set includes 6 modules, each of which has multiple features. There is one record per week, giving 52 weekly records per year.
Design of the proposed model
Figure 1 shows the flow diagram of our method, which has two parts: training and testing. In the training part, we first select 19 relevant features after data cleaning and normalization; the chosen modules and features are listed in Table 1. Table 1 does not include the basic information module, which contains time information, districts, and population. We use model-based ranking for feature selection: we delete one feature at a time and feed the remaining features into the same forecasting model each time. If the forecasting accuracy drops after a feature is removed, that feature is relevant to the forecasting objective. After ranking the forecasting accuracies, we select the 19 features most relevant to the forecasting objective. We then split the data set into a training set and a test set. The training set contains 80% of the data, so that the model can learn the annual trend and seasonal periodicity. In the test part, we test our model on the test set, perform denormalization to reconstruct the original values, and finally assess our model and compare it with other models.
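The model-based ranking described above amounts to leave-one-feature-out evaluation. The sketch below assumes a callable `evaluate(subset)` that trains and tests the same forecasting model on a feature subset and returns an error such as MAPE; `evaluate`, the toy feature names, and their "usefulness" scores are hypothetical stand-ins, not the authors' code or data.

```python
from typing import Callable, Dict, List, Sequence


def rank_features_by_removal(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[str]:
    """Leave-one-feature-out ranking.

    `evaluate(subset)` is a hypothetical callable that trains and tests the
    same forecasting model on `subset` and returns an error (lower is better).
    The feature whose removal increases the error most is ranked first.
    """
    baseline = evaluate(list(features))
    increase: Dict[str, float] = {}
    for f in features:
        subset = [g for g in features if g != f]
        increase[f] = evaluate(subset) - baseline
    return sorted(features, key=lambda f: increase[f], reverse=True)


def select_top_k(features: Sequence[str], evaluate: Callable[[List[str]], float], k: int) -> List[str]:
    """Keep the k most relevant features (the paper keeps 19)."""
    return rank_features_by_removal(features, evaluate)[:k]


# Toy stand-in for "train the model and return its test error": each
# hypothetical feature lowers the error by a fixed amount when present.
usefulness = {"temperature": 0.02, "rainfall": 0.0, "ili_visits": 0.10, "humidity": 0.01}


def toy_evaluate(subset: List[str]) -> float:
    return 0.2 - sum(usefulness[f] for f in subset)


print(select_top_k(list(usefulness), toy_evaluate, 2))  # ['ili_visits', 'temperature']
```

Each removal requires a full retrain, so the cost grows linearly with the number of candidate features.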
Data normalization
Min-max normalization is a linear transformation strategy [27] that preserves the relationships among the original data values. It transforms a value $x$ into $y$ as defined in Eq. 1:

$$y = \frac{x - \min}{\max - \min} \tag{1}$$

where min and max are the smallest and largest values in the data. After normalization, the features are scaled to the range [0, 1].
We perform denormalization to reconstruct the original data. Given a normalized value $y$, its original value $x$ is given by Eq. 2:

$$x = y\,(\max - \min) + \min \tag{2}$$
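Equations 1 and 2 can be sketched in Python as follows. One assumption is made explicit here: min and max are fitted on the training data and reused for test data and for denormalizing forecasts, so test values may fall slightly outside [0, 1]. The text does not say which portion of the data the extrema come from.

```python
from typing import List, Tuple


def fit_min_max(values: List[float]) -> Tuple[float, float]:
    """Return (min, max) of the data used to fit the scaler (assumed: training data)."""
    return min(values), max(values)


def normalize(values: List[float], lo: float, hi: float) -> List[float]:
    """Eq. 1: y = (x - min) / (max - min)."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(x - lo) / span for x in values]


def denormalize(scaled: List[float], lo: float, hi: float) -> List[float]:
    """Eq. 2: x = y * (max - min) + min, the inverse of Eq. 1."""
    return [y * (hi - lo) + lo for y in scaled]


train = [2.0, 4.0, 6.0, 10.0]
lo, hi = fit_min_max(train)   # (2.0, 10.0)
y = normalize(train, lo, hi)  # [0.0, 0.25, 0.5, 1.0]
x = denormalize(y, lo, hi)    # [2.0, 4.0, 6.0, 10.0]
print(y, x)
```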
Long short-term memory neural network
Recurrent neural networks can dynamically combine past experience because of their internal recurrence [28]. Unlike traditional RNNs, LSTM mitigates the vanishing gradient problem [29], and the memory units of LSTM cells retain the time series attributes of the given context [29]. Several studies have shown that LSTM neural networks perform better than traditional RNNs on long-term time series data [30].
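The structure of a single LSTM cell is illustrated in Fig. 2. The gates control the flow of information, that is, the interactions between different cells and within the cell itself. The input gate controls how the memory state is updated. The output gate controls whether the output flow can alter other cells' memory states. The forget gate decides whether the cell remembers or forgets its previous state. LSTM is implemented by the following composite functions:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{3}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{5}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{6}$$
$$h_t = o_t \odot \tanh(c_t) \tag{7}$$

where σ is the logistic sigmoid function; i, f, o, and c are the input gate, forget gate, output gate, and cell activation vectors, respectively; h is the hidden vector; $x_t$ is the input at time step $t$; $b$ denotes bias vectors; and ⊙ denotes element-wise multiplication. The weight-matrix subscripts have the intuitive meaning; for example, $W_{hi}$ is the hidden-to-input-gate matrix, and the cell-to-gate matrices $W_{ci}$, $W_{cf}$, $W_{co}$ are diagonal.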
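To make the gate equations concrete, here is a plain-Python sketch of a single LSTM time step in the peephole variant, where the input, forget, and output gates also read the cell state through element-wise (diagonal) weights. Using the peephole variant is an assumption; many library LSTM layers omit those terms. The weights are random and the sizes (13 input features, 4 hidden units, 10 weeks) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, p: Dict) -> Tuple[Vector, Vector]:
    """One time step of a peephole LSTM cell; vector products are element-wise.

    p["W_x?"] are n x d input weights, p["W_h?"] are n x n recurrent weights,
    p["w_c?"] are length-n peephole vectors, p["b_?"] are biases (? in i, f, c, o).
    """
    n = len(h_prev)
    pre = {g: [a + b + bias for a, b, bias in
               zip(matvec(p["W_x" + g], x), matvec(p["W_h" + g], h_prev), p["b_" + g])]
           for g in "ifco"}
    i = [sigmoid(pre["i"][k] + p["w_ci"][k] * c_prev[k]) for k in range(n)]  # input gate
    f = [sigmoid(pre["f"][k] + p["w_cf"][k] * c_prev[k]) for k in range(n)]  # forget gate
    c = [f[k] * c_prev[k] + i[k] * math.tanh(pre["c"][k]) for k in range(n)]  # new cell state
    o = [sigmoid(pre["o"][k] + p["w_co"][k] * c[k]) for k in range(n)]      # output gate sees c_t
    h = [o[k] * math.tanh(c[k]) for k in range(n)]                          # hidden state
    return h, c


def init_params(d: int, n: int, seed: int = 0) -> Dict:
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p: Dict = {}
    for g in "ifco":
        p["W_x" + g], p["W_h" + g], p["b_" + g] = rand(n, d), rand(n, n), [0.0] * n
    for g in "ifo":
        p["w_c" + g] = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    return p


# Hypothetical sizes: 10 weeks of 13 features through a 4-unit cell.
params = init_params(d=13, n=4)
h, c = [0.0] * 4, [0.0] * 4
data_rng = random.Random(1)
for _ in range(10):
    x_t = [data_rng.random() for _ in range(13)]
    h, c = lstm_step(x_t, h, c, params)
print(h)  # final hidden state that a channel would pass to the next layer
```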
Attention mechanism
Traditional Encoder-Decoder structures typically encode an input sequence into a fixed-length vector representation. This design has a drawback: when the input sequence is very long, it is difficult to learn a good fixed-length representation.
A fundamental idea of the attention mechanism [31] is to move beyond the single fixed-length representation of the conventional Encoder-Decoder structure. The attention mechanism trains a model that selectively attends to the input streams by retaining the intermediate outputs of the LSTM. In the attention structure, the output sequence is tied to the input sequence; in other words, the probability of each value in the output sequence depends on the values in the input sequence. Figure 3 illustrates the attention mechanism.
The attention layer computes a weighted combination of $X_1, \dots, X_T$, and its output is part of the input to state $S_t$. The probability of each value in the output sequence $\dots, y_{t-1}, y_t, \dots$ therefore depends on the input sequence $X_1, X_2, \dots, X_T$. Here $h_i$ is the hidden vector and $A_{t,i}$ is the weight of the $i$-th input at time step $t$. In general form, the attention layer takes $n$ inputs $y_1, \dots, y_n$ and a context $c$, and outputs a vector $z$, the weighted combination of the $y_i$ for the given context $c$. The attention mechanism is implemented by the following composite functions:

$$m_i = \tanh(W_{cm} c + W_{ym} y_i) \tag{8}$$
$$s_i \propto \exp(\langle w_m, m_i \rangle) \tag{9}$$
$$\sum_i s_i = 1 \tag{10}$$
$$z = \sum_i s_i y_i \tag{11}$$

where $m_i$ is computed by a tanh layer and $s_i$ is the softmax of $m_i$ projected onto a learned direction $w_m$. The output $z$ is the weighted arithmetic mean of all $y_i$, where the weights $s_i$ represent the relevance of each variable given the context $c$.
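The tanh-softmax attention described above can be sketched in plain Python: score each input $y_i$ against a context $c$, normalize the scores with a softmax, and return the weighted mean $z$. The weight matrices below are hypothetical; in AttMCLSTM the $y_i$ might be the per-region channel outputs, but the text does not specify what serves as the context $c$.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(ys: List[Vector], c: Vector, W_cm: Matrix, W_ym: Matrix, w_m: Vector) -> Tuple[Vector, Vector]:
    """Return (z, s) with s_i = softmax_i(<w_m, tanh(W_cm c + W_ym y_i)>) and z = sum_i s_i y_i."""
    ctx = matvec(W_cm, c)
    ms = [[math.tanh(a + b) for a, b in zip(ctx, matvec(W_ym, y))] for y in ys]
    s = softmax([sum(w * m for w, m in zip(w_m, m_i)) for m_i in ms])
    z = [sum(s_i * y[k] for s_i, y in zip(s, ys)) for k in range(len(ys[0]))]
    return z, s


# Hypothetical 2-D example: three inputs, a context that is ignored (W_cm = 0),
# W_ym = identity, and a learned direction w_m that favors the first coordinate.
ys = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
z, s = attend(ys, c=[0.0, 0.0], W_cm=[[0.0, 0.0], [0.0, 0.0]],
              W_ym=[[1.0, 0.0], [0.0, 1.0]], w_m=[1.0, 0.0])
print([round(v, 3) for v in s])  # weights sum to 1; the third input scores highest
```

Because the weights come from a softmax, $z$ always lies in the convex hull of the inputs, which matches the "weighted arithmetic mean" reading of Eq. 11.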
Attention-based multichannel LSTM
Figure 4 illustrates the overall architecture of our model. We divide the features in our data set into two categories. First, average temperature, maximum temperature, minimum temperature, rainfall, air pressure, and relative humidity are grouped together as the climate-related data category. The remaining features are grouped together as the influenza-related data category. In our data set, each region has its own influenza-related data, while all regions share the same climate-related data each week.
Because of these characteristics, the input of AttMCLSTM has two parts. First, the influenza-related data are fed into a series of LSTM neural networks (LSTM 1, …, LSTM 9) to capture correlated features; each of these LSTMs receives the influenza-related data of one distinct region. To make full use of the complementarity among regions, the outputs of LSTM 1, …, LSTM 9 are concatenated in a higher layer (Merge 1), which obtains the fused descriptors of the underlying networks. Because the data of each region influence the final forecast differently, we then weight these intermediate sequences: they pass through an attention layer (Attention) and a fully connected layer (Dense 1) in turn. Second, the climate-related data are fed into a single LSTM neural network (LSTM 10) to capture the long-term time series attributes relevant to the influenza epidemic. We then concatenate the outputs of these two parts (Merge 2). Finally, the intermediate sequences pass through two fully connected layers (Dense 2, Dense 3). The resulting high-level features of the input data are used to forecast the influenza epidemic.
The multichannel structure lets us better extract the time-sequence property of each type of data. It not only ensures the integration of the various relevant descriptors in the high-level network, but also ensures that the input streams do not interfere with each other in the underlying network. In the attention layer, the probability of each value in the output sequence depends on the values in the input sequence. This structure allows us to handle the relationships among the input data of different districts more appropriately.
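The data flow of Fig. 4 can be traced with a shape-level Python skeleton. This is not the trained network: each LSTM channel is replaced by a hypothetical placeholder encoder that returns the last week's features, Dense 1 to Dense 3 are collapsed into one caller-supplied `head` function, and the attention step is assumed to assign one weight per region. It only shows how nine regional channels and one climate channel are routed, weighted, and merged.

```python
import math
from typing import Callable, List

Seq = List[List[float]]  # weeks x features


def placeholder_encoder(seq: Seq) -> List[float]:
    """Hypothetical stand-in for an LSTM channel: returns the last week's features.
    It only preserves shapes; a real channel would return an LSTM hidden state."""
    return list(seq[-1])


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(region_seqs: List[Seq], climate_seq: Seq,
            score: Callable[[List[float]], float],
            head: Callable[[List[float]], float]) -> float:
    # LSTM 1..9: one channel per region; Merge 1 collects their outputs.
    merged_regions = [placeholder_encoder(s) for s in region_seqs]
    # Attention (assumed: one weight per region), then a weighted sum.
    weights = softmax([score(v) for v in merged_regions])
    dim = len(merged_regions[0])
    attended = [sum(w * v[k] for w, v in zip(weights, merged_regions)) for k in range(dim)]
    # LSTM 10: the shared climate channel.
    climate_vec = placeholder_encoder(climate_seq)
    # Merge 2 concatenates both branches; `head` stands in for Dense 1-3.
    return head(attended + climate_vec)


# Shapes from the text: 10 weeks, 9 regions x 13 influenza features, 6 climate features.
regions = [[[0.1 * r] * 13 for _ in range(10)] for r in range(9)]
climate = [[0.5] * 6 for _ in range(10)]
width = forward(regions, climate, score=sum, head=lambda v: float(len(v)))
print(width)  # 19.0: 13 attended influenza features + 6 climate features
```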
Evaluation method
To evaluate our method, we use the mean absolute percentage error (MAPE) as the evaluation criterion, defined in Eq. 12:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i}{y_i} \right| \tag{12}$$

where $y_i$ is the $i$-th actual value, $x_i$ is the $i$-th predicted value, and $n$ is the number of predictions. A lower MAPE indicates higher accuracy.
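Eq. 12 translates directly into code. The sketch below returns MAPE as a fraction, which appears to match the scale of the values reported later (for example, 0.086); multiply by 100 for a percentage. Raising an error on zero actual values is a design choice of this sketch, since MAPE is undefined there.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. 12 as a fraction: mean of |(y_i - x_i) / y_i| over all n predictions."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((y - x) / y) for y, x in zip(actual, predicted)) / len(actual)


print(mape([2.0, 4.0], [1.5, 5.0]))  # 0.25
```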
Experiments
In this section, we conduct two experiments to verify the AttMCLSTM model. In the first experiment, we determine how many consecutive weeks of data are needed to forecast the ILI% of the following week. In the second experiment, we compare our model with different neural network structures and other methods. Each reported result is the average of 10 repeated trials.
Selection of consecutive weeks
In this experiment, we set the number of consecutive weeks to 6, 8, 10, 12, and 14 in turn. The hyperparameters of each layer are listed in Table 2. We use linear activation functions. The loss function is MAPE and the optimizer is Adam.
We use the data of the first 370 consecutive weeks in the training phase and the remaining data in the test phase. Each data sample includes the 6 features of the climate-related data category and the influenza-related data of 9 different districts, each with 13 features. The climate-related data and each district's influenza-related data are fed into the climate-related channel and the corresponding influenza-related channel, respectively. The forecasting results are shown in Table 3.
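One way to build these samples, sketched under assumptions: each input is a window of 10 consecutive weekly records (the length selected in the Discussion) and the target is the ILI% of the following week; a sample goes to training if its target week falls within the first 370 weeks. The data set spans 9 × 52 = 468 weeks. The authors' exact windowing and split boundary are not stated, and the series below are synthetic stand-ins.

```python
from typing import List, Sequence, Tuple


def make_windows(records: Sequence, ili: Sequence[float], window: int = 10) -> Tuple[list, list]:
    """Sample k pairs weeks k .. k+window-1 (inputs) with week k+window's ILI% (target)."""
    X, y = [], []
    for t in range(window, len(records)):
        X.append(list(records[t - window:t]))
        y.append(ili[t])
    return X, y


def split_by_target_week(X: list, y: list, first_test_week: int, window: int):
    """Chronological split: samples whose target week < first_test_week are training data."""
    cut = first_test_week - window  # sample k targets week k + window
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


n_weeks = 9 * 52                                 # 468 weekly records over 9 years
records = [[float(t)] for t in range(n_weeks)]   # synthetic stand-in for each week's features
ili = [float(t) for t in range(n_weeks)]         # synthetic stand-in for weekly ILI%
X, y = make_windows(records, ili, window=10)
(train_X, train_y), (test_X, test_y) = split_by_target_week(X, y, first_test_week=370, window=10)
print(len(X), len(train_X), len(test_X))  # 458 360 98
```

Keeping the split chronological (no shuffling) avoids training on weeks that come after the test period.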
Performance validation
In this experiment, we verify the validity of our model.
First, we compare the forecasting accuracy of AttMCLSTM with that of MCLSTM to verify the effect of the attention mechanism. Both models use the same multichannel architecture (shown in Fig. 4); the only difference is that the attention layer is removed in MCLSTM. The parameter settings and data input method are as described in the first experiment.
Second, we compare the forecasting accuracy of MCLSTM with that of LSTM to verify the effect of the multichannel structure. For MCLSTM, the parameter settings and data input method are as described in the first experiment. For LSTM, we feed all features into a single LSTM layer to capture the fused descriptors. Instead of separating the data set by region, we sum the corresponding influenza-related features of all regions for each week, so each data record includes the 19 selected features. The output of the LSTM layer is then passed through a fully connected layer to obtain high-level features. The LSTM layer and the fully connected layer have 32 and 1 units, respectively.
Third, we compare LSTM with a conventional RNN to verify that LSTMs perform better than RNNs on time series data.
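For the single-LSTM baseline in the second comparison, the per-week input can be sketched as follows: sum each influenza-related feature over the 9 regions and append the 6 shared climate features, giving 13 + 6 = 19 values. The function name and sample values are hypothetical.

```python
from typing import List


def aggregate_week(region_feats: List[List[float]], climate: List[float]) -> List[float]:
    """Single-LSTM baseline record for one week: per-feature sums over regions + climate."""
    n_feats = len(region_feats[0])
    summed = [sum(region[k] for region in region_feats) for k in range(n_feats)]
    return summed + list(climate)


# Hypothetical week: 9 regions x 13 influenza-related features, 6 climate features.
week_regions = [[float(r + 1)] * 13 for r in range(9)]
week_climate = [0.5] * 6
record = aggregate_week(week_regions, week_climate)
print(len(record), record[0], record[-1])  # 19 45.0 0.5
```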
Discussion
The results of the first experiment indicate that 10 consecutive weeks of data appropriately reflect the time series attribute of the influenza data. With fewer than 10 weeks, the input does not contain enough time series information; with more than 10 weeks, the input contains more noise, which lowers forecasting accuracy. Therefore, in our experiments, each data record includes 10 consecutive weeks of data.
The results of the second experiment show that AttMCLSTM achieves the best performance. From the first two rows of Table 4, the attention mechanism reduces the MAPE from 0.105 to 0.086, because the attention layer handles the relationships among the input streams of different regions more appropriately. From the second and third rows, the multichannel structure reduces the MAPE from 0.118 to 0.105, because it better captures the time series attributes of the different input streams. From the last two rows, using LSTM instead of RNN reduces the MAPE from 0.132 to 0.118, because LSTM handles time series data better. This result also confirms the time series attribute of influenza epidemic data.
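The ablation results above can also be expressed as relative MAPE reductions. The short computation below uses only the four MAPE values quoted in this section; the model labels follow the comparisons described in the Experiments section.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the error drops from `before` to `after`."""
    return (before - after) / before


# MAPE values reported for the four models (Table 4), worst to best.
reported = [("RNN", 0.132), ("LSTM", 0.118), ("MCLSTM", 0.105), ("AttMCLSTM", 0.086)]

for (prev_name, prev), (name, cur) in zip(reported, reported[1:]):
    print(f"{prev_name} -> {name}: {relative_reduction(prev, cur):.1%} lower MAPE")
print(f"RNN -> AttMCLSTM overall: {relative_reduction(0.132, 0.086):.1%} lower MAPE")
# RNN -> LSTM: 10.6% lower MAPE
# LSTM -> MCLSTM: 11.0% lower MAPE
# MCLSTM -> AttMCLSTM: 18.1% lower MAPE
# RNN -> AttMCLSTM overall: 34.8% lower MAPE
```

Read this way, the attention layer contributes the largest single relative gain of the three components.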
Figure 5 shows the actual values and the values predicted by the four models. The predictions of AttMCLSTM are close to the actual values, whereas the other three models show more obvious differences from the actual values. This suggests that using AttMCLSTM to analyze the sequential information helps extract time-sequence characteristics more accurately and comprehensively.
Conclusion and future work
In this paper, we propose a new deep neural network structure (AttMCLSTM) to forecast the ILI% in Guangzhou, China. First, we use a multichannel architecture to capture the time series attributes of different input streams. Then, we apply the attention mechanism to weight the fused feature sequences, which lets us handle the relationships between different input streams more appropriately. Our model fully considers the information in the data set and is tailored to the practical problem of influenza epidemic forecasting in Guangzhou. We assess the performance of our model by comparing it with different neural network structures and other state-of-the-art models. The experimental results indicate that our model is highly competitive and can provide effective real-time influenza epidemic forecasting. To the best of our knowledge, this is the first study that applies LSTM neural networks to influenza outbreak forecasting. In future work, we will improve the model's ability to generalize by introducing transfer learning.
Availability of data and materials
All data generated or analyzed during this study are included in this article.
Abbreviations
ILI: Influenza-like illness
LSTM: Long short-term memory
LASSO: Least absolute shrinkage and selection operator
MAPE: Mean absolute percentage error
References
[1] Yang S, Santillana M, Kou SC. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc Natl Acad Sci. 2015; 112(47):14473–8.
[2] Brownstein JS, Mandl KD. Reengineering real time outbreak detection systems for influenza epidemic monitoring. In: AMIA Annual Symposium Proceedings, vol. 2006. American Medical Informatics Association: 2006. p. 866.
[3] World Health Organization. WHO interim global epidemiological surveillance standards for influenza. 2012:1–61.
[4] Santillana M, Zhang DW, Althouse BM, Ayers JW. What can digital disease detection learn from (an external revision to) Google Flu Trends? Am J Prev Med. 2014; 47(3):341–7.
[5] Achrekar H, Gandhe A, Lazarus R, Yu SH, Liu B. Predicting flu trends using Twitter data. In: Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference On. IEEE: 2011. p. 702–7. https://doi.org/10.1109/infcomw.2011.5928903.
[6] Broniatowski DA, Paul MJ, Dredze M. National and local influenza surveillance through Twitter: an analysis of the 2012–2013 influenza epidemic. PLoS ONE. 2013; 8(12):e83672.
[7] Santillana M, Nsoesie EO, Mekaru SR, Scales D, Brownstein JS. Using clinicians' search query data to monitor influenza epidemics. Clin Infect Dis Off Publ Infect Dis Soc Am. 2014; 59(10):1446.
[8] Xu Q, Gel YR, Ramirez LLR, Nezafati K, Zhang Q, Tsui KL. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion. PLoS ONE. 2017; 12(5):e0176690.
[9] Hu H, Wang H, Wang F, Langley D, Avram A, Liu M. Prediction of influenza-like illness based on the improved artificial tree algorithm and artificial neural network. Sci Rep. 2018; 8(1):4895.
[10] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006; 313(5786):504–7.
[11] Zou B, Lampos V, Gorton R, Cox IJ. On infectious intestinal disease surveillance using social media content. In: Proceedings of the 6th International Conference on Digital Health Conference. ACM: 2016. p. 157–61. https://doi.org/10.1145/2896338.2896372.
[12] Huang W, Song G, Hong H, Xie K. Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans Intell Transp Syst. 2014; 15(5):2191–201.
[13] How DNT, Loo CK, Sahari KSM. Behavior recognition for humanoid robots using long short-term memory. Int J Adv Robot Syst. 2016; 13(6):1729881416663369.
[14] Yang Y, Hao J, Sun M, Wang Z, Fan C, Strbac G. Recurrent deep multiagent Q-learning for autonomous brokers in smart grid. In: IJCAI, vol. 18: 2018. p. 569–75. https://doi.org/10.24963/ijcai.2018/79.
[15] Yang Y, Hao J, Wang Z, Sun M, Strbac G. Recurrent deep multiagent Q-learning for autonomous agents in future smart grid. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems: 2018. p. 2136–8. https://doi.org/10.24963/ijcai.2018/79.
[16] Shafie-Khah M, Moghaddam MP, Sheikh-El-Eslami M. Price forecasting of day-ahead electricity markets using a hybrid forecast method. Energy Convers Manag. 2011; 52(5):2165–9.
[17] Xiaotian H, Weixun W, Jianye H, Yaodong Y. Independent generative adversarial self-imitation learning in cooperative multiagent systems. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems: 2019. p. 1315–23.
[18] Yaodong Y, Jianye H, Yan Z, Xiaotian H, Bofeng F. Large-scale home energy management using entropy-based collective multiagent reinforcement learning framework. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems: 2019. https://doi.org/10.24963/ijcai.2019/89.
[19] Hongyao T, Jianye H, Tangjie Lv, Yingfeng C, Zongzhang Z, Hangtian J, Chunxu R, Yan Z, Changjie F, Li W. Hierarchical deep multiagent reinforcement learning with temporal abstraction. arXiv preprint arXiv:1809.09332. 2018.
[20] Peng J, Guan J, Shang X. Predicting Parkinson's disease genes based on node2vec and autoencoder. Front Genet. 2019; 10:226.
[21] Peng J, Zhu L, Wang Y, Chen J. Mining relationships among multiple entities in biological networks. IEEE/ACM Trans Comput Biol Bioinforma. 2019. https://doi.org/10.1109/tcbb.2019.2904965.
[22] Peng J, Xue H, Shao Y, Shang X, Wang Y, Chen J. A novel method to measure the semantic similarity of HPO terms. IJDMB. 2017; 17(2):173–88.
[23] Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics. 2018; 34(11):1953–6.
[24] Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M, Zhou W, Liu G, Jiang H, Jiang Q. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res. 2018; 47(D1):140–4.
[25] Hu Y, Zhao T, Zang T, Zhang Y, Cheng L. Identification of Alzheimer's disease-related genes based on data integration method. Front Genet. 2018; 9. https://doi.org/10.3389/fgene.2018.00703.
[26] Peng J, Hui W, Li Q, Chen B, Jiang Q, Wei Z, Shang X. A learning-based framework for miRNA-disease association prediction using neural networks. bioRxiv. 2018:276048. https://doi.org/10.1101/276048.
[27] Panda SK, Jana PK. Efficient task scheduling algorithms for heterogeneous multi-cloud environment. J Supercomput. 2015; 71(4):1505–33.
[28] Murtagh F, Starck JL, Renaud O. On neuro-wavelet modeling. Dec Support Syst. 2004; 37(4):475–84.
[29] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735–80.
[30] Palangi H, Deng L, Shen Y, Gao J, He X, Chen J, Song X, Ward R. Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval. IEEE/ACM Trans Audio Speech Lang Process (TASLP). 2016; 24(4):694–707.
[31] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems: 2017. p. 5998–6008.
Acknowledgements
We thank the reviewers for their valuable comments, which helped improve the quality of this work.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 18, 2019: Selected articles from the Biological Ontologies and Knowledge bases workshop 2018. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-18.
Funding
Publication costs are funded by The National Natural Science Foundation of China (Grant Nos.: U1836214), Tianjin Development Program for Innovation and Entrepreneurship and Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission (NO.: 17ZXRGGX00150).
Author information
Contributions
XZ and BF contributed equally to the algorithm design and theoretical analysis. YY, YM, JH, SC, SL, TL, SL, WG, and ZL contributed equally to the quality control and document reviewing. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Xianglei Zhu and Bofeng Fu are equal contributors.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhu, X., Fu, B., Yang, Y. et al. Attention-based recurrent neural network for influenza epidemic prediction. BMC Bioinformatics 20 (Suppl 18), 575 (2019). https://doi.org/10.1186/s12859-019-3131-8
DOI: https://doi.org/10.1186/s12859-019-3131-8