 Methodology Article
 Open Access
Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction
BMC Bioinformatics volume 20, Article number: 609 (2019)
Abstract
Background
Recurrent neural networks (RNNs) are well suited to processing sequential data, but they handle long sequences inefficiently. As a variant of RNN, long short-term memory (LSTM) solves this problem to some extent. Here we improve LSTM for big-data application to protein-protein interaction interface residue pair prediction, for two reasons. On the one hand, LSTM has some deficiencies, such as shallow layers and gradient explosion or vanishing. As data volumes grow dramatically, the imbalance between algorithm innovation and big-data processing has become more serious and urgent. On the other hand, protein-protein interaction interface residue pair prediction is an important problem in biology, and its low prediction accuracy compels us to propose new computational methods.
Results
To overcome the aforementioned problems of LSTM, we adopt a residual architecture and add an attention mechanism to LSTM. In detail, we redefine the block, add a connection from front to back across every two layers, and add an attention mechanism to strengthen the capability of mining information. We then use the model to predict protein-protein interaction interface residue pairs and obtain an accuracy of over 72%. We also compare our method with random experiments, PPiPP, standard LSTM, and some other machine learning methods. Our method shows better performance than the methods mentioned above.
Conclusion
We present an attention mechanism enhanced LSTM with residual architecture, which allows a deeper network while avoiding gradient vanishing or explosion to a certain extent. We then apply it to a significant problem, protein-protein interaction interface residue pair prediction, and obtain better accuracy than other methods. Our method provides a new approach for protein-protein interaction computation, which will be helpful for related biomedical research.
Background
The recurrent neural network (RNN) is a major neural network architecture in deep learning that acts as a bridge connecting information from the past to the present. It is trained with the back propagation algorithm extended to include the time dimension, known as back propagation through time (BPTT). Owing to this property, it can handle sequential data, including temporal and spatial data.
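In the standard RNN (Fig. 1), information propagates forward from inputs to outputs. We can describe this information flow with a series of equations. The symbols and notation in this paper mainly follow the book by Alex Graves [1], so we state them only briefly. x denotes the input vector, \(x_{i}^{t}\) denotes the value of the i^{th} input of vector x at time t, and w_{ij} denotes the weight from unit i to unit j. For the hidden layer unit h, we denote the input of hidden layer unit h at time t:
\(a_{h}^{t} = \sum _{i} w_{ih} x_{i}^{t} + \sum _{h'} w_{h'h} b_{h'}^{t-1}\)
the output of the hidden layer unit h at time t is denoted as \(b_{h}^{t}\), and the activation function is θ_{h}, so
\(b_{h}^{t} = \theta _{h}\left (a_{h}^{t}\right)\)
the output layer’s input can be calculated at the same time:
\(a_{k}^{t} = \sum _{h} w_{hk} b_{h}^{t}\)
Like the standard back propagation algorithm, BPTT is a repeated application of the chain rule. For the gradients of the loss function \(\mathcal {L}\) in an RNN, the loss influences a hidden unit not only through the hidden layer’s output at the current time step, but also through the hidden layer at the next time step:
\(\delta _{h}^{t} = \theta _{h}'\left (a_{h}^{t}\right)\left (\sum _{k} \delta _{k}^{t} w_{hk} + \sum _{h'} \delta _{h'}^{t+1} w_{hh'}\right)\)
where
\(\delta _{j}^{t} = \frac {\partial \mathcal {L}}{\partial a_{j}^{t}}\)
Then we can get the derivative with respect to each network weight:
\(\frac {\partial \mathcal {L}}{\partial w_{ij}} = \sum _{t} \frac {\partial \mathcal {L}}{\partial a_{j}^{t}} \frac {\partial a_{j}^{t}}{\partial w_{ij}} = \sum _{t} \delta _{j}^{t} b_{i}^{t}\)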
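The forward pass described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: it assumes tanh as the hidden activation θ_{h}, a zero initial hidden state, and the hypothetical argument names `w_in`, `w_rec` and `w_out` for the input-to-hidden, hidden-to-hidden and hidden-to-output weights.

```python
import math

def rnn_forward(xs, w_in, w_rec, w_out):
    """Forward pass of a simple RNN (assumed tanh hidden units).

    xs:    list of input vectors, one per time step
    w_in:  w_in[i][h]  weight from input i to hidden unit h
    w_rec: w_rec[g][h] weight from hidden unit g at t-1 to hidden unit h at t
    w_out: w_out[h][k] weight from hidden unit h to output unit k
    Returns the output-layer inputs a_k^t, one vector per time step.
    """
    n_hidden = len(w_rec)
    n_out = len(w_out[0])
    b_prev = [0.0] * n_hidden  # hidden outputs before the first step (assumed zero)
    outputs = []
    for x in xs:
        # a_h^t = sum_i w_ih x_i^t + sum_h' w_h'h b_h'^(t-1)
        a_h = [
            sum(w_in[i][h] * x[i] for i in range(len(x)))
            + sum(w_rec[g][h] * b_prev[g] for g in range(n_hidden))
            for h in range(n_hidden)
        ]
        # b_h^t = theta_h(a_h^t)
        b_h = [math.tanh(a) for a in a_h]
        # a_k^t = sum_h w_hk b_h^t
        a_k = [sum(w_out[h][k] * b_h[h] for h in range(n_hidden)) for k in range(n_out)]
        outputs.append(a_k)
        b_prev = b_h
    return outputs
```

With a single input, hidden and output unit and all weights equal to 1, the second time step receives the first step's hidden output through the recurrent weight, which is the "bridge from past to present" the text refers to.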
Long short-term memory (LSTM) [2], a variant of RNN proposed by Hochreiter and Schmidhuber and shown in Fig. 2, consists of one block with three gates (input, forget and output gate), each with an activation between 0 (the gate is closed) and 1 (the gate is open), and some cells that remember information and pass it to the next step. In effect, the hidden layer unit of the RNN is replaced by this gated block. The output values of the input gate and forget gate are determined by the previous cell states and the input values.
The subscripts ι, ϕ and ω denote the input, forget and output gate of the block respectively, and c denotes one of the C memory cells. The peephole weights from cell c to the input, forget and output gates are denoted as w_{cι}, w_{cϕ} and w_{cω} respectively. \(s_{c}^{t}\) denotes the state of cell c at time t. f, g and h are the activation functions of the gates, cell input and cell output, respectively. Let I denote the number of inputs, K the number of outputs and H the number of cells in the hidden layer.
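From the framework in Fig. 2, and following the notation of [1], we obtain the equations:
input gate
\(a_{\iota }^{t} = \sum _{i=1}^{I} w_{i\iota } x_{i}^{t} + \sum _{h=1}^{H} w_{h\iota } b_{h}^{t-1} + \sum _{c=1}^{C} w_{c\iota } s_{c}^{t-1}, \quad b_{\iota }^{t} = f\left (a_{\iota }^{t}\right)\)
forget gate
\(a_{\phi }^{t} = \sum _{i=1}^{I} w_{i\phi } x_{i}^{t} + \sum _{h=1}^{H} w_{h\phi } b_{h}^{t-1} + \sum _{c=1}^{C} w_{c\phi } s_{c}^{t-1}, \quad b_{\phi }^{t} = f\left (a_{\phi }^{t}\right)\)
cell
\(a_{c}^{t} = \sum _{i=1}^{I} w_{ic} x_{i}^{t} + \sum _{h=1}^{H} w_{hc} b_{h}^{t-1}, \quad s_{c}^{t} = b_{\phi }^{t} s_{c}^{t-1} + b_{\iota }^{t} g\left (a_{c}^{t}\right)\)
output gate
\(a_{\omega }^{t} = \sum _{i=1}^{I} w_{i\omega } x_{i}^{t} + \sum _{h=1}^{H} w_{h\omega } b_{h}^{t-1} + \sum _{c=1}^{C} w_{c\omega } s_{c}^{t}, \quad b_{\omega }^{t} = f\left (a_{\omega }^{t}\right)\)
cell’s output
\(b_{c}^{t} = b_{\omega }^{t} h\left (s_{c}^{t}\right)\)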
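As a rough illustration of how the gates, cell state and cell output fit together, the following sketch computes one forward step of a peephole LSTM block with a single memory cell. It is a minimal sketch under stated assumptions, not the authors' implementation: the gate activation f is taken to be the logistic sigmoid, g and h are taken to be tanh, and the dictionary keys (`x_in`, `h_in`, `c_in`, and so on) are hypothetical names chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, b_prev, s_prev, w):
    """One forward step of a single-cell peephole LSTM block.

    x:      list of inputs x_i^t
    b_prev: previous cell output b_c^(t-1)
    s_prev: previous cell state s_c^(t-1)
    w:      dict of weights; "x_<name>" is a list of input weights,
            "h_<name>" a recurrent weight, "c_<name>" a peephole weight
            (names are hypothetical, for illustration only)
    Returns (b_c, s): the new cell output and the new cell state.
    """
    def net(name):
        # weighted sum of the current inputs plus the recurrent contribution
        return sum(wi * xi for wi, xi in zip(w["x_" + name], x)) + w["h_" + name] * b_prev

    # input and forget gates look at the previous cell state through peepholes
    b_in = sigmoid(net("in") + w["c_in"] * s_prev)
    b_forget = sigmoid(net("forget") + w["c_forget"] * s_prev)
    # new state: keep part of the old state, add part of the squashed cell input
    s = b_forget * s_prev + b_in * math.tanh(net("cell"))
    # output gate looks at the new cell state through its peephole
    b_out = sigmoid(net("out") + w["c_out"] * s)
    # cell output: gated, squashed state
    b_c = b_out * math.tanh(s)
    return b_c, s
```

With all weights set to zero every gate evaluates to 0.5, so the new state is half the old state and the output is half of tanh of the new state, which makes the role of each gate easy to check by hand.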
Compared with RNN, LSTM can change the weight of the self-recursive model dynamically by adding gates, and it handles data at different scales with better performance. Many variants of LSTM show stronger performance, such as GRU [3], a simplification of LSTM, and bidirectional LSTM [4], but LSTM still suffers from gradient explosion or gradient vanishing. Both [5] and [6] mention this problem, employ residual learning [7] to avoid it, and report related experiments in speech and human activity recognition. This is why the applications of LSTM that we see are almost always in shallow neural networks. Many methods [8, 9] mitigate gradient explosion or vanishing to some extent, such as weight regularization, batch normalization and gradient clipping, but none of them resolves the gradient problem as the number of layers grows. Recently, Pradhan and Longpre [10] applied residual learning to deep RNNs, which solved the gradient vanishing problem and showed better performance. Drawing on convolutional residual memory networks [11] and deep residual neural networks [7], we use a method with a mathematical derivation to avoid these problems and deepen LSTM neural networks so as to extract more information from the original data, as described in the next section. Although some of the researchers mentioned above used this idea, our work differs in several ways: we use every two layers, instead of one layer, as a residual block to speed up computation on a larger sequential dataset, whereas Pradhan and Longpre used it for sentiment analysis on a small dataset. We also prove its convergence theoretically. Furthermore, we use the attention mechanism to strengthen the extraction of information. This part is shown in the “Model architecture” section.
If any notation in the “Results” section is unclear, we suggest reading the “Methods” section before the “Results” section. The overall flow of the algorithm and its application in this paper is described in Fig. 3.
Results
The number of layers in a neural network usually affects RFPP accuracy more directly and efficiently than the number of units does, so we search over the layer number first. In the manner of a bisection search, we try different layer numbers over a wide range to find the one with the best performance, then examine the neighboring layer numbers, and finally choose the optimal unit number. In Table 1 (left), layer_60 performs better than the others, both in the number of true positives predicted in the top 1‰ and in mean accuracy. Here layer_m denotes a model with m layers; unit_n and the model layer_m_unit_n are defined analogously throughout the paper. We then narrow the search. Table 1 (right) shows the layer numbers close to layer_60, which remains better than its neighbors. We therefore search for the optimal unit number with layer_60 and choose the best result. Based on Table 1, Table 2 shows the results for different unit numbers in detail. Although the model mean of layer_60_unit_6 is lower than that of layer_60_unit_8, the number of RFPP(1‰) is, conversely, considerably larger. Table 3 elaborates further on the result of model layer_60_unit_8. With this model we correctly predict 8 of the 11 dimers in the test set if we take the top 1‰ of pairs of every dimer as predictions.
Comparison with other methods
PPiPP [12] is a method that uses protein sequences for monomer binding site prediction, and PAIRpred [13] is a complex interface prediction approach published in 2014 that achieves higher prediction accuracy. Zhao and Gong [14] used a deep learning architecture, multi-layer LSTMs, to predict interface residue pairs and achieved better accuracy. Table 4 shows the results of the above-mentioned approaches on different Docking Benchmark datasets. The evaluation index is RFPP. When p equals 90%, our model correctly predicts around 90% of the proteins in our dataset if we choose the top 194 residue pairs as predictions, an improvement of around a third over the other methods. Because the proteins selected for our training and test sets and our pretreatment methods differ from theirs, the comparison is only partial. In addition, our protein sequences are longer and the number of residue pairs is larger than in those studies, which makes RFPP prediction more difficult. To balance the comparison, we use another evaluation index, accuracy order, in its place. Wang et al. [15] used different machine learning methods, chosen according to different protein properties, to predict interface residue pairs. We show the comparison and our prediction precision when choosing the top 1‰ of residue pairs in Table 5.
Furthermore, we also use random theory to calculate the RFPP. Mathematical expectation is one of the most significant numerical characteristics for describing the average of a random variable. Here X denotes the random variable of RFPP. To correspond to the index of our algorithm, we select 1000 pairs randomly, so
where N denotes the number of surface residue pairs and M denotes the number of interface residue pairs.
Then
We use the inequality because the latter is computationally simpler than the former, although the calculation is still complicated in pure theory. Monte Carlo simulation is a well-known method that estimates an expectation by using the frequency of events to estimate their probabilities, which is more convenient for us. More specifically, we run about 10 billion random simulations and count how often each event happens. The formula:
Here, we factor out the coefficient \(\frac 1{10 \text {billion}}\) to reduce numerical error, for example to prevent a frequency such as \(\frac {15}{10 \text {billion}}\) from being rounded to 0. All the results are shown in the last row of Table 3. We can clearly see that our result is far better than random RFPP, except for 1GL1 and 1BUH.
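The idea of estimating a random-baseline RFPP by simulation can be sketched as follows. This is a simplified illustration rather than the procedure used in the paper: it assumes that all N surface residue pairs are ranked uniformly at random, ignores the restriction to 1000 selected pairs, and uses far fewer trials than 10 billion. The function names and the default `trials` and `seed` values are hypothetical. Under this simplified assumption the exact expectation of the first positive rank is (N+1)/(M+1), which gives a convenient check on the simulation.

```python
import random

def random_rfpp(n_pairs, n_interface, rng):
    """Rank (1-based) of the first interface pair in a uniformly random ordering.

    Drawing the ranks occupied by the interface pairs without replacement
    is equivalent to shuffling all pairs and scanning for the first positive.
    """
    return min(rng.sample(range(1, n_pairs + 1), n_interface))

def estimate_expected_rfpp(n_pairs, n_interface, trials=20000, seed=0):
    """Monte Carlo estimate of E[X]: the average RFPP over many random rankings."""
    rng = random.Random(seed)
    total = sum(random_rfpp(n_pairs, n_interface, rng) for _ in range(trials))
    return total / trials

def exact_expected_rfpp(n_pairs, n_interface):
    """Expected minimum of M ranks drawn without replacement from 1..N."""
    return (n_pairs + 1) / (n_interface + 1)
```

For instance, `estimate_expected_rfpp(200, 9)` should land close to `exact_expected_rfpp(200, 9)`, and the gap shrinks as `trials` grows.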
Discussion
From Tables 1 and 2, we select the two best prediction accuracies in each table, using the top 1‰ as the estimated index. According to Fig. 4, our model commonly performs poorly on protein 1BUH and well on proteins 2VDB and 1Z5Y. One of the most likely reasons is that 1BUH is distant from the training data in homology while 2VDB and 1Z5Y are not. This is supported to some extent by the identity matrix, which shows that the highest homology with the training set is 12.86%, between 1DFG and 1BUH. As for 1GL1, we notice that the random model, with RFPP 124, performs better than our model, with RFPP 194. This is hard to explain. From the perspective of homology, however, 1GL1 has a slightly higher homology, 16.7%, with 2I9B, which may be one possible reason. We also depict some of the protein-protein interaction interface pairs predicted by our model in Fig. 5, where the first row is predicted well but the second is not.
On the one hand, choosing hyperparameters is a complicated problem in deep learning, for which existing methods such as grid search offer a practical approach. On the other hand, most biological data lose some information when transformed. In detail, we use the three-dimensional coordinates of one atom to represent an amino acid for simplification, and we depend excessively on the structure of the monomers. This is one of the biggest limitations, because our problem is to predict whether any two monomers can form a dimer complex. Different feature selections from the original data also lead to different prediction performance. Predicting structure directly from sequence, without considering any physicochemical and geometric properties, usually shows low accuracy. Moreover, because our prediction method depends on the 9 feature values from the monomer structures rather than the dimer complex structures, we delete the corresponding pairs or whole dimers when some values are missing. This is also a limitation. Recently AlQuraishi [16] employed bidirectional LSTM to predict protein structure from protein sequence and obtained state-of-the-art results. This may inspire us to rethink the problem from the protein sequence perspective. Extreme data imbalance is a serious problem for model training, and choosing a good approach to handle it remains desirable.
Conclusions
In this paper, we employ a novel LSTM based on residual architecture and an attention mechanism, and derive its gradient. We then use this model to predict protein-protein interaction interface residue pairs and compare it with standard LSTMs and other methods, showing that our prediction accuracy is more than 72 percent, which far surpasses the other methods. This will be significant for biomedical research as well as computational research, though there are many further problems to consider, such as feature selection, coevolution information [17], contact preferences and interface composition [18].
Methods
Algorithm derivation
Before deriving the equations of the backward pass, we need to redefine LSTM. We call the LSTM unit a small block and two LSTM layers a big block, which possesses an additional connection from the output of layer l to the output of layer l+2 (see the bold line in Fig. 6).
Figure 6 is a simplified version in which we consider only one cell in the LSTM unit; in practice we usually use full connections. To distinguish values in different layers, we use (·)^{l} to denote the values of layer l. For example, \(\left (b_{c}^{t}\right)^{\mathit {l}}\) denotes the cell output value of layer l. If the values are in the same layer, we omit the superscript l.
cell’s output
output gate
state
cell
forget gate
input gate
We can see that if the gradient vanishes in layer l+2, which means that \(\frac {\partial \left (b_{c}^{t}\right)^{l+2}}{\partial \left (b_{c}^{t}\right)^{l}}=0\), the conventional LSTM fails to update the parameters before layer l+2. From (2.2), however, our model architecture prevents this, because \(1+ \frac {\partial \left (b_{c}^{t}\right)^{l+2}}{\partial \left (b_{c}^{t}\right)^{l}}=1\), so the gradient still reaches the earlier layers through the additional connection.
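The effect of the additional connection on the gradient can be illustrated numerically with a toy scalar example. This is only a sketch of the general residual idea, not the LSTM derivation above: the two stacked layers are replaced by an assumed scalar map built from two tanh units, the weights `w1` and `w2` are hypothetical, and the derivative is estimated by a finite difference.

```python
import math

def two_layer_block(x, w1, w2):
    """Toy stand-in for two stacked layers: a scalar map F(x)."""
    return math.tanh(w2 * math.tanh(w1 * x))

def plain_stack(x, w1, w2):
    """Output of the block without a shortcut: y = F(x)."""
    return two_layer_block(x, w1, w2)

def residual_stack(x, w1, w2):
    """Output with a shortcut from the block input to its output: y = x + F(x)."""
    return x + two_layer_block(x, w1, w2)

def derivative(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

With very small weights, for example `w1 = w2 = 1e-3`, the derivative of `plain_stack` is close to 0 (the gradient vanishes), while the derivative of `residual_stack` stays close to 1 because of the identity term contributed by the shortcut.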
Background, data, and evaluation criteria
Proteins are the foundation of life activities in cells, but most of them exert their functions only by interacting with other molecules. As a result, protein-protein interaction prediction has become a very important task. Its first step is to identify the interface residue pairs precisely. Current methods are mainly experimental or computational. On the one hand, analysing all proteins experimentally is infeasible because of the high expense. On the other hand, computational methods, such as template [19] and structure model [20] methods, have become the scientific trend owing to their low cost and convenience. In recent years, artificial intelligence, especially machine learning and deep learning, has been used in computer vision, image and language recognition, etc., with many achievements. At the same time, some computational researchers have transferred these methods to biology. Protein contact prediction [21] using deep residual networks is one good instance. Although there are some achievements [13–15] in protein-protein interaction interface residue pair prediction, notably the deep learning architecture of Zhao and Gong [14], accuracy remains low, so we still need to develop new algorithms. Here we apply our method to predict interface residue pairs.
Our data come from benchmark versions 3.0, 4.0, and 5.0 [22, 23] of the international Critical Assessment of PRotein-protein Interaction predictions (CAPRI). All selected dimers whose states are unbound satisfy our requirement; they add up to 54 and are randomly split into training, validation and test sets with a ratio of around 6:2:2 (shown in Table 6). Moreover, to illustrate the test efficiency of our data partition, we compare the homology of the protein sequences by multiple sequence alignment in ClustalW2 https://www.ebi.ac.uk/Tools/msa/muscle/. Both results are attached in the supplementary identity matrix, and only homologies ≥30% between two dimers are shown in Table 6. From the identity matrix, only the partition of 2I25 (in the training set) and 1H9D (in the test set) is slightly unreasonable, because their homology is 40%, but we show later the better prediction result of 1H9D, which has this slightly higher homology. Every residue pair consists of 18 features, formed by concatenating the 9 feature values of each residue, which are based on physicochemical and geometric properties common in computation. The 9 features are listed below, and their computation is shown in Table 7: Interior Contact area (IC) [24], Exterior Contact area with other residues (EC) [24], Exterior Void area (EV) [24, 25], Absolute Exterior Solvent Accessible area (AESA) [25], Relative Exterior Solvent Accessible area (RESA) [25], Hydropathy Index (HI, two versions) [26, 27] and pK _{α} (two versions) [28]. Reference [29] summarizes these features and the tools used to compute them; here we describe them only briefly. IC is the Interior Contact area between atoms inside a residue. EC is the Exterior Contact area between residues from the same protein. EV is the area that does not contact water molecules or any amino acid. AESA is the contact area between water molecules and surface residues.
RESA is the ratio between the AESA in the protein and the AESA of the free amino acid. H1 and H2 are two versions of the hydrophobicity index, used to measure hydrophobicity. pKa reflects the electrostatics of a surface residue in its specific environment.
A residue pair is defined as an interface pair if the contact area between two amino acids from the two different monomers is not zero. Here we use two statistical evaluation criteria with biological meaning to measure our model's predictions: rank of the first positive prediction (RFPP) and the number of correctly predicted dimers (NCPD). To overcome the length differences and balance the prediction difficulty across proteins, accuracy order is adopted.
\( accuracy \quad order = \frac {RFPP}{TNRP} \), where TNRP is the total number of residue pairs in a dimer.
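The two quantities in this definition can be computed directly from a list of predicted scores and true labels for one dimer. The sketch below is illustrative only: it assumes that a higher score means a more confident interface prediction, and the function names and the choice to return `None` when a dimer has no positive pair are hypothetical conventions for this example.

```python
def rfpp(scores, labels):
    """Rank (1-based) of the first true interface pair when pairs are
    sorted by descending predicted score.

    Returns None if no pair is labelled 1.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            return rank
    return None

def accuracy_order(scores, labels):
    """RFPP divided by the total number of residue pairs (TNRP) in the dimer."""
    rank = rfpp(scores, labels)
    return None if rank is None else rank / len(scores)
```

For example, with scores `[0.1, 0.9, 0.5, 0.7]` and labels `[0, 0, 1, 0]`, the only interface pair is ranked third of four, so RFPP is 3 and the accuracy order is 0.75. Smaller values of both quantities indicate better predictions.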
Model architecture
This is a binary classification problem. The input is a matrix of dimension L×18 (Fig. 7), since every amino acid has 9 features and a residue pair therefore has 18 features, where L is the number of amino acid residue pair combinations. We use label 1 to indicate that a pair is an interface residue pair and label 0 otherwise. Because label 0 is far more frequent than label 1, we need to pretreat the imbalance between positive and negative samples. We use a distance to exclude some impossible residue pairs: if a residue pair is in contact, the distance between the chains will be small enough to meet a threshold. Therefore we choose the residue pairs with the shortest distance and then choose 3 residues around them in each chain, giving 3×3 pairs altogether. This method efficiently reduces the number of negative samples. Because this selection makes the data sequential, the LSTM neural network is a good choice for us. The pretreated data are then input to the neural network architecture. Some hyperparameters need explaining in detail. Dropout [30] is a way to prevent a model from overfitting: with a probability between 0 and 1 it randomly drops units and cuts all connections from those units to the next units. In this paper, we use a dropout of 0.15 to drop some redundant information from the inputs. Zaremba et al. [31] proposed a method to regularize RNNs by adding dropout from the current layer to the next layer but not to the recurrent connections, which inspired us to use dropout in LSTM, set to 0.6. These hyperparameters can be fitted by a common technique, grid search, and the results are shown in the supplementary material.
Attention has been widely used in speech recognition [32], reasoning [33], etc., because it can reallocate weights and retrieve more critical information, which motivates us to use attention in our model. The activation function of the dense layer is softmax, and the loss function is categorical cross-entropy. Softmax and cross-entropy are defined as follows:
\(\sigma (\mathbf {z})_{j} = \frac {e^{z_{j}}}{\sum _{k=1}^{n} e^{z_{k}}}, \quad j = 1, \ldots, n\)
\(H(p,q) = -\sum _{x} p(x) \log q(x)\)
where p is the true distribution and q is the estimated distribution. The softmax function maps an n-dimensional vector to another n-dimensional vector whose elements lie between 0 and 1. Cross-entropy, whose minimization is equivalent to maximum likelihood estimation, measures the gap between the true distribution and the estimated distribution.
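A small numerical sketch may help make the two functions concrete. It is not taken from the authors' code: the function names are hypothetical, the maximum is subtracted inside softmax purely for numerical stability, and the small constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def softmax(z):
    """Map a real vector to a vector of positive values that sum to 1."""
    m = max(z)  # subtracting the maximum does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x), with q clipped below at eps."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))
```

For a binary problem with true label 1 encoded as `p = [0.0, 1.0]`, the loss `cross_entropy(p, softmax(z))` is small when the second element of `z` is much larger than the first, and it equals log 2 when the two elements are equal.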
Availability of data and materials
Our code and model parameters can be found at https://github.com/JialeLiu/LSTM, and the data are available at ftp://202.112.126.135/pub/surrounding_3.mat.
Abbreviations
 BPTT:

Back propagation through time
 LSTM:

Long short term memory
 NCPD:

The number of correctly predicted dimers
 RFPP:

Rank of the first positive prediction
 RNN:

Recurrent neural network
 TNRP:

Total number of residue pairs in a dimer
References
 1
Graves A. Supervised sequence labelling. In: Supervised Sequence Labelling with Recurrent Neural Networks. Springer: 2012. p. 5–13.
 2
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735–80.
 3
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014.
 4
Zhou J, Xu W. End-to-end learning of semantic role labeling using recurrent neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers): 2015. p. 1127–37.
 5
Kim J, El-Khamy M, Lee J. Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. arXiv preprint arXiv:1701.03360. 2017.
 6
Zhao Y, Yang R, Chevalier G, Xu X, Zhang Z. Deep residual bidir-LSTM for human activity recognition using wearable sensors. Math Problems Engineer. 2018; 2018(7316954):13.
 7
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016. p. 770–8.
 8
Jozefowicz R, Zaremba W, Sutskever I. An empirical exploration of recurrent network architectures. In: Int Confer Mach Learn.2015. p. 2342–50.
 9
Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. 2014.
 10
Pradhan S, Longpre S. Exploring the depths of recurrent neural networks with stochastic residual learning. Report. 2016.
 11
Moniz J, Pal C. Convolutional residual memory networks. arXiv preprint arXiv:1606.05262. 2016.
 12
Ahmad S, Mizuguchi K. Partner-aware prediction of interacting residues in protein-protein complexes from sequence data. PLoS One. 2011; 6(12):29104.
 13
Afsar Minhas FuA, Geiss BJ, Ben-Hur A. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure. Proteins: Struct, Func, Bioinforma. 2014; 82(7):1142–55.
 14
Zhao Z, Gong X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans Comput Biol Bioinforma. 2017; 16(5):1753–59.
 15
Wang W, Yang Y, Yin J, Gong X. Different protein-protein interface patterns predicted by different machine learning methods. Sci Rep. 2017; 7(1):16023.
 16
AlQuraishi M. End-to-end differentiable learning of protein structure. Cell Systems. 2019; 8(4):292–301.
 17
Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information. Elife. 2014; 3:02030.
 18
Nadalin F, Carbone A. Protein–protein interaction specificity is captured by contact preferences and interface composition. Bioinformatics. 2017; 34(3):459–68.
 19
Ohue M, Matsuzaki Y, Shimoda T, Ishida T, Akiyama Y. Highly precise protein-protein interaction prediction based on consensus between template-based and de novo docking methods. In: BMC Proceedings. BioMed Central: 2013. p. 6.
 20
Singh R, Park D, Xu J, Hosur R, Berger B. Struct2net: a web service to predict protein–protein interactions using a structure-based approach. Nucleic Acids Res. 2010; 38(suppl_2):508–15.
 21
Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol. 2017; 13(1):1005324.
 22
Vreven T, Moal IH, Vangone A, Pierce BG, Kastritis PL, Torchala M, Chaleil R, Jiménez-García B, Bates PA, Fernandez-Recio J, et al. Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J Mole Biol. 2015; 427(19):3031–41.
 23
Janin J, Henrick K, Moult J, Ten Eyck L, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. Capri: a critical assessment of predicted interactions. Proteins: Structure, Function, and Bioinformatics. 2003; 52(1):2–9.
 24
Fischer TB, Holmes JB, Miller IR, Parsons JR, Tung L, Hu JC, Tsai J. Assessing methods for identifying pairwise atomic contacts across binding interfaces. J Struct Biol. 2006; 153(2):103–12.
 25
Hubbard S, Thornton J. Naccess: Department of biochemistry and molecular biology, university college london. 1993. Software available at http://www.bioinf.manchester.ac.uk/naccess/nacdownload.html.
 26
Eisenberg D. Three-dimensional structure of membrane and surface proteins. Ann Rev Biochem. 1984; 53(1):595–623.
 27
Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mole Biol. 1982; 157(1):105–32.
 28
Olsson MH, Søndergaard CR, Rostkowski M, Jensen JH. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput. 2011; 7(2):525–37.
 29
Yang Y, Wang W, Lou Y, Yin J, Gong X. Geometric and amino acid type determinants for protein-protein interaction interfaces. Quantitative Biol. 2018; 6(2):163–74.
 30
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014; 15(1):1929–58.
 31
Zaremba W, Sutskever I, Vinyals O. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329. 2014.
 32
Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y. Attention-based models for speech recognition. In: Advances in Neural Information Processing Systems: 2015. p. 577–85.
 33
Rocktäschel T, Grefenstette E, Hermann KM, Kočiskỳ T, Blunsom P. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664. 2015.
Acknowledgements
The authors gratefully acknowledge the discussion with Zhenni Zhao and thank the Beijing Advanced Innovation Center for Structural Biology for its support.
Funding
We thank the National Natural Science Foundation of China (Nos. 31670725 and 91730301) for providing financial support for this study and the publication charges. These funding bodies did not play any role in the design of the study, the interpretation of data, or the writing of this manuscript.
Author information
Contributions
JL developed the algorithm, did the computation, and wrote the manuscript. XG designed the project, collected the data and revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Liu, J., Gong, X. Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction. BMC Bioinformatics 20, 609 (2019). https://doi.org/10.1186/s12859-019-3199-1
DOI: https://doi.org/10.1186/s12859-019-3199-1
Keywords
 Residual architecture
 Attention
 LSTM
 Protein-protein interaction prediction
 Monte Carlo