HASP server: a database and structural visualization platform for comparative models of influenza A hemagglutinin proteins
- Xavier I Ambroggio†1,
- Jennifer Dommer†1,
- Vivek Gopalan1,
- Eleca J Dunham2,
- Jeffery K Taubenberger2 and
- Darrell E Hurt1
© Ambroggio et al.; licensee BioMed Central Ltd. 2013
Received: 10 October 2012
Accepted: 21 May 2013
Published: 18 June 2013
Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences.
Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis.
The modeling protocols used in the HASP server scale well to large numbers of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most closely related experimental structures, and to compare hemagglutinin sequences directly with each other in their two- and three-dimensional contexts. The models and methodology have shown utility in current research efforts, and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.
Keywords: Influenza A, HASP, Hemagglutinin, Receptor binding, Membrane fusion, ROSETTA, Sialic acid, Flu, Homology modeling, Molecular visualization
Influenza A viruses (IAV) are among the most common causes of human respiratory infections and among the most significant because they cause high morbidity and mortality, both in annual epidemics and in unpredictable pandemics. IAVs are enveloped negative-strand RNA viruses with segmented genomes containing 8 gene segments encoding at least 11 open reading frames. IAVs are covered with proteins, notably hemagglutinin (HA) and neuraminidase (NA). The combination of HA and NA alleles defines strain nomenclature because of their variability and importance in pathogenicity, in host immunity, and in host species specificity. Currently, there are 17 subtypes of HA and 10 subtypes of NA.
HA is a glycosylated type I integral membrane protein that functions both as the viral receptor-binding protein and as the fusion protein. HA recognizes sialic acid (SA)-bound glycans with variable specificity for SA with α2-3 or α2-6 glycosidic linkages; these linkages are critical in determining host species specificity. The HA protein is a trimer, with each monomer composed of a heavy (~40 kDa) and a light (~20 kDa) chain cleaved from a single precursor. Given the HA diversity of IAVs, structural predictions are an important tool to map receptor-binding and antigenic regions.
Table 1. HA crystal structures and templates selected for HASP server
1RD8, 1RU7, 1RUY, 1RUZ, 1RV0, 1RVT, 1RVX, 1RVZ, 2WRG, 2WRH, 3GBN, 3HTO, 3HTP, 3HTQ, 3HTT, 3LZF, 3LZG, 3M6S
2WR7, 2WRB, 2WRC, 2WRD, 2WRE, 2WRF, 3KU3, 3KU5, 3KU6
1EO8, 1HA0, 1HGD, 1HGE, 1HGF, 1HGG, 1HGH, 1HGI, 1HGJ, 1HTM, 1KEN, 1MQL, 1MQM, 1MQN, 1QFU, 1QU1, 2HMG, 2OJE, 2VIR, 2VIS, 2VIT, 2VIU, 3EYM, 3HMG, 4HMG, 5HMG
1JSM, 1JSN, 1JSO, 2FK0, 2IBX, 3FKU, 3GBM, 3MGO
1JSD, 1JSH, 1JSI
Construction and content
All crystal structures of influenza HAs were retrieved from the Protein Data Bank (PDB) and filtered through the PISCES server using a 95% sequence identity cutoff. Additional criteria included having an R-factor below 0.30 and a resolution better than 4.0 Å, resulting in a set of 12 structures (Table 1). Each of these structures was processed with the MolProbity server to optimize hydrogen positions and rotamers for asparagine, glutamine, and histidine residues. A symmetric trimer for each model was then generated from the first (or only) monomer in the model by applying the crystallographic or non-crystallographic symmetry operators using the symmetry functionality of ROSETTA3, v3.2. The resulting models and their amino acid sequences were used for comparative modeling.
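The resolution and R-factor criteria above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `StructureEntry` record, the function names, and the placeholder entries are all hypothetical, and the 95% sequence identity culling performed by PISCES is not reproduced here.

```python
from dataclasses import dataclass

# Thresholds taken from the text: R-factor below 0.30, resolution better than 4.0 Å.
MAX_R_FACTOR = 0.30
MAX_RESOLUTION = 4.0  # angstroms; a smaller value means a better resolution


@dataclass(frozen=True)
class StructureEntry:
    """Hypothetical record holding quality metadata for one crystal structure."""
    pdb_id: str
    resolution: float
    r_factor: float


def passes_quality_filter(entry: StructureEntry) -> bool:
    """Return True if the entry meets both quality criteria."""
    return entry.r_factor < MAX_R_FACTOR and entry.resolution < MAX_RESOLUTION


def filter_structures(entries):
    """Keep only entries that satisfy the quality criteria, preserving order."""
    return [entry for entry in entries if passes_quality_filter(entry)]


# Placeholder entries with made-up identifiers and values (not real PDB data).
candidates = [
    StructureEntry("XXX1", resolution=2.2, r_factor=0.21),  # passes both criteria
    StructureEntry("XXX2", resolution=4.5, r_factor=0.25),  # resolution too coarse
    StructureEntry("XXX3", resolution=3.0, r_factor=0.33),  # R-factor too high
]
kept_ids = [entry.pdb_id for entry in filter_structures(candidates)]
print(kept_ids)
```

In practice the metadata would come from the PDB entries themselves; the sketch only shows how the two numeric cutoffs combine.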
All unique HA sequences were downloaded from the Influenza Research Database (8,769 as of June 2011). Each sequence was used as a BLAST query against the template sequences. The top-scoring high-scoring segment pair (hsp) was used as the alignment for comparative modeling.
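Selecting the top-scoring hsp for each query can be sketched as follows. This is an illustration only: it assumes the BLAST results are available as tab-separated text with the twelve standard columns of BLAST+ tabular output (`-outfmt 6`), which the article does not specify, and the query and template identifiers are placeholders.

```python
# Standard column order of BLAST+ tabular output (-outfmt 6); assumed, not stated in the article.
BLAST_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def top_hsp_per_query(tabular_text):
    """Return, for each query ID, the hsp (as a dict) with the highest bit score."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(BLAST_COLUMNS):
            raise ValueError(f"expected {len(BLAST_COLUMNS)} columns, got {len(fields)}")
        hsp = dict(zip(BLAST_COLUMNS, fields))
        hsp["bitscore"] = float(hsp["bitscore"])
        query = hsp["qseqid"]
        if query not in best or hsp["bitscore"] > best[query]["bitscore"]:
            best[query] = hsp
    return best


# Made-up rows: one query sequence hitting two hypothetical templates.
rows = [
    ("query_seq_1", "template_A", "91.2", "510", "45", "0", "1", "510", "3", "512", "0.0", "905"),
    ("query_seq_1", "template_B", "98.4", "512", "8", "0", "1", "512", "1", "512", "0.0", "1040"),
    ("query_seq_1", "template_B", "85.0", "60", "9", "0", "20", "79", "400", "459", "1e-20", "95.5"),
]
example = "\n".join("\t".join(row) for row in rows)
best = top_hsp_per_query(example)
print(best["query_seq_1"]["sseqid"], best["query_seq_1"]["bitscore"])
```

The template named in the selected hsp then serves as the structural template, and the hsp itself as the alignment.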
A conservative approach was taken in the modeling, where only amino acid types, side-chain conformations, and dihedrals were allowed to change, with insertions and deletions omitted from the model. This approach was chosen because of the high sequence similarity between the query and template HA sequences and the desire to retain as much information from the crystal structure as possible.
The models were created by mapping the query sequence onto the template sequence from the hsp alignment using the fixed-backbone design functionality of ROSETTA3, v3.2. Rotamers with χ1 angles ±1 standard deviation from canonical Dunbrack rotamers were included in modeling. Energy minimization of the side chain dihedral angles was also performed. For positions with identities between the query and template sequences in the hsp alignments, the side-chain conformation of the template structure was retained. Only aligned residues in the hsp alignment are present in the final models.
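The mapping rules described above (gapped positions omitted, identical positions keeping the template side chain, all other aligned positions receiving a new amino acid type) can be sketched in Python. This is not the ROSETTA3 protocol itself: the function name, the action labels, and the toy alignment are hypothetical, and positions are counted along the template sequence rather than in PDB residue numbering.

```python
def plan_threading(query_aln, template_aln, template_start=1):
    """Turn a gapped pairwise alignment into a per-residue modeling plan.

    Returns a list of (template_position, template_aa, query_aa, action) tuples.
    Columns containing a gap ('-') in either sequence are left out, mirroring
    the statement that only aligned residues appear in the final models.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    plan = []
    position = template_start  # position of the current residue in the template sequence
    for query_aa, template_aa in zip(query_aln, template_aln):
        if template_aa == "-":
            continue  # insertion in the query: no template backbone to place it on
        if query_aa != "-":
            # Identical residues keep the crystallographic side chain; others are rebuilt.
            action = "keep_template_side_chain" if query_aa == template_aa else "repack"
            plan.append((position, template_aa, query_aa, action))
        position += 1  # advance along the template, also across query deletions
    return plan


# Toy alignment with one deletion and one insertion (not real HA sequences).
plan = plan_threading("MK-LVA", "MKTL-G")
for step in plan:
    print(step)
```

Here template position 3 (T) and the inserted query residue V are both dropped, and only position 5 (G to A) is marked for side-chain rebuilding.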
Model quality estimates
Table 2. Summary of rotamer recovery for template sequences modeled using the HASP server
Amino acid    Recovery rate ± std
D    0.50 ± 0.11
E    0.40 ± 0.13
F    0.89 ± 0.11
H    0.75 ± 0.12
I    0.62 ± 0.10
K    0.35 ± 0.13
L    0.75 ± 0.08
M    0.37 ± 0.18
N    0.45 ± 0.11
Q    0.41 ± 0.12
R    0.37 ± 0.11
S    0.69 ± 0.14
T    0.73 ± 0.11
V    0.84 ± 0.07
W    0.96 ± 0.09
Y    0.95 ± 0.06
On a finer scale, the rotamer recovery rate was evaluated as a function of amino acid type (Table 2). Rotamer recovery rates generally increased with hydrophobicity and decreased with the number of side-chain dihedrals for a given amino acid type. Large hydrophobic amino acids with fewer dihedrals (F, W, Y) were recovered at rates of approximately 90% or higher. Amino acids with many dihedrals (K, R, M) were recovered at the lowest rates, 35-37%, followed by hydrophilic amino acids with fewer dihedrals (D, E, N, Q), with rates of 40-50%.
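A per-amino-acid recovery rate of this kind could be computed along the following lines. The recovery criterion used here is an assumption, since the article's exact definition is not given in this text: a residue counts as recovered when every side-chain dihedral of the model lies within a tolerance (40° in this sketch) of the corresponding crystal-structure dihedral. The function names and the example angles are hypothetical.

```python
from collections import defaultdict


def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (periodic in 360)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def is_recovered(model_chis, native_chis, tolerance=40.0):
    """Assumed criterion: all chi angles of the model agree with the native within tolerance.

    Symmetric side chains (for example the 180-degree periodicity of chi2 in F and Y)
    are not treated specially in this sketch.
    """
    return len(model_chis) == len(native_chis) and all(
        angle_difference(m, n) <= tolerance for m, n in zip(model_chis, native_chis)
    )


def recovery_by_amino_acid(residues, tolerance=40.0):
    """Map each amino acid type to the fraction of its residues that were recovered."""
    counts = defaultdict(lambda: [0, 0])  # amino acid -> [recovered, total]
    for amino_acid, model_chis, native_chis in residues:
        counts[amino_acid][1] += 1
        if is_recovered(model_chis, native_chis, tolerance):
            counts[amino_acid][0] += 1
    return {aa: hits / total for aa, (hits, total) in counts.items()}


# Made-up dihedral angles in degrees: (amino acid, model chis, crystal chis).
residues = [
    ("F", (-65.0, 95.0), (-60.0, 90.0)),
    ("F", (180.0, 80.0), (-175.0, 85.0)),
    ("K", (-60.0, 180.0, 180.0, 60.0), (-60.0, 180.0, 60.0, 60.0)),
    ("K", (-65.0, 175.0, 178.0, -179.0), (-60.0, 180.0, -178.0, 178.0)),
]
rates = recovery_by_amino_acid(residues)
print(rates)
```

Because every dihedral must match, residues with four dihedrals such as K and R have more ways to fail than residues with two, which is consistent with the trend described above.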
Utility and discussion
Main features of the HASP server
The HASP server interface has two primary components: the Search tab (buttons or tabs are identified hereafter in italics), used to identify HA sequences of interest, and the Viewer tab, used to display both the sequences and structures of the selected HA proteins. In the Search tab, HA sequences of interest can be selected from the database by the H/N subtype of the strain, the geographical location of strain collection, keyword, and other strain features.
To start, a user chooses the parameters of the query in the Search tab. By clicking Toggle Map Viewer, results can be narrowed by geographical location of strain collection through a color-coded world map; countries labeled in dark and light shades of green have the highest and lowest number of cases, respectively. Clicking Go updates the results, and after toggling out of the map viewer, a list of HA sequences is displayed. Information provided for each sequence includes the EMBL ID, Subtype, Strain, Year, Location (City, State/Country), and Species (host).
Sequences of interest are selected by checking the box in the right-hand column of the search results. In the Viewer tab, up to two of these models can be displayed simultaneously within the interface using the built-in molecular viewer. Residues can be easily identified within the model through interactive alignment functionality. In the viewer, preset views of the models may be selected through buttons (Hydrogens On, All Atoms On, Single Chain) with options for displaying models as cartoons, or with or without labels. For further analysis, a drop-down menu allows the user to export sequences or structures in FASTA or PDB format, respectively.
Discussion of HASP server use through a case study
In the World Health Organization’s Human Animal Interface online database for H5N1 Avian Influenza in Humans, it is reported that there were three deaths from H5N1 in Vietnam in 2003 and twenty deaths in 2004. The differences between a pair of HA proteins sequenced from infected humans and chickens in Vietnam in 2003 and 2004 are explored below with the HASP server as a case study.
Because HA regulates IAV entry into the cell and strongly impacts both pathogenicity and interspecies transmission [16, 17], new insights into HA structure and function are critical contributions to the study of influenza A. The HASP server was designed to enable researchers to visualize their HA sequences of interest in the three-dimensional context of the most closely related HA crystal structure. It employs computationally fast protocols for model generation, database navigation, and visualization that are appropriate for the large number of HA sequences and scalable with increasing sequence data. These protocols will be used in planned implementations of the HASP server to allow models to be generated in real time for user-submitted sequences. The HASP server makes structural information on HA sequences quickly and easily accessible to all researchers, providing a valuable aid for interpreting data and generating new hypotheses. For example, structural information derived from the HASP pipeline was recently utilized in a study of an HA variant from the 1918 influenza pandemic that resulted in nearly 50 million deaths worldwide. In that study, the models generated were used in docking studies with receptor analogs to assess potential changes in receptor binding specificity resulting from point mutations.
The primary aim of the HASP server is to provide a tool for viewing and comparing amino acid changes in HA subtypes at all levels of protein structure, from primary to quaternary, giving a complete and integrated view of those changes. Understanding these changes, which determine the efficiency of transmission, pathology, and ecology of these viruses, is of critical importance to global public health.
Availability and requirements
The HASP server is available free of charge as a web application at: http://exon.niaid.nih.gov/HASP.html
Abbreviations
HASP: Hemagglutinin Structure Prediction (server)
RMSD: Root Mean Square Deviation
HSP: High-scoring segment pair
IAV: Influenza A virus
PDB: Protein Data Bank
GWT: Google Web Toolkit.
We are grateful to Dr. Yentram Huyen and Mike Tartakovsky for their support of the project. We also thank Dr. Michael Dolan for helpful comments on HASP, and Dr. Meghan Coakley for assistance in writing this manuscript. This research was supported in part by the Office of Science Management and Operations of the National Institute of Allergy and Infectious Diseases. This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Maryland (http://biowulf.nih.gov).
- Clark NM, Lynch JP: Influenza: epidemiology, clinical features, therapy, and prevention. Semin Respir Crit Care Med. 2011, 32 (4): 373-392. 10.1055/s-0031-1283278.
- Taubenberger JK, Kash JC: Influenza virus evolution, host adaptation, and pandemic formation. Cell Host Microbe. 2010, 7 (6): 440-451. 10.1016/j.chom.2010.05.009.
- Suzuki Y, Ito T, Suzuki T, Holland RE, Chambers TM, Kiso M, Ishida H, Kawaoka Y: Sialic acid species as a determinant of the host range of influenza A viruses. J Virol. 2000, 74 (24): 11825-11831. 10.1128/JVI.74.24.11825-11831.2000.
- Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D: The influenza virus resource at the National Center for Biotechnology Information. J Virol. 2008, 82 (2): 596-601. 10.1128/JVI.02005-07.
- Ha Y, Stevens DJ, Skehel JJ, Wiley DC: H5 avian and H9 swine influenza virus haemagglutinin structures: possible origin of influenza subtypes. EMBO J. 2002, 21 (5): 865-875. 10.1093/emboj/21.5.865.
- Russell RJ, Gamblin SJ, Haire LF, Stevens DJ, Xiao B, Ha Y, Skehel JJ: H1 and H7 influenza haemagglutinin structures extend a structural classification of haemagglutinin subtypes. Virology. 2004, 325 (2): 287-296. 10.1016/j.virol.2004.04.040.
- Wang G, Dunbrack RL: PISCES: a protein sequence culling server. Bioinformatics. 2003, 19 (12): 1589-1591. 10.1093/bioinformatics/btg224.
- Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC: MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010, 66 (Pt 1): 12-21.
- Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W, et al: ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011, 487: 545-574.
- Squires B, Macken C, Garcia-Sastre A, Godbole S, Noronha J, Hunt V, Chang R, Larsen CN, Klem E, Biersack K, et al: BioHealthBase: informatics support in the elucidation of influenza virus host-pathogen interactions and virulence. Nucleic Acids Res. 2008, 36: D497-D503.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2011, 39: D32-37. 10.1093/nar/gkq1079.
- Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008, 9 (4): 286-298. 10.1093/bib/bbn013.
- Cumulative number of confirmed human cases for avian influenza A (H5N1) reported to WHO. 2003-2012, [http://www.who.int/influenza/human_animal_interface/H5N1_cumulative_table_archives/en/index.html]
- Jiang S, Li R, Du L, Liu S: Roles of the hemagglutinin of influenza A virus in viral entry and development of antiviral therapeutics and vaccines. Protein Cell. 2010, 1 (4): 342-354. 10.1007/s13238-010-0054-6.
- Stevens J, Donis RO: Influenza virus hemagglutinin-structural studies and their implications for the development of therapeutic approaches. Infect Disord Drug Targets. 2007, 7 (4): 329-335. 10.2174/187152607783018727.
- Rungrotmongkol T, Yotmanee P, Nunthaboot N, Hannongbua S: Computational studies of influenza A virus at three important targets: hemagglutinin, neuraminidase and M2 protein. Curr Pharm Des. 2011, 17 (17): 1720-1739. 10.2174/138161211796355083.
- Sheng ZM, Chertow DS, Ambroggio X, McCall S, Przygodzki RM, Cunningham RE, Maximova OA, Kash JC, Morens DM, Taubenberger JK: Autopsy series of 68 cases dying before and during the 1918 influenza pandemic peak. Proc Natl Acad Sci USA. 2011, 108 (39): 16416-16421. 10.1073/pnas.1111179108.
- Taubenberger JK, Morens DM: 1918 Influenza: the mother of all pandemics. Emerg Infect Dis. 2006, 12 (1): 15-22.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.