H2V: a database of human genes and proteins that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection
BMC Bioinformatics volume 22, Article number: 18 (2021)
The ongoing global COVID-19 pandemic is caused by SARS-CoV-2, a novel coronavirus first identified at the end of 2019. According to WHO statistics, it had led to more than 50 million confirmed cases and more than 1 million deaths across 219 countries as of 11 November 2020. SARS-CoV-2, SARS-CoV, and MERS-CoV are closely related coronaviruses. All three are highly pathogenic; they threaten public health, impair the economy, and have long-term impacts on society. No drug or vaccine has been approved to treat infection with these viruses. Efforts to develop antiviral measures have been hampered by an insufficient understanding of how the human body responds to these infections at the cellular and molecular levels.
In this study, we collected journal articles together with transcriptomic and proteomic data from studies of coronavirus infection. Response genes and proteins were then identified by differential analyses comparing gene/protein levels between infected and control samples. Finally, the H2V database was created to host the human genes and proteins that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection.
H2V provides molecular information about the human response to infection. It can help uncover cellular pathways and processes relevant to viral pathogenesis and thereby identify potential drug targets. It is expected to accelerate the development of antiviral agents and to inform preparations for future coronavirus-related emergencies. The database is available at: http://www.zhounan.org/h2v.
Coronaviruses are single-stranded RNA viruses, and some can cross species barriers to cause deadly, contagious respiratory disease in humans. A novel coronavirus causing viral pneumonia was reported in December 2019. The virus, now known as SARS-CoV-2, frequently causes asymptomatic infection and can be transmitted before symptom onset. These characteristics make the virus difficult to contain; as a result, SARS-CoV-2 spread rapidly worldwide and caused the ongoing COVID-19 pandemic.
Before COVID-19, the two most recent coronavirus epidemics were severe acute respiratory syndrome (SARS) in 2002–2003 and Middle East respiratory syndrome (MERS), which emerged in 2012. SARS-related coronavirus (SARS-CoV) infected 8098 people and caused 774 deaths, a case fatality rate of ~10%. MERS-related coronavirus (MERS-CoV) has a higher case fatality rate of ~34% and has caused ~2500 confirmed cases and ~900 deaths to date. The average case fatality rate of COVID-19 is ~2%, although the risk of serious complications and death rises steeply with age: mortality is <0.1% in children but reaches 10% or higher in older people. In terms of the absolute numbers of cases and deaths, however, the COVID-19 pandemic is more severe than the previous two outbreaks. As of 11 November 2020, >50 million confirmed cases and >1 million deaths had been reported to the WHO (https://www.who.int) worldwide. The world urgently needs to unite to find effective ways to end the COVID-19 crisis.
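The crude case fatality rates quoted above are simply reported deaths divided by confirmed cases. The following minimal Python check uses the counts given in this paragraph. The MERS-CoV and COVID-19 counts are the approximate figures stated there, so their quotients are approximate too.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return the crude case fatality rate (deaths / confirmed cases) as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Counts quoted in the text; the MERS-CoV and COVID-19 figures are approximate.
outbreaks = {
    "SARS (2002-2003)": (774, 8098),
    "MERS (to date)": (900, 2500),
    "COVID-19 (11 Nov 2020)": (1_000_000, 50_000_000),
}

for name, (deaths, cases) in outbreaks.items():
    print(f"{name}: {case_fatality_rate(deaths, cases):.1f}%")
```

Each result can be compared with the rate stated in the text:

- **SARS:** gives 9.6%, consistent with the ~10% quoted.
- **MERS:** the rounded counts give 36%, close to the ~34% stated (presumably derived from unrounded counts).
- **COVID-19:** the ">1 million of >50 million" figures give the ~2% crude rate.

A crude ratio ignores unreported infections and deaths that occur later, so it is only a rough measure of severity.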
SARS-CoV-2, SARS-CoV and MERS-CoV are beta-coronaviruses that can have serious health consequences in humans. Two other human beta-coronaviruses, HCoV-OC43 and HCoV-HKU1, cause only self-limiting, flu-like illness. Although the world has repeatedly suffered coronavirus outbreaks, no clinically effective prophylactics or therapeutics are available. The clinical management of COVID-19, as well as SARS and MERS, is largely limited to infection prevention and supportive care. This highlights the need to develop therapies for coronavirus-related diseases.
The coronavirus life cycle includes several key steps: viral entry, genomic RNA replication, mRNA translation, protein processing, and virion assembly and release. The interplay between host cells and viruses at the viral entry stage is well documented. To enter human cells, both SARS-CoV-2 and SARS-CoV bind via their spike (S) proteins to the cell surface receptor angiotensin-converting enzyme 2 (ACE2). MERS-CoV instead binds a different receptor, dipeptidyl peptidase 4 (DPP4). Hoffmann and colleagues further showed that SARS-CoV-2 cell entry depends not only on ACE2 but also on priming of the S protein by the host serine protease TMPRSS2. They also showed that entry can be blocked by the serine protease inhibitor camostat mesylate. The host–virus interplay at the other stages of the viral life cycle remains to be elucidated. The human response to viral infection, however, can be detected at the molecular level by genome- and proteome-wide measurements.
Although SARS-CoV disappeared abruptly in the summer of 2003, sporadic MERS-CoV infections are still reported, and SARS-CoV-2 continues to spread rapidly in some parts of the world. The spread of SARS-CoV-2 has worsened to the extent that the winter 2020 wave of COVID-19 has forced new lockdowns in some European cities. Specific drugs against COVID-19 are urgently needed for normal life to resume, but none are yet available. There is also no cure for SARS or MERS, which reflects how limited our understanding of these dangerous coronaviruses remains. Knowledge of cellular responses to viral infection is essential for developing therapeutics. In the present study, we therefore identified human genes and proteins that respond to SARS-CoV-2, SARS-CoV and MERS-CoV infection and developed the H2V database to host them.
Construction and content
In this study, human genes/proteins responding to viral infection were defined as the following categories:

- differentially expressed genes (DEGs);
- human proteins that participate in human–virus protein–protein interactions (PPIs);
- differentially expressed proteins (DEPs);
- differentially phosphorylated proteins (DPPs);
- differentially translated proteins (DTPs);
- differentially ubiquitinated proteins (DUPs);
- disease severity-associated proteins (SAPs).
We used the Bing search engine (https://www.bing.com), NCBI resources (https://www.ncbi.nlm.nih.gov/), and the ProteomeXchange database (http://www.proteomexchange.org/) to search for studies of SARS-CoV-2, SARS-CoV, and MERS-CoV infection. Based on the definition of response genes/proteins, the studies were classified as DEG, PPI, DEP, DPP, DTP, DUP or SAP studies. For each study type, three independent studies per virus were selected; when fewer than three were available, all identified studies were used. Because we focused on dynamic changes in response genes/proteins over time after infection, studies with time-course designs were given the highest priority. Studies without time-course measurements were selected only when there were not enough time-course studies. After study selection, the journal articles reporting the selected studies were retrieved. Information about gene and protein responses was extracted from the main text and supplementary material of each article. When such information was not available in the article, raw data from the study were downloaded from public repositories and analyzed. The selected studies [12,13,14,15,16,17,18,19,20,21,22,23,24] and the corresponding strategies used to identify response genes and proteins are summarized in Table 1.
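The selection rule described above can be sketched in Python. It allows at most three studies per virus and study type, takes time-course studies first, and uses all available studies when fewer than three exist. The `Study` record, its fields and the study IDs are hypothetical. The paper does not state that the selection was automated, so this only makes the rule explicit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    study_id: str      # hypothetical identifier, e.g. a DOI or repository accession
    virus: str         # "SARS-CoV-2", "SARS-CoV" or "MERS-CoV"
    study_type: str    # "DEG", "PPI", "DEP", "DPP", "DTP", "DUP" or "SAP"
    time_course: bool  # True if the study sampled several timepoints post infection


MAX_PER_GROUP = 3


def select_studies(candidates):
    """Pick up to three studies per (virus, study type), time-course studies first."""
    groups = defaultdict(list)
    for study in candidates:
        groups[(study.virus, study.study_type)].append(study)
    selected = {}
    for key, studies in groups.items():
        # `not s.time_course` is False for time-course studies, so they sort first;
        # sorted() is stable, so ties keep their original (e.g. search-result) order.
        ranked = sorted(studies, key=lambda s: not s.time_course)
        # Slicing keeps all studies when fewer than three are available.
        selected[key] = ranked[:MAX_PER_GROUP]
    return selected


candidates = [
    Study("A", "SARS-CoV-2", "DEG", False),
    Study("B", "SARS-CoV-2", "DEG", True),
    Study("C", "SARS-CoV-2", "DEG", False),
    Study("D", "SARS-CoV-2", "DEG", True),
    Study("E", "MERS-CoV", "PPI", False),
]
picked = select_studies(candidates)
print([s.study_id for s in picked[("SARS-CoV-2", "DEG")]])  # ['B', 'D', 'A']
print([s.study_id for s in picked[("MERS-CoV", "PPI")]])    # ['E']
```

In this example the two time-course studies (B, D) fill the SARS-CoV-2 DEG slots first, and one non-time-course study (A) fills the remaining slot. MERS-CoV has only one PPI candidate, so it is kept.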
Genome assemblies MN985325.1, NC_004718.3 and NC_019843.3 from the NCBI database (https://www.ncbi.nlm.nih.gov/) were used to annotate SARS-CoV-2, SARS-CoV and MERS-CoV genes, respectively. Drug information was collected from the DrugBank database. Postprocessing of data was performed using R (https://www.r-project.org/) and Python (https://python.org/).
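Table 1 lists the study-specific strategies actually used to identify response genes and proteins. The sketch below is a generic illustration only, not the authors' pipeline. It shows a common differential-analysis recipe: a log2 fold change between infected and control samples combined with Benjamini–Hochberg false discovery rate (FDR) control. It assumes raw p-values were already computed by some upstream test. The cutoffs (|log2FC| ≥ 1, FDR < 0.05), the pseudocount and the gene names are illustrative assumptions.

```python
import math


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 of (mean infected + pseudocount) / (mean control + pseudocount)."""
    mean_infected = sum(infected) / len(infected)
    mean_control = sum(control) / len(control)
    return math.log2((mean_infected + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_responders(levels, pvalues, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return {gene: (log2FC, adjusted p)} for genes passing both cutoffs.

    levels:  {gene: (infected_values, control_values)}
    pvalues: {gene: raw p-value from an upstream test, e.g. a t-test}
    """
    genes = list(levels)
    adjusted = benjamini_hochberg([pvalues[g] for g in genes])
    hits = {}
    for gene, q in zip(genes, adjusted):
        lfc = log2_fold_change(*levels[gene])
        if abs(lfc) >= lfc_cutoff and q < fdr_cutoff:
            hits[gene] = (round(lfc, 2), q)
    return hits


# Hypothetical normalized levels (three infected, three control replicates).
levels = {
    "GENE1": ([40, 44, 42], [10, 9, 11]),
    "GENE2": ([10, 11, 9], [10, 10, 10]),
    "GENE3": ([2, 3, 1], [20, 22, 18]),
}
pvalues = {"GENE1": 0.001, "GENE2": 0.8, "GENE3": 0.004}
print(call_responders(levels, pvalues))
```

Here GENE1 (up-regulated) and GENE3 (down-regulated) pass both cutoffs, and the unchanged GENE2 is excluded. Proteomic studies use analogous logic on protein or site intensities.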
Utility and discussion
Statistics of H2V data
Because the availability of studies varies, the H2V datasets differ among the three viruses. As shown in Table 2, seven datasets of genes/proteins that respond to SARS-CoV-2 infection are available: DEGs, PPIs, DEPs, DPPs, DTPs, DUPs and SAPs. In comparison, only three datasets (DEGs, PPIs and DEPs) are available for SARS-CoV infection and two (DEGs and PPIs) for MERS-CoV infection. DEG datasets are available for all three viruses: 9321 human genes responded to MERS-CoV infection, fewer (2249) to SARS-CoV infection, and fewer still (1395) to SARS-CoV-2 infection. PPI datasets are also available for all three viruses. They contain 1581, 1150 and 296 interaction pairs between human proteins and SARS-CoV-2, SARS-CoV and MERS-CoV proteins, respectively. DEP datasets are available for SARS-CoV-2 and SARS-CoV infection and include 253 and 66 responding human proteins, respectively. DPP, DTP, DUP and SAP datasets are available only for SARS-CoV-2 infection. They include 2198 (5046 phosphorylation sites), 232, 516 (730 ubiquitination sites) and 610 response proteins, respectively.
To determine whether the same proteins participate in different processes in response to SARS-CoV-2 infection, we analyzed the intersections of the DEP, DPP, DTP and DUP datasets. Figure 1a shows three overlaps:

- 11 proteins changed significantly in both expression and translation upon infection;
- 180 proteins changed in both phosphorylation and ubiquitination;
- one protein changed in expression, phosphorylation, translation and ubiquitination.

We then used Venn diagrams to identify genes/proteins common to the responses to different viral infections, because shared responses may help elucidate fundamental mechanisms of viral pathogenesis. Figure 1b shows that 130 differentially expressed genes were common across the viral infections. Figure 1c shows that 62 human proteins interact with proteins of all three viruses.
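The overlap counts above come from set operations on the per-dataset identifier lists. The following minimal Python sketch uses hypothetical protein IDs; real lists would be downloaded from H2V. This excerpt does not say whether each figure count is an inclusive intersection or an exclusive Venn region, so both helpers are shown. The same functions apply to the cross-virus comparisons in Fig. 1b and c, with one set per virus.

```python
def shared_members(sets_by_name, names):
    """Elements present in every named set (an inclusive intersection)."""
    return set.intersection(*(sets_by_name[name] for name in names))


def exclusive_region(sets_by_name, names):
    """Elements in all named sets and in none of the others (one Venn diagram region)."""
    inside = shared_members(sets_by_name, names)
    others = [members for name, members in sets_by_name.items() if name not in names]
    return inside.difference(*others)


# Hypothetical protein identifiers per SARS-CoV-2 response dataset.
sars2 = {
    "DEP": {"P1", "P2", "P3"},
    "DPP": {"P2", "P4", "P5"},
    "DTP": {"P1", "P2"},
    "DUP": {"P2", "P4"},
}
print(sorted(shared_members(sars2, ["DEP", "DTP"])))                # ['P1', 'P2']
print(sorted(exclusive_region(sars2, ["DEP", "DTP"])))              # ['P1']
print(sorted(shared_members(sars2, ["DEP", "DPP", "DTP", "DUP"])))  # ['P2']
```

P2 lies in the DEP ∩ DTP intersection but not in the exclusive DEP/DTP Venn region, because it also appears in DPP and DUP.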
Overview of H2V
As shown in Fig. 2a, the page header contains a navigation bar and a search box. The search box accepts user queries and attempts to match any term that resembles a gene or protein. The navigation bar provides access to all resources in the database:

- The “SARS2” drop-down menu links to the SARS-CoV-2 infection response genes/proteins.
- The “SARS1” and “MERS” drop-down menus link to the SARS-CoV-1 and MERS-CoV infection response genes/proteins, respectively.
- The “Utilities” drop-down menu provides useful utilities, including links to download data from and upload data to H2V.

On each page listing response genes/proteins, every gene/protein occupies a table row, with additional information shown in columns (Fig. 2b). The “Score” column indicates the reliability of each gene/protein and is calculated as the number of studies in which it was identified. Clicking a gene/protein in the table opens a page showing how it responds to viral infection. This page includes two helpful features: one examines changes in the gene/protein at different timepoints after infection (Fig. 2c), and the other finds known drugs that target the gene/protein. For PPIs, an embedded sequence viewer (Fig. 2d) allows easy inspection of the interacting viral gene's annotation in the viral genome. PPIs can also be visualized as an interaction network on the same page (Fig. 2e).
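Because the Score is the number of studies that report a gene/protein, it can be reproduced from per-study result lists with a simple count. The following is a minimal sketch, assuming each study contributes a list of identifiers. The study IDs and gene symbols are hypothetical. Counting each gene at most once per study is our assumption about how duplicate entries are treated.

```python
from collections import Counter


def reliability_scores(study_hits):
    """Score each gene/protein by the number of studies that report it.

    study_hits: {study_id: iterable of gene/protein identifiers}
    """
    scores = Counter()
    for hits in study_hits.values():
        scores.update(set(hits))  # set() counts each gene at most once per study
    return scores


# Hypothetical study IDs and gene symbols, for illustration only.
study_hits = {
    "study_A": ["G1", "G2", "G3"],
    "study_B": ["G2", "G3", "G3"],  # the duplicate G3 entry still counts once
    "study_C": ["G3"],
}
scores = reliability_scores(study_hits)
print(scores.most_common())  # [('G3', 3), ('G2', 2), ('G1', 1)]

# Keep only findings replicated in at least two studies (Score >= 2).
print(sorted(g for g, s in scores.items() if s >= 2))  # ['G2', 'G3']
```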
To facilitate rapid drug discovery for the treatment of COVID-19 during the pandemic, H2V provides a drug finder. It identifies drugs that target a given protein, specified by its UniProt accession number. Matching drugs and their DrugBank identifiers are displayed on the lower part of the same page. For example, a search for Q9BYF1 (human ACE2) identifies several drugs, including chloroquine and hydroxychloroquine (Fig. 3a).
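A drug finder of this kind reduces to a reverse lookup from a target accession to the drugs that list it as a target. The sketch below is an illustration, not H2V's implementation. The drug-to-target table is toy data: only the Q9BYF1 links mirror the example in the text, and a real table would be parsed from a DrugBank download. The format check uses the accession pattern documented by UniProt.

```python
import re

# UniProt accession format, as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def find_drugs(accession, drug_targets):
    """Return the sorted names of drugs whose target set contains the accession.

    drug_targets: {drug_name: set of target UniProt accessions}
    """
    accession = accession.strip().upper()
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return sorted(drug for drug, targets in drug_targets.items() if accession in targets)


# Toy table; only the Q9BYF1 links come from the example in the text.
drug_targets = {
    "chloroquine": {"Q9BYF1"},
    "hydroxychloroquine": {"Q9BYF1"},
}
print(find_drugs("q9byf1", drug_targets))  # ['chloroquine', 'hydroxychloroquine']
```

Validating the accession first lets a gene symbol such as "ACE2" fail with a clear error instead of silently returning no drugs.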
To help users build a concrete picture of how the response genes/proteins change over time after infection, H2V provides a utility called “Data animation”. A settings panel on this page selects the data to animate. For example, Fig. 3b shows the settings used to animate DPPs in response to SARS-CoV-2 infection. The results of this example (Fig. 3c, d) show that more human proteins are differentially phosphorylated at 24 h than immediately after SARS-CoV-2 infection. This suggests that the human body responds to SARS-CoV-2 infection by progressively rewiring cellular pathways.
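Each frame of such an animation summarizes the response at one timepoint. The minimal text-based sketch below shows the underlying tally. It assumes records of (protein, hours post infection) pairs, and the identifiers and timepoints are hypothetical. It is not H2V's animation code.

```python
from collections import defaultdict


def counts_per_timepoint(records):
    """Count distinct responding proteins at each timepoint (hours post infection).

    records: iterable of (protein_id, hours_post_infection) pairs.
    """
    by_time = defaultdict(set)
    for protein, hours in records:
        by_time[hours].add(protein)  # a set ignores duplicate records
    return {hours: len(proteins) for hours, proteins in sorted(by_time.items())}


# Hypothetical DPP records; real ones would come from an H2V download.
records = [
    ("P1", 0), ("P2", 0),
    ("P1", 2), ("P2", 2), ("P3", 2),
    ("P1", 24), ("P2", 24), ("P3", 24), ("P4", 24), ("P4", 24),
]
for hours, n in counts_per_timepoint(records).items():
    print(f"{hours:>3} h: {'#' * n} ({n})")
```

Rendering one bar per timepoint in order mimics, very roughly, the growth in responding proteins that the animation shows frame by frame.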
H2V can also be used to analyze integrated findings from different studies. Figure 4 shows an example of using the “Enrichment” analysis utility to find pathways enriched among DPPs that respond to SARS-CoV-2 infection. DPPs identified in at least two studies were analyzed first (analysis 1). After the parameters were set on the left of Fig. 4a, the analysis was run by clicking the button at the bottom. Once the analysis was complete, the input DPPs were listed on the right of Fig. 4a and the result was shown in Fig. 4b. Seven pathways were enriched, including the FAS signaling pathway, the p38 MAPK pathway and the PDGF signaling pathway. Findings replicated in independent studies are expected to be more reliable than those from a single study. To assess the effect of including less reliable DPPs, we repeated the same analysis for DPPs identified in at least one study (analysis 2). This time, more pathways were enriched; the top seven are shown in Fig. 4c. The top two pathways from analysis 1 were not among the top seven pathways of analysis 2, indicating that including low-confidence DPPs can distort the analysis result. By letting users filter on the Score, H2V can help exclude low-confidence findings and obtain more reliable biological inferences.
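The effect of the Score threshold on enrichment can be illustrated with a hypergeometric over-representation test, a common choice for this kind of analysis. H2V's “Enrichment” utility may use a different test or annotation source. The test is swapped in here for illustration, and the pathway names, genes, scores and background size are hypothetical. No multiple-testing correction is applied, to keep the sketch short.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn genes)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(scores, pathways, background_size, min_score=1):
    """Over-representation p-value per pathway for genes with Score >= min_score.

    scores: {gene: number of studies reporting it}
    pathways: {pathway_name: set of member genes, all within the background}
    background_size: number of genes in the reference set (N)
    Returns {pathway: p-value}, sorted from most to least significant.
    """
    selected = {g for g, s in scores.items() if s >= min_score}
    results = {}
    for name, members in pathways.items():
        k = len(selected & members)
        if k == 0:
            continue  # no selected gene in this pathway
        results[name] = hypergeom_sf(k, background_size, len(members), len(selected))
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Hypothetical scores and pathways.
scores = {"G1": 2, "G2": 2, "G3": 1, "G4": 1, "G5": 1}
pathways = {"pathway_X": {"G1", "G2", "G9"}, "pathway_Y": {"G3", "G4", "G5", "G8"}}

print(list(enrich(scores, pathways, background_size=100, min_score=2)))  # ['pathway_X']
print(list(enrich(scores, pathways, background_size=100, min_score=1)))  # ['pathway_Y', 'pathway_X']
```

In this toy case the top-ranked pathway changes once single-study genes are admitted, echoing the difference between analysis 1 and analysis 2 above.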
We have developed H2V, the first database of human genes and proteins that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection. The database will help researchers understand, at the cellular level, how the human body responds to coronavirus infection. H2V can also serve as a platform for analyzing rewired pathways by combining findings from independent studies, which can help identify key targets for treating coronavirus diseases. We acknowledge that the present release may omit some relevant data. We will continue to update the database and add the missing data in future releases. In summary, H2V is expected to support the design of effective and specific therapeutics and preventive vaccines against SARS-CoV-2, SARS-CoV and MERS-CoV.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
Abbreviations
SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
SARS-CoV: Severe acute respiratory syndrome coronavirus
MERS-CoV: Middle East respiratory syndrome-related coronavirus
COVID-19: Coronavirus disease 2019
WHO: World Health Organization
ACE2: Angiotensin-converting enzyme 2
DPP4: Dipeptidyl peptidase 4
DEGs: Differentially expressed genes
DEPs: Differentially expressed proteins
DPPs: Differentially phosphorylated proteins
DTPs: Differentially translated proteins
SAPs: Severity associated proteins
NCBI: National Center for Biotechnology Information
HTML: Hypertext markup language
CSS: Cascading style sheets
REST: Representational state transfer
API: Application programming interface
Weiss SR, Navas-Martin S. Coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus. Microbiol Mol Biol Rev. 2005;69:635–64. https://doi.org/10.1128/MMBR.69.4.635-664.2005.
Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 2020;382:727–33. https://doi.org/10.1056/NEJMoa2001017.
Bai Y, Yao L, Wei T, Tian F, Jin D-Y, Chen L, et al. Presumed asymptomatic carrier transmission of COVID-19. JAMA. 2020;323:1406–7. https://doi.org/10.1001/jama.2020.2565.
Zhou N, Zhang Y, Zhang J-C, Feng L, Bao J-K. The receptor binding domain of MERS-CoV: the dawn of vaccine and treatment development. J Formos Med Assoc. 2014;113:143–7. https://doi.org/10.1016/j.jfma.2013.11.006.
Walls AC, Park Y-J, Tortorici MA, Wall A, McGuire AT, Veesler D. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020;181:281–292.e6. https://doi.org/10.1016/j.cell.2020.02.058.
Promislow DEL. A Geroscience perspective on COVID-19 mortality. J Gerontol Ser A. 2020;75:e30–3. https://doi.org/10.1093/gerona/glaa094.
Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet Infect Dis. 2020;20:669–77. https://doi.org/10.1016/S1473-3099(20)30243-7.
Liu DX, Liang JQ, Fung TS. Human coronavirus-229E, -OC43, -NL63, and -HKU1. Ref Modul Life Sci. 2020. https://doi.org/10.1016/B978-0-12-809633-8.21501-X.
Dai L, Zheng T, Xu K, Han Y, Xu L, Huang E, et al. A universal design of betacoronavirus vaccines against COVID-19, MERS, and SARS. Cell. 2020. https://doi.org/10.1016/j.cell.2020.06.035.
Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–3. https://doi.org/10.1038/s41586-020-2012-7.
Hoffmann M, Kleine-Weber H, Schroeder S, Krüger N, Herrler T, Erichsen S, et al. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell. 2020;181:271–280.e8. https://doi.org/10.1016/j.cell.2020.02.052.
Stukalov A, Girault V, Grass V, Bergant V, Karayel O, Urban C, et al. Multi-level proteomics reveals host-perturbation strategies of SARS-CoV-2 and SARS-CoV. bioRxiv. 2020. https://doi.org/10.1101/2020.06.17.156455.
Lamers MM, Beumer J, van der Vaart J, Knoops K, Puschhof J, Breugem TI, et al. SARS-CoV-2 productively infects human gut enterocytes. Science. 2020;369:50–4. https://doi.org/10.1126/science.abc1669.
Yoshikawa T, Hill TE, Yoshikawa N, Popov VL, Galindo CL, Garner HR, et al. Dynamic innate immune responses of human bronchial epithelial cells to severe acute respiratory syndrome-associated coronavirus infection. PLoS ONE. 2010;5:e8729. https://doi.org/10.1371/journal.pone.0008729.
Jiang X-S, Tang L-Y, Dai J, Zhou H, Li S-J, Xia Q-C, et al. Quantitative analysis of severe acute respiratory syndrome (SARS)-associated coronavirus-infected cells using proteomic approaches. Mol Cell Proteomics. 2005;4:902–13. https://doi.org/10.1074/mcp.M400112-MCP200.
Zhang X, Chu H, Wen L, Shuai H, Yang D, Wang Y, et al. Competing endogenous RNA network profiling reveals novel host dependency factors required for MERS-CoV propagation. Emerg Microbes Infect. 2020;9:733–46. https://doi.org/10.1080/22221751.2020.1738277.
Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020. https://doi.org/10.1038/s41586-020-2286-9.
Gordon DE, Hiatt J, Bouhaddou M, Rezelj VV, Ulferts S, Braberg H, et al. Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science. 2020. https://doi.org/10.1126/science.abe9403.
Bojkova D, Klann K, Koch B, Widera M, Krause D, Ciesek S, et al. Proteomics of SARS-CoV-2-infected host cells reveals therapy targets. Nature. 2020. https://doi.org/10.1038/s41586-020-2332-7.
Bouhaddou M, Memon D, Meyer B, White KM, Rezelj VV, Marrero MC, et al. The global phosphorylation landscape of SARS-CoV-2 infection. Cell. 2020. https://doi.org/10.1016/j.cell.2020.06.034.
Klann K, Bojkova D, Tascher G, Ciesek S, Münch C, Cinatl J. Growth factor receptor signaling inhibition prevents SARS-CoV-2 replication. Mol Cell. 2020;80:164–174.e4. https://doi.org/10.1016/j.molcel.2020.08.006.
Shen B, Yi X, Sun Y, Bi X, Du J, Zhang C, et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell. 2020. https://doi.org/10.1016/j.cell.2020.05.032.
Li Y, Wang Y, Liu H, Sun W, Ding B, Zhao Y, et al. Urine proteome of COVID-19 patients. medRxiv. 2020. https://doi.org/10.1101/2020.05.02.20088666.
Mitchell HD, Eisfeld AJ, Sims AC, McDermott JE, Matzke MM, Webb-Robertson B-JM, et al. A network integration approach to predict conserved regulators related to pathogenicity of influenza and SARS-CoV respiratory viruses. PLoS ONE. 2013;8:e69374. https://doi.org/10.1371/journal.pone.0069374.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017;46:D1074–82. https://doi.org/10.1093/nar/gkx1037.
Franz M, Lopes CT, Huck G, Dong Y, Sumer O, Bader GD. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics. 2015;32:309–11.
Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019;47:D419–26. https://doi.org/10.1093/nar/gky1038.
The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2018;47:D506–15. https://doi.org/10.1093/nar/gky1049.
Zhou N, Bao J. FerrDb: a manually curated resource for regulators and markers of ferroptosis and ferroptosis-disease associations. Database (Oxford). 2020. https://doi.org/10.1093/database/baaa021.
We thank the authors who generated the raw data that have been used in this study.
This work was supported by: Basic and Applied Basic Research of Guangzhou Municipal Basic Research Plan; Guangzhou Municipal Psychiatric Disease Clinical Transformation Laboratory; Guangzhou Municipal Key Discipline in Medicine (2017–2019); Key Laboratory for Innovation Platform Plan, Science and Technology Program of Guangzhou, China.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Zhou, N., Bao, J. & Ning, Y. H2V: a database of human genes and proteins that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection. BMC Bioinformatics 22, 18 (2021). https://doi.org/10.1186/s12859-020-03935-2