H2V, a database for human genes and proteins in response to SARS-CoV-2, SARS-CoV, and MERS-CoV infection

The ongoing COVID-19 pandemic is caused by SARS-CoV-2, a new coronavirus first discovered at the end of 2019. According to WHO statistics, it had led to more than 50 million confirmed cases and more than 1 million deaths across 219 countries by 11 November 2020. SARS-CoV-2, SARS-CoV, and MERS-CoV are alike: they are highly pathogenic, threaten public health, impair the economy, and inflict long-term impacts on society. No drug or vaccine has been approved as a cure for these viruses. Efforts to develop antiviral measures are hampered by insufficient understanding of how the human body responds to viral infections at the cellular and molecular levels. In this study, journal articles and transcriptomic and proteomic data that survey coronavirus infections were collected. Response genes and proteins were then identified via differential analyses that compared gene/protein measurements between infected samples and controls. A database, H2V, was finally created for human genes/proteins responding to SARS-CoV-2, SARS-CoV, and MERS-CoV infection. H2V provides molecular information about the human response to infection. It can be a powerful tool for discovering cellular pathways and processes relevant to viral pathogenesis and for identifying potential drug targets. It is expected to speed up the development of antiviral agents and to shed light on preparation for potential coronavirus emergencies in the future.


Background
Coronaviruses are single-stranded RNA viruses, and some can cross the species barrier to cause deadly and infectious respiratory disease in humans [1]. A novel coronavirus that causes viral pneumonia was reported in December 2019 [2]. Infection with the virus, now known as SARS-CoV-2, is commonly asymptomatic, and infected people are contagious prior to symptom onset [3]. These characteristics contribute to the difficulty of containing the virus. As a result, SARS-CoV-2 spread rapidly around the world and caused the ongoing COVID-19 pandemic.
The last two coronavirus disease epidemics were severe acute respiratory syndrome (SARS) in 2002-2003 and Middle East respiratory syndrome (MERS) since 2012 [4]. With a case fatality rate of ~10%, the SARS-related coronavirus (SARS-CoV) infected 8098 people and caused 774 deaths; the MERS-related coronavirus (MERS-CoV) has a higher mortality rate of ~34%, and it has resulted in ~2500 confirmed cases and ~900 deaths so far [5]. The average case fatality rate of COVID-19 is ~2%, though the risk of serious complications and mortality increases dramatically with age [6]. The death rate is < 0.1% in children, but it increases to 10% or more in older people [7]. In terms of the absolute number of cases and deaths, COVID-19 is more severe than the previous two outbreaks. As of 11 November 2020, > 50 million confirmed cases and > 1 million deaths worldwide have been reported to WHO (https://www.who.int). It is urgent for the world to unite to find effective ways to bring the COVID-19 crisis to an end.
SARS-CoV-2, SARS-CoV and MERS-CoV are beta-coronaviruses and are able to cause serious health consequences in humans. Two other beta-coronaviruses, HCoV-OC43 and HKU1, are also able to infect human beings but only cause self-limiting flu-like illness [8]. Even though the world has repeatedly suffered from coronavirus outbreaks, there are no clinically effective prophylactics or therapeutics available. The clinical management of COVID-19, as well as SARS and MERS, is largely limited to infection prevention and supportive care. This amplifies the need to develop therapies to treat coronavirus diseases.
The coronavirus life cycle includes several key steps: viral entry, genomic RNA replication, mRNA translation, protein processing, and virion assembly and release [9]. The interplay between host cell and virus at the viral entry stage has been well documented. To enter the human cell, both SARS-CoV-2 and SARS-CoV bind their S proteins to the cell surface receptor ACE2, angiotensin-converting enzyme 2 [10]. Hoffmann and colleagues have also shown that SARS-CoV-2 cell entry additionally depends on the serine protease TMPRSS2 and that entry can be blocked by the serine protease inhibitor camostat mesylate [11]. More details about the interplay between human and virus at other life cycle stages remain to be revealed. The human body responds to virus infection, and this response can be detected at the molecular level by genome- and proteome-wide measurements.
Although SARS-CoV suddenly disappeared in the summer of 2003, MERS-CoV is still occasionally reported and SARS-CoV-2 keeps spreading rapidly in some parts of the world. Worse, the winter 2020 wave of COVID-19 has forced new lockdowns in some European cities. To bring life back to normal, specific drugs for COVID-19 are urgently required but not yet available. There is also no cure for SARS or MERS, indicating that our understanding of these dangerous coronaviruses is very limited. Because knowledge of cellular responses to viral infection is essential for establishing therapeutics, we identified human proteins and genes that respond to SARS-CoV-2, SARS-CoV and MERS-CoV infections and developed the H2V database in the present study.

Data collection
In this study, human response proteins/genes to virus infection are defined as differentially expressed genes (DEGs), proteins that participate in human-virus protein-protein interactions (PPIs), differentially expressed proteins (DEPs), differentially phosphorylated proteins (DPPs), differentially translated proteins (DTPs), differentially ubiquitinated proteins (DUPs), and disease severity associated proteins (SAPs).
We used the Bing search engine (https://www.bing.com), NCBI resources (https://www.ncbi.nlm.nih.gov/), and the ProteomeXchange database (http://www.proteomexchange.org/) to search for studies of SARS-CoV-2, SARS-CoV, and MERS-CoV infection. Following the definition of response gene/protein, studies were classified into the types DEG, PPI, DEP, DPP, DTP, DUP and SAP. For each study type, three independent studies per virus were selected. If fewer than three studies were available, all that we could find were used. Since we focused on the dynamic change of response genes/proteins over time post infection, studies with a time-course survey were given priority. Studies without a time-course examination were selected only when there were insufficient time-course studies. After study selection, the journal articles of the selected studies were collected, and the information about response genes and proteins was extracted from the main text and supplementary materials of each article. When such information was not available in the journal article, raw data of the selected studies were downloaded from public repositories and subsequently analyzed. The selected studies [12-24] and the corresponding strategies used to identify response genes and proteins are summarized in Table 1.
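As an illustration of the selection rule stated in the notes to Table 1 (p < 0.05 and |log2(fold change)| > 1 at any timepoint post infection), the following minimal Python sketch flags a gene as a response gene if at least one timepoint passes both cutoffs. The data layout, function names, gene labels and numbers are hypothetical and only serve the example; in the study itself the statistics came from RaNA-seq, DESeq2 or limma.

```python
from typing import Dict, List, Tuple

# One measurement per timepoint: (timepoint label, log2 fold change, p-value).
# This layout is an assumption made for the sketch, not the H2V data format.
Measurement = Tuple[str, float, float]


def is_response_gene(measurements: List[Measurement],
                     p_cutoff: float = 0.05,
                     lfc_cutoff: float = 1.0) -> bool:
    """Return True if any timepoint has p < p_cutoff and |log2FC| > lfc_cutoff."""
    return any(p < p_cutoff and abs(lfc) > lfc_cutoff
               for _, lfc, p in measurements)


def select_response_genes(results: Dict[str, List[Measurement]]) -> List[str]:
    """Return the sorted names of genes that pass the cutoffs at any timepoint."""
    return sorted(gene for gene, ms in results.items() if is_response_gene(ms))


# Made-up values for three placeholder genes.
toy_results = {
    "GENE_A": [("4h", 0.3, 0.60), ("24h", 2.1, 0.001)],  # passes at 24h
    "GENE_B": [("4h", 1.5, 0.20), ("24h", 0.8, 0.01)],   # never passes both cutoffs
    "GENE_C": [("4h", -1.4, 0.03)],                      # passes (down-regulated)
}

selected = select_response_genes(toy_results)
print(selected)
```

Both cutoffs must hold at the same timepoint, which is why GENE_B is excluded even though each cutoff is met somewhere in its time course.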

Utility and discussion
Statistics of H2V data
Because the availability of studies varies, the H2V datasets differ between viruses. As shown in Table 2, genes/proteins that respond to SARS-CoV-2 infection exist in seven datasets, namely DEGs, PPIs, DEPs, DPPs, DTPs, DUPs and SAPs. In comparison, genes/proteins that respond to SARS-CoV and MERS-CoV exist in fewer datasets.
To find out whether common proteins participate in different processes in response to SARS-CoV-2 infection, the intersections of DEPs, DPPs, DTPs and DUPs were analyzed. Figure 1a shows that both expression and translation of 11 proteins change dramatically upon infection, that both phosphorylation and ubiquitination of 180 proteins change remarkably upon infection, and that one protein experiences noticeable change in expression, phosphorylation, translation and ubiquitination. We then used Venn diagrams to analyze genes/proteins that respond in common to different virus infections, which helps to elucidate the fundamental mechanisms of viral pathogenesis. Figure 1b shows that 130 common genes show significant differences in expression upon infection. Figure 1c shows that 62 human proteins are able to interact with all three viruses.
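The overlap analysis behind such Venn diagrams reduces to set intersections. The sketch below, in Python, uses placeholder protein identifiers (P1 to P4) rather than H2V data, and the helper names are hypothetical. Note that it reports plain intersections; a Venn diagram region may instead count members exclusive to that region.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_members(datasets: Dict[str, Set[str]], names: Iterable[str]) -> Set[str]:
    """Return the proteins present in every one of the named datasets."""
    return set.intersection(*(datasets[name] for name in names))


def pairwise_overlaps(datasets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Return the intersection size for each pair of datasets."""
    return {(a, b): len(datasets[a] & datasets[b])
            for a, b in combinations(sorted(datasets), 2)}


# Placeholder identifiers, not real H2V content.
toy_datasets = {
    "DEP": {"P1", "P2", "P3"},
    "DPP": {"P2", "P3", "P4"},
    "DTP": {"P1", "P3"},
    "DUP": {"P3", "P4"},
}

in_all_four = shared_members(toy_datasets, ["DEP", "DPP", "DTP", "DUP"])
overlaps = pairwise_overlaps(toy_datasets)
print(in_all_four)
print(overlaps[("DEP", "DTP")], overlaps[("DPP", "DUP")])
```

The same two helpers apply unchanged when the keys are viruses instead of data types, for example to find genes that respond to all three infections.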

Overview of H2V
As shown in Figure 2a, the header of the web page contains a navigation bar and a search box. The search box accepts queries from the user and tries to match anything that looks like a gene or protein. The navigation bar provides access to all resources in the database. The SARS2 drop-down menu links to genes/proteins that respond to SARS-CoV-2 infection. Similarly, the SARS1 and MERS drop-down menus link to genes/proteins that respond to SARS-CoV and MERS-CoV infections, respectively. The Utilities drop-down menu provides utilities such as links for downloading data from or uploading data to H2V. On the page that lists response genes/proteins, the genes/proteins are shown as rows of a table, with additional information about each gene/protein shown in columns (Figure 2b). The Score column in the table indicates the reliability of the gene/protein; the score was calculated as the number of studies in which the gene/protein was identified [29]. The genes/proteins in the table are clickable. Clicking on a gene/protein leads to another page showing details of how the gene/protein responds to virus infection. This page offers two notable features: one is to examine changes of the gene/protein at different timepoints post infection (Figure 2c), and the other is to discover known drugs that target the gene/protein. For PPIs, an embedded sequence viewer, as shown in Figure 2d, is provided for easy inspection of the gene/protein annotation in the viral genome. In addition, PPIs can be visualized as an interaction network on the page (Figure 2e).
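The Score described above, the number of studies in which a gene/protein was identified, can be sketched in Python as follows. The study names, identifiers and the `min_score` filter are hypothetical and not taken from the H2V implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List


def reliability_scores(studies: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each protein, the number of studies that report it."""
    scores: Counter = Counter()
    for hits in studies.values():
        # set() ensures a study counts at most once per protein,
        # even if the protein is listed several times in that study.
        scores.update(set(hits))
    return scores


def at_least(scores: Counter, min_score: int) -> List[str]:
    """Return the sorted proteins supported by at least min_score studies."""
    return sorted(p for p, s in scores.items() if s >= min_score)


# Placeholder studies and identifiers.
toy_studies = {
    "study_1": ["P1", "P2", "P2"],
    "study_2": ["P2", "P3"],
    "study_3": ["P2"],
}

scores = reliability_scores(toy_studies)
print(scores["P2"], at_least(scores, 2))
```

Filtering with `at_least(scores, 2)` corresponds to keeping only findings repeated by independent studies.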

Application cases
To facilitate rapid drug discovery for COVID-19 during the pandemic, H2V provides a drug finder that finds drugs targeting a given protein based on its UniProt accession number. The drugs found, with their DrugBank identifiers, are displayed on the lower part of the same page. For example, searching Q9BYF1 (ACE2) finds a few drugs, including Chloroquine and Hydroxychloroquine (Figure 3a).
To help users see how all genes/proteins change dynamically over time post infection, H2V provides a utility named Data animation. On its page, a setting panel is provided to select data for animation. For example, Figure 3b shows the setting to animate DPPs in response to SARS-CoV-2 infection. The results of this example (Figures 3c and 3d) demonstrate that more human proteins are differentially phosphorylated at 24 h than at the very beginning of SARS-CoV-2 infection. This suggests that the human body responds to SARS-CoV-2 infection by continuously rewiring cellular pathways.
H2V can be used to analyze integrated findings from different studies. Figure 4 shows an example of using the Enrichment analysis utility to analyze enriched pathways of DPPs that respond to SARS-CoV-2 infection. DPPs identified in at least two studies were analyzed first (referred to as analysis 1). After the parameters were set on the left in Figure 4a, the analysis was started by clicking the button at the bottom. When the analysis was completed, the input DPPs were listed on the right in Figure 4a, and the result was shown in Figure 4b. Seven pathways were enriched, including the FAS signaling pathway, the p38 MAPK pathway, and the PDGF signaling pathway. Findings repeated by independent studies are presumably more reliable than those that are not. For comparison, the same analysis (referred to as analysis 2) was therefore performed for DPPs identified in at least one study. This time, more pathways were enriched, and the top seven pathways are shown in Figure 4c. The comparison shows that the top two pathways in analysis 1 were not among the top seven pathways of analysis 2. This indicates that the inclusion of low-confidence DPPs could distort the analysis result. H2V can thus be used to filter out low-confidence findings and obtain more reliable biological inferences.
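The text does not state which statistical test the Enrichment analysis utility uses. As a hedged illustration only, the Python sketch below uses a one-sided hypergeometric over-representation test, a common choice for pathway enrichment, which may differ from the method implemented in H2V. Pathway names, protein identifiers and the background size are placeholders.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n items from N, of which K belong to the pathway."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrich(query: Set[str],
           pathways: Dict[str, Set[str]],
           background_size: int) -> List[Tuple[str, int, float]]:
    """Return (pathway, overlap size, p-value) tuples sorted by p-value."""
    results = []
    for name, members in pathways.items():
        k = len(query & members)
        p = hypergeom_sf(k, background_size, len(members), len(query))
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


# Placeholder pathways within an assumed background of 20 proteins.
toy_pathways = {
    "pathway_X": {"P1", "P2", "P3", "P4", "P5"},
    "pathway_Y": {"P11", "P12", "P13", "P14", "P15", "P16"},
}
toy_query = {"P1", "P2", "P3", "P11"}  # e.g. proteins kept after a score filter

ranking = enrich(toy_query, toy_pathways, background_size=20)
for name, k, p in ranking:
    print(f"{name}: overlap={k}, p={p:.4f}")
```

Running the same function on a stricter and a looser input list, as in analyses 1 and 2, makes it easy to see how the pathway ranking shifts when low-confidence proteins are added. In practice the p-values would also need correction for multiple testing, which this sketch omits.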

Conclusions
We have developed H2V, the first database for human proteins and genes that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection. The database will help to understand the cellular details of how the human body responds to coronavirus infections. H2V can also be used as a platform to analyze rewired pathways by combining findings of independent studies. This can be helpful for finding key targets with the potential to treat coronavirus diseases. We acknowledge that the present release of our database may omit some data that should be included. We will keep updating the database and add missing data in future releases. In summary, the database will help to design effective and specific drugs and preventive vaccines targeting SARS-CoV-2, SARS-CoV and MERS-CoV.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article.

Competing interests
The authors declare that they have no competing interests.

Footnotes to Table 1
a: Response genes/proteins were extracted from the journal article.
b: Response genes/proteins were identified from RNA-seq data using RaNA-seq, with p < 0.05 and |log2(fold change)| > 1 at any timepoint post infection.
c: Response genes/proteins were identified from read counts from GEO using DESeq2, with p < 0.05 and |log2(fold change)| > 1 at any timepoint post infection.
d: Response genes/proteins were identified from expression matrix from GEO using limma, with p < 0.05 and |log2(fold change)| > 1 at any timepoint post infection.
Table 2. Statistics of data in H2V.