H2V, a Database for Human Proteins and Genes in Response to SARS-CoV-2, SARS-CoV, and MERS-CoV Infection


The ongoing COVID-19 pandemic is caused by SARS-CoV-2, a new coronavirus first discovered at the end of 2019. It had led to more than 10 million confirmed cases and more than 500,000 confirmed deaths across 216 countries by 1 July 2020, according to WHO statistics. Like SARS-CoV and MERS-CoV, SARS-CoV-2 kills people, damages the economy, and inflicts long-term impacts on society. However, no specific drug or vaccine has been approved for these viruses. Efforts to develop antiviral measures are hampered by an insufficient understanding of human molecular responses to viral infection. In this study, we collected experimentally validated human proteins that interact with SARS-CoV-2 proteins, human proteins whose expression, translation, and phosphorylation levels change significantly after SARS-CoV-2 or SARS-CoV infection, human proteins that correlate with COVID-19 severity, and human genes whose expression levels change significantly upon SARS-CoV-2 or MERS-CoV infection. A database, H2V, was then developed for easy access to these data. Currently H2V includes: 332 human-SARS-CoV-2 protein-protein interactions; 65 differentially expressed proteins, 232 differentially translated proteins, 1298 differentially phosphorylated proteins, 204 severity associated proteins, and 4012 differentially expressed genes responding to SARS-CoV-2 infection; 66 differentially expressed proteins responding to SARS-CoV infection; and 6981 differentially expressed genes responding to MERS-CoV infection. H2V can help to understand the cellular responses associated with SARS-CoV-2, SARS-CoV, and MERS-CoV infection. It is expected to speed up the development of antiviral agents and to inform preparation for potential coronavirus emergencies in the future.
Database URL: http://www.zhounan.org/h2v

threatening. Even though the world has repeatedly suffered from coronavirus outbreaks, no clinically effective prophylactics or therapeutics are available.
The life cycles of SARS-CoV-2, SARS-CoV, and MERS-CoV include several key steps: viral entry, genomic RNA replication, mRNA translation, protein processing, and virion assembly and release [6]. The interplay between the human host and the viruses at the viral entry stage has been well documented. To enter the human cell, both SARS-CoV-2 and SARS-CoV bind their S proteins to the cell surface receptor angiotensin-converting enzyme 2 (ACE2) [7]. MERS-CoV enters the human cell by binding another receptor, dipeptidyl peptidase 4 (DPP4) [3]. Hoffmann and colleagues also showed that cell entry of SARS-CoV-2 additionally depends on the serine protease TMPRSS2 and can be blocked by the serine protease inhibitor camostat mesylate [8]. How the human host interacts with the virus during the other stages of the viral life cycle remains to be revealed. The human body undoubtedly responds to virus infection, and the response can be detected at the molecular level by genome- and proteome-wide measurements.
SARS-CoV-2 keeps spreading rapidly in some parts of the world. To bring life back to normal, specific drugs with clinical efficacy against SARS-CoV-2 are urgently required but not yet available. There is also no cure for SARS or MERS, indicating that our understanding of these dangerous coronaviruses is very limited. Because knowledge of cellular responses to viral infection would help to combat the COVID-19 pandemic, in the present study we developed H2V, a database of human proteins and genes that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection.

Data collection
In this study, human proteins/genes that respond to viral infections are defined as interactors in human-virus protein-protein interactions (PPIs), differentially expressed genes (DEGs), differentially expressed proteins (DEPs), differentially translated proteins (DTPs), differentially phosphorylated proteins (DPPs), and severity associated proteins (SAPs). The source data for the discovery of response proteins and genes were collected from published studies.
High-confidence human-SARS-CoV-2 protein-protein interactions (PPIs) were collected from Gordon and colleagues' research [9]. In their study, physical PPIs were measured by affinity-purification mass spectrometry. The collected data were used in this study without any post-processing.
DEPs are proteins whose abundance changes significantly after virus infection. DTPs are proteins whose translation changes significantly upon virus infection. Translatome and proteome data in human cells at 2, 6, 10, and 24 hours post SARS-CoV-2 infection were collected from Bojkova and colleagues' study [10]. Proteins were selected as significantly changed if they had a fold change (infection vs control) of > 2 and a p value of < 0.05 at any of the time points.
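The selection rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the post-processing in this study was done in R, Python is used here only for illustration, and the function names, the record layout, and the toy values are hypothetical. Treating a decrease of more than 2-fold as a change (the `symmetric` option) is likewise an assumption, since the text only states "fold change of > 2".

```python
import math

# Cut-offs taken from the text: fold change > 2 and p value < 0.05.
FOLD_CHANGE_CUTOFF = 2.0
P_VALUE_CUTOFF = 0.05


def is_significant(fold_change, p_value, symmetric=True):
    """Return True if one measurement passes both cut-offs.

    With symmetric=True (an assumption, not stated in the text), a decrease
    of more than 2-fold (fold change < 0.5) also counts as a change.
    """
    if fold_change is None or p_value is None or fold_change <= 0:
        return False
    if symmetric:
        passes_fc = abs(math.log2(fold_change)) > math.log2(FOLD_CHANGE_CUTOFF)
    else:
        passes_fc = fold_change > FOLD_CHANGE_CUTOFF
    return passes_fc and p_value < P_VALUE_CUTOFF


def select_changed(records, symmetric=True):
    """Keep identifiers that pass the cut-offs at any time point.

    records maps an identifier to {time_in_hours: (fold_change, p_value)}.
    """
    return sorted(
        identifier
        for identifier, series in records.items()
        if any(is_significant(fc, p, symmetric) for fc, p in series.values())
    )


# Hypothetical toy values, not data from the cited studies.
toy_records = {
    "PROT_A": {2: (1.1, 0.60), 24: (2.5, 0.01)},  # passes at 24 h
    "PROT_B": {2: (3.0, 0.20), 24: (1.2, 0.03)},  # never passes both cut-offs
    "PROT_C": {2: (0.4, 0.01), 24: (0.9, 0.50)},  # passes only if symmetric
}

print(select_changed(toy_records))                   # ['PROT_A', 'PROT_C']
print(select_changed(toy_records, symmetric=False))  # ['PROT_A']
```

The same rule, applied to a single time point, would cover data sets without time series.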
DEGs are genes whose expression levels change significantly after virus infection. DEGs in response to SARS-CoV-2 infection were collected from Blanco-Melo and colleagues' study [11]. Within the downloaded data, missing records were removed and genes with a fold change (infection vs control) of > 2 and a p value of < 0.05 were chosen as DEGs.
SAPs are proteins that can differentiate critical COVID-19 cases from moderate cases or healthy cohorts. The original data were from Shen and colleagues' work [12]. From the collected data, SAPs were chosen as proteins whose expression levels changed more than 2-fold (infection vs control) with a p value of < 0.05.
Phosphorylation changes of kinases reflect the signaling pathways that the virus relies on to survive in the human body. Phosphorylation dynamics data in Vero E6 cells at 0, 2, 4, 8, 12, and 24 hours after SARS-CoV-2 infection were collected from Bouhaddou and colleagues' study [13]. Since Vero E6 cells are derived from monkey, the original authors had mapped phosphorylation sites and protein identifiers to their respective human protein orthologs. From the downloaded data, DPPs were selected as proteins with a fold change (infection vs control) of > 2 and a p value of < 0.05 at any time point.
DEPs in response to SARS-CoV infection were collected from Jiang and colleagues' article [14]. The data were used in our study without any post-processing.
RNA-seq data at 6 and 24 hours after MERS-CoV infection in Calu-3 cells were collected from Zhang and colleagues' research [15]. Gene expression was measured by Salmon in RaNA-seq and DEGs were detected by DESeq2 in RaNA-seq [16][17][18]. DEGs were selected with a fold change (infection vs control) of > 2 and a p value of < 0.05 at either of the two sampling times.
Genome assembly MN985325.1 from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) was used to annotate SARS-CoV-2 genes. Drugs that target H2V proteins were collected from the DrugBank database [19]. Post-processing of data was performed in R (https://www.r-project.org/).

Implementation
H2V was developed with mainstream web development techniques. The user interface was developed with HTML5, CSS3, and JavaScript. Bootstrap v4 (https://getbootstrap.com/) was used for layout design. DataTables (https://datatables.net/) was used to organize data in tables on the web pages. Cytoscape.js was used for network visualization of PPIs [20]. Plotly (https://plotly.com/) was used to create interactive plots. PHP (https://www.php.net/), Python (https://www.python.org), and bash scripts were used for server-side development. Data are managed by SQLite (https://www.sqlite.org/). NCBI's sequence viewer (https://www.ncbi.nlm.nih.gov/projects/sviewer/) was embedded in H2V for easy browsing of SARS-CoV-2 gene information within our database. Drug information is not stored in H2V; instead, it is automatically retrieved on request from the DrugBank database via UniProt's REST API [21]. H2V is deployed on an Amazon Web Services (AWS) host running Ubuntu 16.04.

Among the human proteins that respond to SARS-CoV-2 infection (Figure 1a): 10 proteins intersect between DEPs and DTPs; 3 proteins intersect between DEPs and SAPs; 6 proteins intersect between DEPs and DPPs; 5 proteins intersect between DTPs and SAPs; 26 proteins intersect between DTPs and DPPs; 5 proteins intersect between SAPs and DPPs; DEPs, DTPs, and SAPs share 1 protein; DEPs, SAPs, and DPPs share 1 protein as well. There is only 1 common protein between DEPs in response to SARS-CoV-2 and SARS-CoV infection (Figure 1b). The set of intersecting genes is larger than that of intersecting proteins. As shown in Figure 1c, there are 1497 intersecting genes between DEGs in response to SARS-CoV-2 and MERS-CoV infection.
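Overlap counts of this kind follow from plain set intersections. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the identifiers are toy placeholders rather than H2V records, and each category is assumed to be available as a set of protein or gene identifiers.

```python
from itertools import combinations


def pairwise_overlaps(groups):
    """Return {(name_a, name_b): shared identifiers} for every pair of groups."""
    return {
        (a, b): groups[a] & groups[b]
        for a, b in combinations(sorted(groups), 2)
    }


def shared_by_all(groups, names):
    """Return identifiers present in every named group."""
    return set.intersection(*(groups[name] for name in names))


# Hypothetical toy identifiers, not taken from the database.
toy_groups = {
    "DEPs": {"P1", "P2", "P3"},
    "DTPs": {"P2", "P3", "P4"},
    "SAPs": {"P3", "P5"},
}

for pair, shared in pairwise_overlaps(toy_groups).items():
    print(pair, len(shared))
print(shared_by_all(toy_groups, ["DEPs", "DTPs", "SAPs"]))  # {'P3'}
```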
Proteins/genes responding to SARS-CoV-2
The "SARS2" drop-down menu in the navigation bar of the website provides links to human proteins and genes that respond to SARS-CoV-2 infection. PPIs are shown in a table or as a network (Figure 2a-b). In the PPIs table (Figure 2a), each row is a human-SARS-CoV-2 PPI and the columns hold further protein information. For SARS-CoV-2 proteins, the corresponding genes are listed in the "SARS2 gene" column. Clicking on a SARS-CoV-2 gene shows its annotation in the embedded NCBI sequence viewer on the same web page (Figure 2c). UniProt identifiers, protein names, gene names, and HGNC identifiers of human proteins are also displayed in the PPIs table. The external links to the UniProt and HGNC databases help users to study the proteins of interest. The "Drug" column in the PPIs table can be used to retrieve drugs that target a protein from the DrugBank database. This feature is intended to facilitate rapid discovery of candidate antiviral drugs.
DEPs, DTPs, and DPPs are listed in tables of a similar format, so DPPs are taken as an example and shown in Figure 2d. Compared to the PPIs table, the SARS-CoV-2 columns are removed and a "Temporal profile" column is added. Clicking on the "Show" button of a protein in this column displays a plot showing how the protein's phosphorylation changes over time and when the change reaches significance (Figure 2e). DEPs and DTPs have a similar profile that shows how the expression or translation of a protein changes over time, but only the DPPs profile is shown as an example.
Because time series data are not available for SAPs, the log2 fold changes and p values are given in the "log2FC" and "P-value" columns (Figure 2f). DEGs are also shown in a table, with links to the HGNC database, log2 fold changes, and p values given in columns (Figure 2g).

Proteins responding to SARS-CoV
All of the collected human proteins that respond to SARS-CoV infection are DEPs. The DEPs table can be accessed via the "DEPs" link on the "SARS1" drop-down menu in the navigation bar of the website. As can be seen from Figure 3, DEPs are listed in table rows, and the related information is listed in table columns.

Genes responding to MERS-CoV
All of the collected human genes that respond to MERS-CoV infection are DEGs. The DEGs table can be accessed by clicking on the "DEGs" link on the "MERS-CoV" drop-down menu in the navigation bar of the website. In the table, gene names, HGNC identifiers, and buttons for displaying temporal profiles are given (Figure 4a).
Users can click on the "Show" button of a DEG to see how its expression changes with time after MERS-CoV infection (Figure 4b).
Application case
H2V provides two easy-to-use utilities: "Drug finder" and "Data animation". The former finds drugs that target a given protein based on its UniProt accession number. If such drugs exist, their names and DrugBank identifiers are displayed below the utility on the same page. For example, searching Q9BYF1 finds the drugs moexipril and SPP1148 (Figure 5a).
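A lookup of this kind can be sketched as follows. This is not the H2V implementation: the endpoint URL, the JSON field names ("uniProtKBCrossReferences", "database", "properties", "GenericName"), and the function names are assumptions about the current UniProt REST API rather than details given in this article, and the service may have looked different when H2V was built.

```python
import json
import urllib.request

# Assumed endpoint of the current UniProt REST API; not taken from the article.
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def extract_drugbank_refs(entry):
    """Return (DrugBank identifier, drug name) pairs from a UniProt entry dict.

    Assumes cross-references sit under "uniProtKBCrossReferences" and that
    DrugBank items carry the drug name in a "GenericName" property.
    """
    drugs = []
    for xref in entry.get("uniProtKBCrossReferences", []):
        if xref.get("database") != "DrugBank":
            continue
        props = {p.get("key"): p.get("value") for p in xref.get("properties", [])}
        drugs.append((xref.get("id"), props.get("GenericName")))
    return drugs


def find_drugs(accession, timeout=10):
    """Fetch one UniProt entry over HTTP and list its DrugBank cross-references."""
    url = UNIPROT_ENTRY_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_drugbank_refs(json.load(response))
```

Calling `find_drugs("Q9BYF1")` requires network access; the parsing step is kept in its own function so that it can be checked offline against a saved entry.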
The data animation utility gives a concrete view of the proteomic dynamics over a period (Figure 5b-5g). Compared to 2 h after SARS-CoV-2 infection, the numbers of DEPs and DTPs increased at 24 h after virus infection. The number of DPPs at 24 h is also larger than that at 0 h after SARS-CoV-2 infection. These trends suggest that the human body responds to SARS-CoV-2 infection by continuously activating protein biogenesis.
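The per-time-point counts behind such a comparison could be computed as sketched below. The function name, the record layout, and the toy values are hypothetical; only the cut-offs (fold change > 2, p value < 0.05) come from the data collection criteria.

```python
from collections import Counter


def count_changed_per_time(records, fc_cutoff=2.0, p_cutoff=0.05):
    """Count identifiers passing both cut-offs at each time point.

    records maps an identifier to {time_in_hours: (fold_change, p_value)}.
    """
    counts = Counter()
    for series in records.values():
        for hours, (fold_change, p_value) in series.items():
            # Adding 0 still registers the time point, so empty ones show up.
            counts[hours] += int(fold_change > fc_cutoff and p_value < p_cutoff)
    return dict(sorted(counts.items()))


# Hypothetical toy values, not data from the cited studies.
toy_series = {
    "PROT_A": {2: (1.1, 0.60), 24: (2.5, 0.01)},
    "PROT_B": {2: (3.0, 0.01), 24: (4.0, 0.001)},
    "PROT_C": {2: (1.0, 0.90), 24: (2.2, 0.04)},
}

print(count_changed_per_time(toy_series))  # {2: 1, 24: 3}
```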

Conclusions
We have developed the first database for human proteins and genes that respond to SARS-CoV-2, SARS-CoV, and MERS-CoV infection. The database will help to understand the cellular details of how the human body responds to coronavirus infection. We acknowledge that the present release of our database may omit some data that should be included. We will keep updating the database and add missing data in future releases. Apart from providing knowledge about the human response to viral infections, another key feature of the database is that drugs targeting any protein of interest can be found with ease, which provides useful hints for drug development. In summary, the database will help to design effective and specific drugs and preventive vaccines targeting SARS-CoV-2, SARS-CoV, and MERS-CoV.

Declarations
Ethics approval and consent to participate
Not applicable.