Classification of viral zoonosis through receptor pattern analysis

Background: Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genotype (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, the RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species; in other words, that the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell.
Results: We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species.
Conclusions: This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.


Background
Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins [1]. Zoonosis can occur not only through direct transmission, but also through intermediate reservoirs or other environmental factors [2][3][4]. The zoonotic viruses can be categorized according to genotype; of the various classes of viruses, the RNA viruses exhibit the highest mutation rates [5]. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species [6,7]. The high mutation rate of envelope proteins [5] hinders the development of accurate vaccines, as does the great ability of the RNA viruses to infect host species in order to exploit host proteins for viral reproduction [8].
Lacking the ability to self-replicate, viruses must utilize the replication apparatus of their host cells [9]. Viral infection of a cell begins with attachment of the virus to the cell surface [6,10,11]. During attachment to the cell membrane, the viral envelope protein (a structural protein) interacts with the host-cell receptor protein(s) [12]. In non-enveloped viruses, the capsid plays this role. The cell receptors that play a major role in viral attachment are predominantly membrane proteins of the immunoglobulin superfamily [13][14][15]. The identification of virus-binding cellular receptors accelerated rapidly in the late 1980s owing to developments in the use of monoclonal antibodies and molecular cloning techniques [15]. The various receptors that have been found are surface matrix structures containing carbohydrate, lipid, and protein moieties [1,16,17]. In some cases, viral attachment also exploits co-receptors. For example, HIV, which uses the CD4 molecule as its receptor, uses the CXCR4 and CCR5 co-receptors to increase the efficiency of infection [1,14,18,19]. Similarly, hepatitis C virus utilizes CD81 as a receptor and LDLR as a co-receptor [20].
Since the host-cell range of a specific virus is predetermined by its ability to recognize specific receptors, the similarities between the receptors of its primary reservoir host cell and the potential human host cell play a major role in determining the likelihood of viral zoonosis. Here, we analysed zoonotic and non-zoonotic RNA viruses along with their cellular receptors in human and (non-human) primary reservoir species to extract the receptor characteristics common to zoonosis. Viruses not previously reported to infect humans were classified as non-zoonotic viruses. We excluded all viruses known to utilize co-receptors; i.e., only virus-receptor interactions occurring through virus tropism and pathogenesis were considered [5,21]. The receptors and viruses examined in this study are listed in Table 1. We hypothesized that the major barrier to the transmission of viruses between species is the difference in cellular receptor sequences. In other words, the specific amino acid sequence of the receptor should be the major determinant of the ability of the viral envelope protein to attach to the cell. Ordinary sequence alignment reports overall sequence similarity, which we considered useful but insufficient: most receptors are membrane proteins, and membrane proteins consist of distinct hydrophobic and hydrophilic regions. Therefore, we analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species to predict the likelihood of transmission across species, including humans. To the best of our knowledge, this study is the first attempt to predict zoonosis through a simple analysis of receptor sequence similarities and differences. This method may be useful in predicting the zoonotic potential of newly discovered viral strains.

Results and Discussion
The pair-wise receptor sequence similarities ( g S i,1 , g S i,2 , and g S i,3 ) between host-species pairs for each virus family are shown in Table 1. For logical comparisons, each virus contains at least one infected host (the primary reservoir, designated as "#" in Table 1). As shown in Table 1, the similarity scores for the infected group (g = 1) were high, ranging from 0.790 to 0.988 for 1 S i,1 , from 0.841 to 0.996 for 1 S i,2 , and 0.794 to 0.962 for 1 S i,3 . All pair-wise comparisons in group 1 (human vs. primary reservoir, primary reservoir vs. host, and human vs. host) yielded high similarity scores, indicating a high similarity among receptor sequences. The similarity scores were comparatively low in the non-infection group (g = 2), ranging from 0.092 to 0.440 for 2 S i,1 , from 0.108 to 0.432 for 2 S i,2 , and from 0.130 to 0.416 for 2 S i,3 . For group 2, both the primary host species and non-infected species are listed to illustrate the differences in similarity. In pair-wise comparisons, all the non-infection cases yielded low similarity values, i.e., the receptor sequences differed significantly from each other.
We assume that a low similarity in receptor sequences disfavors infection despite the existence of a common receptor. For example, enterovirus infects only Sus scrofa (pig); it does not infect Rattus norvegicus (rat) or Homo sapiens (human) because of the high transmission barrier. Similarly, for leukovirus, only Gallus gallus (chicken) is infected as a primary reservoir; because of the high transmission barrier, R. norvegicus and H. sapiens are not infected. These results imply that for non-infection cases, species barriers exist, and the propensity to cross the barrier is determined by the sequence similarity between the potential and primary host receptors.
Similarity scores for rabies virus were low between Canis lupus familiaris (domestic dog) and Bos taurus (domestic cow) ( 2 S i,1 = 0.280, 2 S i,2 = 0.373, and 2 S i,3 = 0.366) and also between B. taurus and H. sapiens ( 2 S i,1 = 0.267, 2 S i,2 = 0.371, and 2 S i,3 = 0.416) but were high between C. l. familiaris and H. sapiens ( 1 S i,1 = 0.947, 1 S i,2 = 0.985, and 1 S i,3 = 0.962). Clearly, C. l. familiaris is the primary reservoir, and transmission of the disease to H. sapiens is possible only because of the high human/dog receptor similarity. Thus, for particular viruses, transmission of disease may be species-selective, although common receptors exist among species. Furthermore, infection specificity may be determined by the species barrier, which results from receptor differences.
The values in Table 1 are plotted in Figure 1 to illustrate the differences among groups. The x- and y-axes denote g S i,1 and g S i,2 , respectively, where "g" is the group classification. All pair-wise similarity scores are shown. Groups 1, 2 and 3 are each well separated in the colour-coded two-dimensional space. The results provide clear evidence that the receptor sequences from cases of cross-species infection are well separated from those of other infection cases. From these observations, we conclude that receptor differences are a major contributing factor to the potential of a specific viral strain to cross species barriers for transmission. In other words, the species dependence of infection is indirectly related to the receptor sequence similarity. This finding implies that once the receptor sequences of the primary reservoir and possible hosts are known, we might be able to predict the likelihood of viral disease transmission. The accuracy of these classifications can be judged by subsequent assessment of cases of actual zoonotic transmission to humans.
Figure 1 Similarity scores among groups. Three kinds of pair-wise similarity scores ( g S i,1 , g S i,2 , g S i,3 ) are plotted in two-dimensional space to show clear differences among groups. Groups 1, 2 and 3 are each well separated; the results show clearly that the receptor sequences from cases of cross-species infection are well distinguished from those of other infection cases.
Our analysis revealed significant differences in receptor similarity between infection and non-infection cases.
The similarity values and the experimentally determined group categories were fed into a statistical discriminant analysis to predict infection (or zoonosis, in the case of human infection). As described in the Materials and Methods section, the values D i 2 (i = 1, 2, 3) were calculated from the data in Table 1 to yield the results of the discriminant analysis.
The statistical discriminant analysis was verified using a test set of four viruses that were deliberately excluded from the training set. The viruses whose groups were predicted using the discriminant analysis are shown in Table 2. The first virus, feline immunodeficiency virus (FIV), uses Felis catus (domestic cat) as its primary host and CD4 as its receptor. According to the literature [22,23], FIV infection of humans is rare but has been reported. Our method categorized this case as near-infection (G = 3). The second virus, classical swine fever virus, is known to be non-zoonotic and was classified as such by our method (G = 2). The third virus, encephalomyocarditis virus, infects S. scrofa but has been known to cause sporadic infections in H. sapiens; it was classified as group 1 (G = 1) by our method. Finally, the Lassa virus is known to be zoonotic and was classified as group 1 (G = 1) by our method.
In Table 2, the hydrophilic similarity scores (S1) are less consistent with the predicted group (G) than the hydrophobic scores (S2). This suggests that the hydrophobic characteristics of the receptor sequence might be the key contributor to the prediction. However, this observation should be interpreted with caution because the variables (S1, S2, S3) are complementary in the statistical process.

Conclusions
Our analysis of viral receptor sequences shows that the likelihood of viral infection correlates with the similarity in sequence of the primary and host receptors. This result is not surprising, because viral infection also inversely correlates with the inhibition of viral coat protein binding to the receptors. Importantly, we were able to establish this relationship at the amino acid sequence level, allowing for the prediction of possible human infection at an early stage of a viral outbreak, before the structures of viral coat proteins and receptors are known. Therefore, once the receptor sequences of the primary reservoir and the potential host are known, the likelihood of viral infection can be predicted, provided that the virus does not mutate too abruptly. Our simplistic approach needs further refinement because the complex processes of viral host tropism are largely ignored in our current method. For example, the host immune response could be included for better prediction of zoonosis. Although further refinements of our methods and analyses of larger databases are needed, this simple conceptual approach may be useful, even now, as a basic tool for the classification of zoonosis of new viral species.

Data collection
Viral infection requires the insertion of viral genes into host cells. Such a process begins with the binding of coat proteins to host receptors and, in some cases, co-receptors [24]. Ten RNA viruses (seven zoonotic viruses and three non-zoonotic viruses) were investigated. Viruses that use co-receptors were excluded from the study. Receptor sequence data for each virus were collected from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/), and the research literature was examined to determine the specific species tropism of each virus [25] (http://www.ictvonline.org/). The viruses, host species, receptors, receptor sequences, and infection information for each host are shown in Table 1. We selected viruses that each represent a different family, with different primary reservoirs. Viruses with unknown or poorly defined host receptors (particularly human receptors) were excluded from the study. Orthologues of the
where N tot is the total number of amino acids in one sequence string; n tot is the total number of matched amino acids in the sequence; N phi and N pho are the numbers of hydrophilic and hydrophobic amino acids in the sequence, respectively; N others is the number of deleted amino acids (gaps/insertions in sequence) plus the number of amino acids with undetermined properties; n phi and n pho are the numbers of hydrophilic and hydrophobic amino acids matched, respectively; and g S i,1 is the similarity score for hydrophilic residues of the i th row of infection group g.
There are three groups: infection (g = 1), non-infection (g = 2), and near-infection (g = 3). The interspecies infection information for each species pair was classified into one of these three states. By definition, if a group 1 species pair includes humans, then the infection is zoonotic.
Decisions for grouping were made on the basis of experimental and epidemiological studies reported in the literature [4,30,31,32,33].
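The counting scheme behind the three similarity scores can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the score formulas are not given in the surviving text, so the ratios n phi / N phi, n pho / N pho, and n tot / N tot used here are assumed; the partition of residues into hydrophobic and hydrophilic sets is one common convention and is also assumed; the function name `similarity_scores` is hypothetical; the two sequences are taken to be already aligned, with the first one serving as the reference for the N counts.

```python
# Assumed residue partition (one common convention, not taken from the paper).
HYDROPHOBIC = set("AVLIMFWPGC")
HYDROPHILIC = set("RNDQEHKSTY")


def similarity_scores(ref, other):
    """Return (S1, S2, S3) for two pre-aligned sequences of equal length.

    S1: matched hydrophilic residues / hydrophilic residues in ref (assumed form)
    S2: matched hydrophobic residues / hydrophobic residues in ref (assumed form)
    S3: all matched residues / total alignment length (assumed form)
    Gaps ('-') and residues of undetermined type count as "others".
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")

    big_n_phi = big_n_pho = 0   # N phi, N pho: class counts in the reference
    n_phi = n_pho = 0           # n phi, n pho: matched residues per class
    for a, b in zip(ref.upper(), other.upper()):
        if a in HYDROPHILIC:
            big_n_phi += 1
            if a == b:
                n_phi += 1
        elif a in HYDROPHOBIC:
            big_n_pho += 1
            if a == b:
                n_pho += 1
        # anything else (gap, 'X', ...) falls into N others and never matches

    big_n_tot = len(ref)        # N tot
    n_tot = n_phi + n_pho       # n tot

    def ratio(num, den):
        return num / den if den else 0.0

    return (ratio(n_phi, big_n_phi),
            ratio(n_pho, big_n_pho),
            ratio(n_tot, big_n_tot))


if __name__ == "__main__":
    # Toy aligned fragments, purely illustrative (not real receptor sequences).
    print(similarity_scores("AKLD-E", "AKVDGE"))
```

Keeping the hydrophilic and hydrophobic counts separate is what distinguishes this from a plain percent-identity score: a substitution inside a hydrophobic stretch lowers S2 without affecting S1.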
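The variables (shown in Table 1) were arranged in matrices to allow for discriminant analysis, a method of multivariate analysis that can determine the group related to variables [34]. Each group has three columns and l, m, or n rows, depending on the numbers of variable sets. Here, the matrix for group 1 is defined as:
\[ {}^{1}S = \begin{pmatrix} {}^{1}S_{1,1} & {}^{1}S_{1,2} & {}^{1}S_{1,3} \\ \vdots & \vdots & \vdots \\ {}^{1}S_{l,1} & {}^{1}S_{l,2} & {}^{1}S_{l,3} \end{pmatrix} \]
Similarly, 2 S and 3 S were defined as:
\[ {}^{2}S = \begin{pmatrix} {}^{2}S_{1,1} & {}^{2}S_{1,2} & {}^{2}S_{1,3} \\ \vdots & \vdots & \vdots \\ {}^{2}S_{m,1} & {}^{2}S_{m,2} & {}^{2}S_{m,3} \end{pmatrix}, \qquad {}^{3}S = \begin{pmatrix} {}^{3}S_{1,1} & {}^{3}S_{1,2} & {}^{3}S_{1,3} \\ \vdots & \vdots & \vdots \\ {}^{3}S_{n,1} & {}^{3}S_{n,2} & {}^{3}S_{n,3} \end{pmatrix} \]
All of the related variables were tabulated as shown in Table 1. From the above matrices, three averages were found for each group:
\[ {}^{1}\bar{S}_{l,j} = \frac{1}{l} \sum_{i=1}^{l} {}^{1}S_{i,j}, \qquad j = 1, 2, 3 \]
The averages 2 S m,1 , 2 S m,2 , and 2 S m,3 for group 2 and 3 S n,1 , 3 S n,2 , and 3 S n,3 for group 3 were calculated similarly.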
Group classification (G) was identified using the criterion that the smallest D i 2 determines the group: for example, if D 1 2 is the minimum among the three values D 1 2 , D 2 2 , and D 3 2 , then G = 1; i.e., "group 1" is the group classification. To automate the mathematical process described above, we developed a Java computer program named ZOO. To evaluate the accuracy of our method and software, we analysed a test data set (described in the Results and Discussion section).
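The minimum-D 2 rule can be illustrated with a short Python sketch. It is not the ZOO program, and it makes a simplifying assumption: the exact form of D i 2 is not recoverable from the text, so the squared Euclidean distance from a score vector (S1, S2, S3) to each group's column averages is used here as a stand-in (a full discriminant analysis would typically also weight by a covariance matrix). The function names are hypothetical. The only training values used are the rabies virus scores quoted in the Results and Discussion section, so group 3 is absent from this example.

```python
def column_means(rows):
    """Column averages of a group matrix given as a list of (S1, S2, S3) rows."""
    count = len(rows)
    return tuple(sum(row[j] for row in rows) / count
                 for j in range(len(rows[0])))


def squared_distance(x, mean):
    """Squared Euclidean distance; an ASSUMED stand-in for the paper's D^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mean))


def classify(x, groups):
    """Return (G, distances): G is the group label with the smallest distance.

    groups maps a group label to its list of (S1, S2, S3) rows.
    """
    distances = {label: squared_distance(x, column_means(rows))
                 for label, rows in groups.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Rabies virus rows quoted in the text (dog/human is group 1; dog/cow and
# cow/human are group 2). No group 3 rows are quoted, so none are included.
TRAINING = {
    1: [(0.947, 0.985, 0.962)],
    2: [(0.280, 0.373, 0.366), (0.267, 0.371, 0.416)],
}

if __name__ == "__main__":
    # Hypothetical query vectors, chosen only to exercise the rule.
    for query in [(0.90, 0.90, 0.90), (0.30, 0.40, 0.40)]:
        group, dists = classify(query, TRAINING)
        print(query, "-> G =", group, dists)
```

With the full Table 1 data, each of the three groups would contribute its own rows, and the same argmin step would return G = 1, 2, or 3.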