- Open Access
Curation of viral genomes: challenges, applications and the way forward
© Kulkarni-Kale et al; licensee BioMed Central Ltd 2006
- Published: 18 December 2006
Whole genome sequence data is a step towards generating the 'parts list' of life to understand the underlying principles of Biocomplexity. Genome sequencing initiatives of human and model organisms are targeted efforts towards understanding principles of evolution, with an envisaged application of improving human health. These efforts culminated in the development of dedicated resources. In contrast, a large number of viral genomes have been sequenced by groups or individuals interested in studying antigenic variation amongst strains and species. These independent efforts enabled viruses to attain the status of 'best-represented taxa' with the highest number of genomes. However, due to the lack of concerted efforts, viral genomic sequences merely remained as entries in the public repositories until recently.
VirGen is a curated resource of viral genomes and their analyses. Since its first release, it has grown both in terms of coverage of viral families and development of new modules for annotation and analysis. The current release (2.0) includes data for twenty-five families with broad host range as against eight in the first release. The taxonomic description of viruses in VirGen is in accordance with the ICTV nomenclature. A well-characterised strain is identified as a 'representative entry' for every viral species. This non-redundant dataset is used for subsequent annotation and analyses using sequence-based Bioinformatics approaches. VirGen archives precomputed data on genome and proteome comparisons. A new data module that provides structures of viral proteins available in PDB has been incorporated recently. One of the unique features of VirGen is the prediction of conformational and sequential epitopes of known antigenic proteins using algorithms developed in-house, a step towards reverse vaccinology.
Structured organization of genomic data facilitates the use of data mining tools, which provide opportunities for knowledge discovery. One of the approaches to achieve this goal is to carry out functional annotations using comparative genomics. VirGen, a comprehensive viral genome resource that serves as an annotation and analysis pipeline, has been developed for the curation of public domain viral genome data (http://bioinfo.ernet.in/virgen/virgen.html). Various steps in the curation and annotation of the genomic data and applications of the value-added derived data are substantiated with case studies.
- West Nile Virus
- Dengue Virus
- Japanese Encephalitis Virus
- Classical Swine Fever Virus
The emergence of high-throughput technologies for genome sequencing, microarrays and proteomics transformed biology into a data-rich information science. Sequencing the complete genome of an organism is the first step in generating the 'parts list' of life. One of the first efforts involved sequencing of Haemophilus influenzae in 1995. As of July 2006, more than 403 organisms have been sequenced completely. Furthermore, the genome sequencing projects of ~932 prokaryotic and ~608 eukaryotic species have been launched. The enormous data generated by the genome sequencing projects are archived in both dedicated genomic resources and public domain databases. While the complete genome sequencing of the model organisms and microbes is taking center stage, viral genome sequencing continues to be driven by individual efforts. Viruses are a diverse group of organisms and are the most abundant [4, 5]. The genome size of viruses varies from a few hundred to millions of bases [6, 7]. SV-40 was the first virus for which the complete genome sequence (5,224 bp) was obtained, in the late 1970s. About 4000 viruses have been sequenced so far by virologists all over the world with the objective of studying antigenic variation, geographic distribution, spread and evolution. These independent efforts enabled viruses to attain the status of 'best-represented taxa' with the highest number of whole genomes sequenced. However, due to the lack of concerted efforts, viral genomic sequences only added to the entries in the public repositories until recently. The GOLD (Genomes OnLine Database) is a tracking system for genome sequencing that provides updates on various genome-sequencing projects, but it does not have any mechanism to specifically monitor viral genome sequencing initiatives.
Whole genome sequence data of viruses offer unlimited opportunities for data mining and knowledge discovery. The complete genome sequences of two large viral genomes, viz. Mimivirus and Polydnavirus, substantiate this fact. Varying coding density and the occurrence of genes associated with metabolic pathways in these DNA viruses offer interesting opportunities in viral genomics in general and in understanding the evolution of viruses in particular. However, it is known that in the absence of curation and functional annotation of the genomic data, the utility of the sequence data is minimal and the sequence merely remains an entry in the database. Bioinformatics provides a large number of databases, tools and approaches for mining huge sequence data. Although there exist numerous genome databases for the model organisms and microbes, there are only a few databases that archive viral genomic data [12, 13]. Most of these databases are a synthesis of experimental work carried out in the respective laboratories. As a result, these compilations are highly specialized [14–18].
VirGen: genome annotation & comparative genomics pipeline
Growth statistics of VirGen (table columns: No. of families; No. of genomes).
The major focus of this paper is to share with the research community the issues involved in the curation of viral genomic data, and to demonstrate the utility of the derived data in understanding the biology, a prerequisite for the development of viral diagnostics, vaccines and drugs.
Families in VirGen database listed according to type of genetic material and host range (table column: Genetic material type).
▪ Curation of genomic entries of viruses
◦ Organization of genomic data in a structured fashion to facilitate navigation from family to strain/isolate
◦ Compilation of representative genomic entries for every viral species
▪ Annotation of genomic entries
▪ Compilation of synonyms for viral proteins
▪ Graphical representation of genome organization using SVG technology
▪ Precomputed Multiple Sequence Alignments (MSA) of genomes and proteomes
▪ Reconstruction of phylogeny using genome/proteome data
▪ Prediction of sequential and conformational B-cell epitopes
▪ Curation and compilation of 3D structures of viral proteins available in PDB
Issues involved in curation of viral genomes
Genome sequences deposited in the public domain sequence repositories were retrieved using well-defined search strategies. The queries were formatted using keywords and MeSH terms to ensure that none of the complete genome sequences were missed. The entries were curated with respect to taxonomic hierarchy as per the guidelines provided by the ICTV. For many entries, the names of viral strains/isolates were not explicitly present in the field 'organism source' but were available as a part of feature table annotations. It was also observed that in the case of a few entries, although the published reference explicitly documented the strain information along with the accession numbers, the same was missing in the sequence record. Such entries were curated manually.
In the case of Hepatitis C Virus (HCV), the curation of genomic records for assignment of genotypes called for exhaustive sequence analyses. Hepatitis C virus is a member of the family Flaviviridae, genus Hepacivirus, and is a major causative agent of liver diseases. The assignment of HCV genotypes is essential as molecular epidemiology differs from subtype to subtype. Also, due to the absence of significant cross-protection among different HCV subtypes, exact subtype identification becomes a prerequisite in determining the suitability of present antiviral therapy as well as in designing new antiviral compounds and vaccines. Isolation of HCV by standard immunological and virological techniques is rather difficult. Efforts to classify HCV at type and subtype level are based on molecular phylogeny using the 5'UTR and the core, NS3, NS4 & NS5 regions, respectively. However, the absence of significant sequence variation in the 5'UTR limits its usage for classification only up to genotype level. Furthermore, intratypic crossing over events have been reported in some of the HCV genotypes. Given this scenario, use of a single gene or genomic region may lead to inaccuracies in genotype assignment of HCV. During the process of curation of HCV genomic sequences for incorporation into VirGen, it was found that a majority of the genomic records in the public domain repositories lacked the information on genotype. An approach based on whole genome phylogeny was used to assign the genotypes. These assignments were further substantiated through literature search. More than 130 records of HCV were annotated and added to VirGen using this process.
Identification of putative genomic records
It was observed that there were many sequences in the public domain repositories that were not explicitly annotated with keywords such as full, whole or complete genome, even though their sequence lengths were in the typical range of the complete genome sequence for a given species. Such sequences, which are not explicitly annotated as complete genome entries, have been referred to as 'putative genomes' in VirGen.
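The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword list, the record descriptions and the length range are hypothetical placeholders, not the actual search strategy or thresholds used in VirGen.

```python
# Hypothetical keywords signalling that a record is annotated as complete.
COMPLETE_KEYWORDS = ("complete genome", "whole genome", "full genome")


def is_putative_genome(description, length, typical_range):
    """Return True when a record carries no completeness keyword but its
    length lies inside the typical complete-genome range for the species."""
    low, high = typical_range
    annotated = any(key in description.lower() for key in COMPLETE_KEYWORDS)
    return (not annotated) and low <= length <= high


# Toy records: (description, sequence length in bases). All values are invented.
records = [
    ("Virus X strain A, complete genome", 10950),
    ("Virus X strain B polyprotein gene", 10940),
    ("Virus X strain C envelope gene, partial cds", 1500),
]
typical = (10800, 11100)  # assumed length range for this imaginary species

putative = [desc for desc, n in records if is_putative_genome(desc, n, typical)]
print(putative)
```

In practice the length range would have to be derived per species from the records that are explicitly annotated as complete, and flagged entries would still need manual review.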
Identification of reference dataset
As there exist multiple genomic entries for every viral species, a well-annotated and characterized entry has been identified as the 'representative genomic entry' for a given species. The representative entries provide a non-redundant set of viral genome sequences, which are subsequently used for annotation of genomic records, alignment of genomes and proteomes and to study phylogenetic relationships.
Curation and annotation of complete genomes
VirGen uses sequence-based Bioinformatics approaches and stringent cutoffs for annotating the viral genomes and proteomes. Entries are annotated with respect to the genome organisation typical of a given family. The annotations are further refined to accommodate genus- and species-specific organisation. Using the representative genomes and the program BLAST, the entries are annotated with respect to individual coding sequences (cds), polyprotein(s) and individual proteins as described previously.
Compilation of synonyms for viral proteins
Derived data & its applications
Graphical genome view
Genome scale multiple sequence alignment
Comparative genomics invariably relies on multiple sequence alignment (MSA) in a major way, as the inferences drawn from it are very informative. Viruses, being among the smallest replicating species, are under severe selection pressure not only to evade the host immune response but also to sustain their survival in the vector. Availability of genome data of viruses offers opportunities to study variations at the molecular level.
Bioinformatics tools like MSA can be used to identify such variations, which when mapped on phenotypic properties like antigenicity, immunogenicity etc. may provide a rationale for the observed strain and species-specificity. MSA data and predicted epitopes for HN protein along with the predicted 3D structures were used to study strain specificity of mumps virus.
The MSA module in VirGen is computed using the parallel version of ClustalW. Multiple sequence alignment of viral genomes and proteomes at different levels of taxonomic hierarchy, apart from being a prerequisite for phylogenetic analysis and mapping of predicted B and T-cell epitopes, helps to detect species and strain-specific signature sequences and aids primer design. The MSA module can also be browsed independently to access alignments and dendrograms.
It must be mentioned that these calculations require a curated non-redundant data set that includes NS3 protein sequences of all the known species belonging to genus Flavivirus. However, due to availability of limited annotations and NS3 sequence being part of polyprotein entries in the public domain databases, the curation process could not be automated completely and required manual intervention. Thus, the entire procedure of generating a curated dataset for calculation of sensitivity and specificity of any pattern is time-consuming and calls for inclusion of steps that are specific for a given data set.
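To make the calculation concrete, the following Python sketch computes sensitivity and specificity of a sequence pattern against a curated positive set and a negative set. The regular expression and the short sequences are hypothetical toy data, not the NS3 pattern or dataset discussed here.

```python
import re


def pattern_performance(pattern, positives, negatives):
    """Return (sensitivity, specificity) of a regex pattern.

    positives: sequences that should contain the pattern.
    negatives: sequences that should not contain it.
    """
    regex = re.compile(pattern)
    tp = sum(1 for seq in positives if regex.search(seq))  # true positives
    fn = len(positives) - tp                               # false negatives
    fp = sum(1 for seq in negatives if regex.search(seq))  # false positives
    tn = len(negatives) - fp                               # true negatives
    sensitivity = tp / (tp + fn) if positives else float("nan")
    specificity = tn / (tn + fp) if negatives else float("nan")
    return sensitivity, specificity


# Invented motif and sequences, for illustration only.
toy_pattern = r"G.S[GA]"
toy_positives = ["AAGASGKK", "TTGLSAPP", "MMGGGGMM"]
toy_negatives = ["KKKKKKKK", "AGTSGAAA"]

sens, spec = pattern_performance(toy_pattern, toy_positives, toy_negatives)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The quality of both numbers depends entirely on how carefully the positive and negative sets were curated, which is why the manual steps described above cannot be skipped.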
Furthermore, the 399-RGD-401 sequence motif, present in domain III, is unique to the mosquito-borne Flaviviruses and was proposed to form a part of the receptor binding site [40, 41]. The mutations found within the loop containing the RGD motif are known to alter tropism or virulence in different Flaviviruses. MSA of 47 strains of JEV Egp shows mutations in this region, leading to the occurrence of seven unique tripeptides (RGE, RGH, RGG, RKD, RED, RGN, MGD). Curation and MSA of Egp sequences from an additional 88 JEV strains revealed that IGD also occurs in place of RGD. All tripeptides except MGD and RGN are naturally occurring. MGD was observed in the attenuated strain (GenPept Accession: 6970068). Similarly, RGN was observed in a strain that was passaged in Neuro-2a cells (GenPept Accession: 34495383). The RED tripeptide is present in a highly neurovirulent and neuroinvasive strain, P3 (GenPept Accession: 1488031). Thus, every variation is significant and can be correlated with functionality in the context of selection pressure.
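A tally of motif variants of this kind can be obtained directly from an alignment. The Python sketch below assumes the alignment is held as a dictionary of equal-length strings and that the motif columns are given as 1-based alignment positions; the strain names and short sequences are invented, so the columns used are 3 to 5 rather than 399 to 401.

```python
from collections import Counter


def motif_variants(aligned, start, end):
    """Count the residues found in alignment columns start..end
    (1-based, inclusive) across all aligned sequences."""
    return Counter(seq[start - 1:end] for seq in aligned.values())


# Toy alignment with hypothetical strain labels.
toy_alignment = {
    "strain_1": "ATRGDLK",
    "strain_2": "ATRGELK",
    "strain_3": "ATRGDLK",
    "strain_4": "ATMGDLK",
}

variants = motif_variants(toy_alignment, 3, 5)
for tripeptide, count in variants.most_common():
    print(tripeptide, count)
```

With a real alignment, gaps upstream of the motif shift the column numbers relative to the protein numbering, so the alignment columns corresponding to residues 399 to 401 would need to be located first.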
Reconstruction of phylogeny using complete genomes
Molecular phylogenetic studies help to decipher the evolution of viruses and offer a mechanism to understand the origin and spread of infectious viral diseases. Phylogenetic trees are usually reconstructed using a single gene/protein or a region encompassing them. UTRs have also been used to study phylogenetic relationships. It was shown that phylogenetic noise is minimal when whole genome sequence data are used for reconstruction of phylogeny. However, phylogenetic analyses using whole genome data require curation with respect to recombination and insertion sites. Also, the use of whole genome phylogeny has been restricted due to its compute-intensive nature. The phylogeny module of VirGen includes the whole genome phylogenetic trees obtained using a parallel version of PHYLIP (Felsenstein, J., Department of Genetics, University of Washington, Seattle and Silicon Graphics Incorporation) implemented on an SGI Onyx 300, a 4-processor parallel machine. The parameters for phylogenetic analysis in VirGen are as described previously. Whole genome phylogenetic analysis facilitates identification of unclassified viruses and, when coupled with the antigenicity data, assists in the identification of viral strains isolated during epidemics.
Genome to Vaccinome: compilation of predicted epitopes
Availability of genomic sequences has paved the way for in silico design of vaccines, a field popularly known as 'reverse vaccinology'. One of the prerequisites for in silico vaccine design is the availability of a curated data set of antigenic proteins and a set of programs for prediction of epitopes. As a step towards reverse vaccinology, VirGen stores predicted B-cell epitopes of antigenic proteins using sequence-based and structure-based algorithms developed in-house [50–52].
Kolaskar & Tongaonkar's method is a sequence-based approach for prediction of B-cell epitopes. The algorithm is based on antigenic propensity, which is assigned to each of the twenty amino acids depending on their frequency of occurrence in experimentally determined B-cell epitopes. Parameters like hydrophilicity, accessibility and flexibility are averaged for every overlapping heptapeptide from the N to the C terminus and assigned to the central residue of every segment. The residues having average antigenic propensity ≥ 1.0 are termed potential antigenic determinants. The accuracy of this method is ~75%, and it has been implemented in major sequence analysis packages, viz. GCG and EMBOSS. VirGen contains the predicted epitopes of known antigenic proteins. However, if one wants to predict the B-cell epitopes for any other protein of interest using the Kolaskar & Tongaonkar approach, a link is provided to the Antigenic program of EMBOSS.
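The windowing step described above can be illustrated with a short Python sketch. It is a simplified illustration and not the published algorithm: it averages a single per-residue scale over overlapping heptapeptides, assigns the average to the central residue and reports positions at or above the 1.0 threshold. The scale values below are invented placeholders for a four-letter toy alphabet, not the published antigenic propensities.

```python
def window_averages(sequence, scale, window=7):
    """Average a per-residue scale over every overlapping window and
    assign the value to the central residue (1-based position)."""
    half = window // 2
    averages = {}
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        averages[centre + 1] = sum(scale[aa] for aa in segment) / window
    return averages


def antigenic_positions(sequence, scale, threshold=1.0, window=7):
    """Return 1-based positions whose windowed average meets the threshold."""
    averages = window_averages(sequence, scale, window)
    return [pos for pos, value in averages.items() if value >= threshold]


# Placeholder scale (hypothetical values, for illustration only).
toy_scale = {"A": 1.1, "V": 1.2, "G": 0.9, "D": 0.8}
toy_sequence = "AVAVAVAGDGDGDG"

hits = antigenic_positions(toy_sequence, toy_scale)
print(hits)
```

Residues within three positions of either terminus receive no value because a full heptapeptide cannot be centred on them; for real predictions the Antigenic program of EMBOSS mentioned above is the appropriate tool.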
Module for Viral structures
Three-dimensional structural data is essential to understand the function of proteins at the molecular level. The 3D structures of proteins are known to be conserved and can accommodate sequence variation up to 80%. Structures of ~1082 viral proteins and viral assemblies have been solved and 3D coordinates are available in the PDB. Structural genomics initiatives for viruses have been launched recently and are limited to a few viral species such as SARS and Poxvirus. VirGen includes a module for viral structures that are deposited in PDB. The PDB entries have been curated with respect to the description of organism source, as variations were found in the same. The viral structures can be searched using PDB ID as well as protein and organism description. Precomputed results of conformational epitope prediction or probable antibody-binding sites are also made available for these structures. The structure module facilitates identification of templates for knowledge-based homology modeling and identification of targets for structural genomics initiatives. Using the homology-modeling approach, the structures of the envelope glycoprotein of two strains of Japanese encephalitis virus have been predicted. These models were used to design candidates for a peptide vaccine for Japanese encephalitis and helped to gain an insight into strain-specific variations [37, 38].
Comparative genomics to understand species-specificity
Analyses of curated data of whole genomes and proteomes reveal molecular variations, which when correlated with the observed phenotype provide a rationale for species-specific properties. As an attempt towards this, the molecular mechanism of polyprotein cleavage at NS3-NS2B in JEV was modeled using the experimental data available for related flaviviruses, since NS3 is a candidate for anti-viral therapy.
Prediction of 3D structure of NS3
The 3D structure of the NS3 serine protease domain of Japanese Encephalitis virus (Virulent SA-14-14-2 strain) was predicted using a knowledge-based homology modeling approach. Two models of NS3 serine protease of JEV were built using the crystal structures of NS3 of Dengue virus II (DEN) PDB_ID: 1BEF and Hepatitis C Virus (HCV) PDB_ID: 1JXP. These templates showed remarkable similarity in terms of fold (two six-stranded β-barrel domains separated by a linker region), secondary structure content and conformation of the active site. The sequence identity of NS3 of JEV with that of DEN and HCV is 47.16% and 13.3% respectively. The active site residues His51, Asp75 and Ser135 are found to be conserved. Sequence and structural alignments were used to identify the structurally conserved regions (SCR) and loops.
A hydrophobic stretch of the non-structural protein 2B (NS2B), an integral membrane protein, serves as a cofactor for NS3 [61, 72]. However, this sequence was not identified explicitly in JEV. A detailed analysis of the known cofactors of NS3 belonging to the Flaviviridae family and their multiple sequence alignments, combined with hydropathic profiles generated using the Kyte & Doolittle index, helped to identify the putative hydrophobic stretch of NS2B. The residues thus mapped for JEV are 62-WEMDAAITGSSR-73. It is known that the binding of NS2B to NS3 is a co-translational event, where NS2B may only be partially folded. The structure of only the hydrophobic stretch mentioned above was predicted using multiple MD simulations of 1 ns duration. The peptide was found to adopt a helical conformation predominantly (data not shown).
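For readers unfamiliar with how a percent identity such as those quoted above is obtained, a minimal Python sketch follows. It assumes two pre-aligned sequences of equal length with '-' as the gap character and counts identities over columns where neither sequence has a gap. This denominator is one of several conventions in use and is an assumption here, since the convention behind the reported values is not stated; the sequences are toy data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns without gaps in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy pairwise alignment (hypothetical sequences).
identity = percent_identity("ACDE-FG", "ACDQKFG")
print(f"{identity:.2f}%")
```

Dividing instead by the full alignment length or by the length of the shorter sequence gives different numbers, so the convention should always be reported alongside the value.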
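A hydropathic profile of the kind mentioned above can be computed with a sliding window over the Kyte & Doolittle hydropathy scale. The Python sketch below uses the standard scale values; the window length of 5 is an assumption chosen for this short peptide and is not taken from the analysis described here.

```python
# Kyte & Doolittle hydropathy index for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of every overlapping window, in order
    from the N-terminal to the C-terminal window."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Profile for the NS2B peptide quoted in the text.
peptide = "WEMDAAITGSSR"
profile = hydropathy_profile(peptide, window=5)
for start, score in enumerate(profile, start=1):
    print(start, round(score, 2))
```

For a full-length protein a longer window is commonly used, and the profiles of related sequences are compared along the alignment rather than read in isolation.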
Docking Cofactor & NS3
In the initial phase, docking of NS3 serine protease with the cofactor was carried out to study their interactions at molecular level. The cofactor (NS2B) was first docked onto NS3, followed by docking of substrate with the complex of NS2B-NS3.
Simulation of ternary complex
The exquisite selectivity of serine proteases for a particular substrate is a result of the existence of specific binding sites on the enzyme for amino acid side chains of the substrate. The substrate is oriented by binding of the amino acid side chain of the P1 residue in the S1 pocket (P1 is the substrate residue at the amino terminal, and P1' is the residue at the carboxyl terminal of the scissile bond) via a hydrogen bond. It is known that the substrate binding cleft of the Dengue virus II protease is not very extensive and does not appear capable of providing specific interactions, in the absence of the NS2B activating peptide, with side chains beyond P2 and P2'. This observation is consistent with the heterogeneity of residues beyond these sites as seen in flaviviral proteases.
Residues in the binding pocket of NS3 of JEV and DEN-2. Active site residues are underlined and variations are shown in bold.
A reductionist approach such as whole genome sequencing enables understanding of life and its processes at the molecular level. However, the challenge in the post-genomic era is to establish the link between the genomic parts and phenotypic features, which requires systematic organisation and integration of various databases that span the spectrum of Biocomplexity. VirGen is a comprehensive genomics resource for viruses. VirGen attempts to compile and curate the whole genome sequences of viruses. Various features and utilities to analyse viral genome data have been discussed using a case study of the family Flaviviridae. The NS3 case study, involving prediction of the structure of the enzyme and cofactor and docking of cofactor and substrate to explain the molecular mechanism of cleavage in JEV, demonstrates the utility of resources such as VirGen in studying species-specific variations. Comparative genomics studies of this kind enable expansion of the available knowledgebase. The primary as well as the value-added derived data in VirGen will be highly useful not only in understanding the strain and species-specific variations but will also serve as a starting point for the discovery of diagnostic kits, antiviral drugs and vaccines. Furthermore, we believe that it would be a useful resource in analysing the outcome of metagenomics initiatives [82, 5].
Data curation & annotations
Various sequence-based [27, 31, 32, 50] and structure-based [37, 51, 52] bioinformatics approaches were used to curate, annotate and analyse viral genome sequence data. Perl scripts were used to retrieve, parse, populate and update the database.
Homology modelling: NS3
The models were built using the Homology module of the Insight II molecular modeling package. The Amber all-atom force field and a distance-dependent dielectric constant of 4r_ij were used. The models were refined using steepest descents and conjugate gradient methods, the detailed protocol of which is described previously [37, 38].
Molecular dynamics simulations: NS2B peptide (co-factor)
The initial conformation of the peptide WEMDAAITGSSR was assigned randomly from the allowed region of the Ramachandran plot. The equilibration was carried out for 100 ps. Amber force field v1.6 and a distance-dependent dielectric constant of 4r_ij were used. The trajectory data was sampled every 10 ps, resulting in 100 frames per MD run. The conformation that is most populated, has the least energy and lies in the allowed region of the Ramachandran plot was identified for docking studies.
Molecular docking simulations: secondary and ternary complexes
Docking studies were carried out using the Affinity module of InsightII.
The protocol used is as follows:
▪ Amber force field with implicit solvent model was employed.
▪ Initially the ligand (NS2B/substrate) was placed near the binding site.
▪ The residues of protein lying within 5Å radius from the center of ligand were defined as 'movable atoms' whereas the rest of the molecule was kept rigid during docking and was defined as 'bulk set'.
▪ Ligand was flexible with respect to (φ,ψ) torsional angles. Hydrogen bond donor and acceptor atoms were defined for both the protein and the ligand. In order to avoid displacement of the ligand far away from the active site residues, the ligand was confined to 3Å radius.
▪ Initial coarse search was carried out using Quartic_vdw_no_Coulb method in which coulombic terms are not calculated thereby excluding electrostatic interactions. This allows sampling of larger conformational space in a shorter time interval. 100 docked conformations were collected during this phase in which the ligand is subjected to random combinations of translational and rotational movements followed by 1000 steps of minimization using conjugate gradients. The conformations lying within the energy range of 100 kcal and 1Å RMS deviation were selected.
▪ The conformations collected in previous step were filtered using various parameters like orientation of ligand, distance from the active-site residues and bad contacts, if any.
▪ Twenty conformations satisfying the above criteria were further refined in the second stage using the Group_based method, in which non-bond interactions were calculated using van der Waals and coulombic cutoffs of 15 and 10 Å respectively. The conformers so obtained were refined by 100 steps of conjugate gradient minimization followed by 50 steps of simulated annealing, starting at an initial temperature of 500 K and ending at a final temperature of 300 K, with a temperature leap of 4 K at each step. Newton's equations of motion were integrated using the Verlet algorithm with a time step of 1 fs in the NVT ensemble. Temperature control was achieved by direct scaling of atom velocities. Finally, the conformers were minimized to a gradient of 0.001 kcal/mole/Å or less using conjugate gradients.
▪ The conformers were evaluated based on the interaction energy, which is the sum total of van der Waals and coulombic interactions. The least energy conformer having favorable interactions with the residues in the binding pocket was selected for further analysis.
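Two of the numerical steps in the protocol above lend themselves to a brief Python illustration: the annealing temperature schedule (500 K down to 300 K in 4 K decrements, which indeed gives 50 steps) and the final ranking of conformers by interaction energy. This is only a sketch of the arithmetic and the selection rule, not a reimplementation of the Affinity module; the conformer records and their energy values are invented.

```python
def annealing_schedule(t_start=500.0, t_end=300.0, step=4.0):
    """Temperatures visited when cooling from t_start to t_end in fixed
    decrements; the starting temperature itself is not included."""
    temperatures = []
    current = t_start
    while current > t_end:
        current -= step
        temperatures.append(current)
    return temperatures


def interaction_energy(conformer):
    """Interaction energy taken as the sum of van der Waals and coulombic terms."""
    return conformer["vdw"] + conformer["coulomb"]


def select_conformer(conformers):
    """Pick the conformer with the lowest interaction energy."""
    return min(conformers, key=interaction_energy)


schedule = annealing_schedule()
print(len(schedule), schedule[0], schedule[-1])

# Hypothetical conformers with made-up energies in kcal/mol.
toy_conformers = [
    {"id": "c1", "vdw": -40.0, "coulomb": -12.5},
    {"id": "c2", "vdw": -55.0, "coulomb": -3.0},
    {"id": "c3", "vdw": -30.0, "coulomb": -20.0},
]
best = select_conformer(toy_conformers)
print(best["id"], interaction_energy(best))
```

The energy ranking alone does not capture the additional requirement of favorable contacts with binding-pocket residues, which in the protocol above is assessed separately.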
VirGen is supported by the Department of Biotechnology, Government of India under Centre of Excellence (COE) grant. The students who worked on various aspects of development of VirGen as a part of their masters' projects are acknowledged. Shubhada Nagarkar's help in sitemap development and Janaki Ojha's help in testing scripts for data updates are deeply appreciated. The authors would like to thank the anonymous reviewers for their useful suggestions to improve the presentation of this manuscript.
This article has been published as part of BMC Bioinformatics Volume 7, Supplement 5, 2006: APBioNet – Fifth International Conference on Bioinformatics (InCoB). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/7?issue=S5.
- Fraser CM, Casjens S, Huang WM, Sutton GG, Clayton R, Lathigra R, White O, Ketchum KA, Dodson R, Hickey EK, Gwinn M, Dougherty B, Tomb JF, Fleischmann RD, Richardson D, Peterson J, Kerlavage AR, Quackenbush J, Salzberg S, Hanson M, van Vugt R, Palmer N, Adams MD, Gocayne J, Weidman J, Utterback T, Watthey L, McDonald L, Artiach P, Bowman C, Garland S, Fuji C, Cotton MD, Horst K, Roberts K, Hatch B, Smith HO, Venter JC: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269: 496–512. 10.1126/science.7542800View ArticleGoogle Scholar
- Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC: The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res 2006, 34: D332-D334. 10.1093/nar/gkj145PubMed CentralView ArticlePubMedGoogle Scholar
- Mindell DP, Villarreal LP: Don't forget about viruses. Science 2003, 302: 1677. 10.1126/science.302.5651.1677bView ArticlePubMedGoogle Scholar
- Bergh O, Borsheim KY, Bratbak G, Heldal M: High abundance of viruses found in aquatic environments. Nature 1989, 340: 467–468. 10.1038/340467a0View ArticlePubMedGoogle Scholar
- Breitbart M, Rohwer F: Here a virus, there a virus, everywhere the same virus? Trends Microbiol 2005, 13: 278–284. 10.1016/j.tim.2005.04.003View ArticlePubMedGoogle Scholar
- Kaper JM, Tousignant ME, Steger G: Nucleotide sequence predicts circularity and self-cleavage of 300-ribonucleotide satellite of arabis mosaic virus. Biochem Biophys Res Commun 1988, 154: 318–325. 10.1016/0006-291X(88)90687-0View ArticlePubMedGoogle Scholar
- Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM: The 1.2-megabase genome sequence of Mimivirus. Science 2004, 306: 1344–1350. 10.1126/science.1101485View ArticlePubMedGoogle Scholar
- Fiers W, Contreras R, Haegemann G, Rogiers R, Van de Voorde A, Van Heuverswyn H, Van Herreweghe J, Volckaert G, Ysebaert M: Complete nucleotide sequence of SV40 DNA. Nature 1978, 273: 113–120. 10.1038/273113a0View ArticlePubMedGoogle Scholar
- Claverie JM: Viruses take center stage in cellular evolution. Genome Biol 2006, 7: 110. 10.1186/gb-2006-7-6-110PubMed CentralView ArticlePubMedGoogle Scholar
- Espagne E, Dupuy C, Huguet E, Cattolico L, Provost B, Martins N, Poirie M, Periquet G, Drezen JM: Genome sequence of a polydnavirus: insights into symbiotic virus evolution. Science 2004, 306: 286–289. 10.1126/science.1103066View ArticlePubMedGoogle Scholar
- Desjardins C, Eisen JA, Nene V: New evolutionary frontiers from unusual virus genomes. Genome Biol 2005, 6: 212. 10.1186/gb-2005-6-3-212PubMed CentralView ArticlePubMedGoogle Scholar
- Bao Y, Federhen S, Leipe D, Pham V, Resenchuk S, Rozanov M, Tatusov R, Tatusova T: National center for biotechnology information viral genomes project. J Virol 2004, 78: 7291–7298. 10.1128/JVI.78.14.7291-7298.2004PubMed CentralView ArticlePubMedGoogle Scholar
- Brooksbank C, Cameron G, Thornton J: The European Bioinformatics Institute's data resources: towards systems biology. Nucleic Acids Res 2005, 33: D46-D53. 10.1093/nar/gki026PubMed CentralView ArticlePubMedGoogle Scholar
- Alba MM, Lee D, Pearl FM, Shepherd AJ, Martin N, Orengo CA, Kellam P: VIDA: a virus database system for the organization of animal virus genome open reading frames. Nucleic Acids Res 2001, 29: 133–136. 10.1093/nar/29.1.133PubMed CentralView ArticlePubMedGoogle Scholar
- Lefkowitz EJ, Upton C, Changayil SS, Buck C, Traktman P, Buller RM: Poxvirus Bioinformatics Resource Center: a comprehensive Poxviridae informational and analytical resource. Nucleic Acids Res 2005, 33: D311-D316. 10.1093/nar/gki110PubMed CentralView ArticlePubMedGoogle Scholar
- Brodie R, Smith AJ, Roper RL, Tcherepanov V, Upton C: Base-By-Base: single nucleotide-level analysis of whole viral genome alignments. BMC Bioinformatics 2004, 5: 96. 10.1186/1471-2105-5-96PubMed CentralView ArticlePubMedGoogle Scholar
- Kuiken C, Yusim K, Boykin L, Richardson R: The Los Alamos hepatitis C sequence database. Bioinformatics 2005, 21: 379–384. 10.1093/bioinformatics/bth485View ArticlePubMedGoogle Scholar
- Rocheleau L, Pelchat M: The Subviral RNA Database: a toolbox for viroids, the hepatitis delta virus and satellite RNAs research. BMC Microbiol 2006, 6: 24. 10.1186/1471-2180-6-24PubMed CentralView ArticlePubMedGoogle Scholar
- Kulkarni-Kale U, Bhosle S, Manjari GS, Kolaskar AS: VirGen : a comprehensive viral genome resource. Nucleic Acids Res 2004, 32: D289–292. 10.1093/nar/gkh098PubMed CentralView ArticlePubMedGoogle Scholar
- VirGen: a comprehensive viral genome resource[http://bioinfo.ernet.in/virgen/virgen.html]
- Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA: Virus Taxonomy VIIIth Report of the International Committee on Taxonomy of Viruses. San Diego: Academic Press; 2004.
- Nousbaum JB: Genomic subtypes of hepatitis C virus: epidemiology, diagnosis and clinical consequences. Bull Soc Pathol Exot 1998, 91: 29–33.
- Nitkiewicz J: Molecular epidemiology of chronic hepatitis C (HCV) virus. Przegl Epidemiol 2004, 58: 413–421.
- Sandres-Saune K, Deny P, Pasquier C, Thibaut V, Duverlie G, Izopet J: Determining hepatitis C genotype by analyzing the sequence of the NS5b region. J Virol Methods 2003, 109: 187–193. doi:10.1016/S0166-0934(03)00070-3
- Lole KS, Jha JA, Shrotri SP, Tandon BN, Prasad VG, Arankalle VA: Comparison of hepatitis C virus genotyping by 5' noncoding region- and core-based reverse transcriptase PCR assay with sequencing and use of the assay for determining subtype distribution in India. J Clin Microbiol 2003, 41: 5240–5244. doi:10.1128/JCM.41.11.5240-5244.2003
- Colina R, Casane D, Vasquez S, Garcia-Aguirre L, Chunga A, Romero H, Khan B, Cristina J: Evidence of intratypic recombination in natural populations of hepatitis C virus. J Gen Virol 2004, 85: 31–37. doi:10.1099/vir.0.19472-0
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. doi:10.1093/nar/25.17.3389
- Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M: The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol 2005, 6: R44. doi:10.1186/gb-2005-6-5-r44
- Rinck G, Birghan C, Harada T, Meyers G, Thiel HJ, Tautz N: A cellular J-domain protein modulates polyprotein processing and cytopathogenicity of a pestivirus. J Virol 2001, 75: 9470–9482. doi:10.1128/JVI.75.19.9470-9482.2001
- Kulkarni-Kale U, Ojha J, Manjari GS, Deobagkar DD, Mallya AD, Dhere RM, Kapre SV: Mapping antigenic diversity & strain-specificity of mumps virus: a Bioinformatics approach. Virology 2006, in press.
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31: 3497–3500. doi:10.1093/nar/gkg500
- Wu TT, Kabat EA: An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J Exp Med 1970, 132: 211–250. doi:10.1084/jem.132.2.211
- Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res 2004, 32: D134-D137. doi:10.1093/nar/gkh044
- Heinz FX: Epitope mapping of flavivirus glycoproteins. Adv Virus Res 1986, 31: 103–168.
- Heinz FX, Roehrig JT: Flaviviruses. In Immunochemistry of viruses. Volume 2. Amsterdam-New York-Oxford: Elsevier; 1990:289–305.
- Roehrig JT, Hunt AR, Johnson AJ, Hawkes RA: Synthetic peptides derived from the deduced amino acid sequence of the E-glycoprotein of Murray Valley encephalitis virus elicit antiviral antibody. Virology 1989, 171: 49–60. doi:10.1016/0042-6822(89)90509-6
- Kolaskar AS, Kulkarni-Kale U: Prediction of three-dimensional structure and mapping of conformational epitopes of envelope glycoprotein of Japanese encephalitis virus. Virology 1999, 261: 31–42. doi:10.1006/viro.1999.9859
- Kulkarni-Kale U, Kolaskar AS: Prediction of 3D structure of envelope glycoprotein of Sri Lanka strain of Japanese encephalitis virus. In Proceedings of the first APBC conference: 4–7 February 2003; Conferences in Research and Practice in Information Technology 19. Edited by: Yi-Ping Phoebe Chen. 2003, 87–96.
- Monath TP, Arroyo J, Levenbook I, Zhang ZX, Catalan J, Draper K, Guirakhoo F: Single mutation in the flavivirus envelope protein hinge region increases neurovirulence for mice and monkeys but decreases viscerotropism for monkeys: relevance to development and safety testing of live, attenuated vaccines. J Virol 2002, 76: 1932–1943. doi:10.1128/JVI.76.4.1932-1943.2002
- Lobigs M, Usha R, Nestorowicz A, Marshall ID, Weir RC, Dalgarno L: Host cell selection of Murray Valley encephalitis virus variants altered at an RGD sequence in the envelope protein and in mouse virulence. Virology 1990, 176: 587–595. doi:10.1016/0042-6822(90)90029-Q
- van der Most RG, Corver J, Strauss JH: Mutagenesis of the RGD motif in the yellow fever virus 17D envelope protein. Virology 1999, 265: 83–95. doi:10.1006/viro.1999.0026
- Wu SC, Lee SC: Complete nucleotide sequence and cell-line multiplication pattern of the attenuated variant CH2195LA of Japanese encephalitis virus. Virus Res 2001, 73: 91–102. doi:10.1016/S0168-1702(00)00235-5
- Ni H, Barrett AD: Molecular differences between wild-type Japanese encephalitis virus strains of high and low mouse neuroinvasiveness. J Gen Virol 1996, 77: 1449–1455.
- Jobes DV, Chima SC, Ryschkewitsch CF, Stoner GL: Phylogenetic analysis of 22 complete genomes of the human polyomavirus JC virus. J Gen Virol 1998, 79: 2491–2498.
- Tokita H, Okamoto H, Iizuka H, Kishimoto J, Tsuda F, Miyakawa Y, Mayumi M: The entire nucleotide sequences of three hepatitis C virus isolates in genetic groups 7–9 and comparison with those in the other eight genetic groups. J Gen Virol 1998, 79: 1847–1857.
- Salemi M, Vandamme AM: Hepatitis C virus evolutionary patterns studied through analysis of full-genome sequences. J Mol Evol 2002, 54: 62–70. doi:10.1007/s00239-001-0018-9
- Gould EA, Moss SR, Turner SL: Evolution and dispersal of encephalitic flaviviruses. Arch Virol Suppl 2004, 18: 65–84.
- Crabtree MB, Sang RC, Stollar V, Dunster LM, Miller BR: Genetic and phenotypic characterization of the newly described insect flavivirus, Kamiti River virus. Arch Virol 2003, 148: 1095–1118. doi:10.1007/s00705-003-0019-7
- Capecchi B, Serruto D, Adu-Bobie J, Rappuoli R, Pizza M: The genome revolution in vaccine research. Curr Issues Mol Biol 2004, 6: 17–27.
- Kolaskar AS, Tongaonkar PC: A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 1990, 276: 172–174. doi:10.1016/0014-5793(90)80535-Q
- Kulkarni-Kale U, Bhosle S, Kolaskar AS: CEP: a conformational epitope prediction server. Nucleic Acids Res 2005, 33: W168-W171. doi:10.1093/nar/gki460
- CEP: a conformational epitope prediction server [http://bioinfo.ernet.in/cep.htm]
- Womble DD: GCG: The Wisconsin Package of sequence analysis programs. Methods Mol Biol 2000, 132: 3–22.
- EMBOSS: European Molecular Biology Open Software Suite [http://portal.litbio.org/Registered/Option/emboss.html]
- Modis Y, Ogata S, Clements D, Harrison SC: A ligand-binding pocket in the dengue virus envelope glycoprotein. Proc Natl Acad Sci USA 2003, 100: 6986–6991. doi:10.1073/pnas.0832193100
- Hiramatsu K, Tadano M, Men R, Lai CJ: Mutational analysis of a neutralization epitope on the dengue type 2 virus (DEN2) envelope protein: monoclonal antibody resistant DEN2/DEN4 chimeras exhibit reduced mouse neurovirulence. Virology 1996, 224: 437–445. doi:10.1006/viro.1996.0550
- Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, Green RK, Flippen-Anderson JL, Westbrook J, Berman HM, Bourne PE: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 2005, 33: D233-D237. doi:10.1093/nar/gki057
- Campanacci V, Egloff MP, Longhi S, Ferron F, Rancurel C, Salomoni A, Durousseau C, Tocque F, Bremond N, Dobbe JC, Snijder EJ, Canard B, Cambillau C: Structural genomics of the SARS coronavirus: cloning, expression, crystallization and preliminary crystallographic study of the Nsp9 protein. Acta Crystallogr D Biol Crystallogr 2003, 59: 1628–1631. doi:10.1107/S0907444903016779
- Randall AZ, Baldi P, Villarreal LP: Structural proteomics of the poxvirus family. Artif Intell Med 2004, 31: 105–115. doi:10.1016/j.artmed.2004.01.006
- Tiroumourougane SV, Raghava P, Srinivasan S: Japanese viral encephalitis. Postgrad Med J 2002, 78: 205–215. doi:10.1136/pmj.78.918.205
- Falgout B, Pethel M, Zhang YM, Lai CJ: Both nonstructural proteins NS2B and NS3 are required for the proteolytic processing of dengue virus nonstructural proteins. J Virol 1991, 65: 2467–2475.
- Bartenschlager R, Ahlborn-Laake L, Mous J, Jacobsen H: Kinetic and structural analyses of hepatitis C virus polyprotein processing. J Virol 1994, 68: 5045–5055.
- Chambers TJ, Weir RC, Grakoui A, McCourt DW, Bazan JF, Fletterick RJ, Rice CM: Evidence that the N-terminal domain of nonstructural protein NS3 from yellow fever virus is a serine protease responsible for site-specific cleavages in the viral polyprotein. Proc Natl Acad Sci USA 1990, 87: 8898–8902. doi:10.1073/pnas.87.22.8898
- Preugschat F, Lenches EM, Strauss JH: Flavivirus enzyme-substrate interactions studied with chimeric proteinases: identification of an intragenic locus important for substrate recognition. J Virol 1991, 65: 4749–4758.
- Clum S, Ebner KE, Padmanabhan R: Cotranslational membrane insertion of the serine proteinase precursor NS2B-NS3(Pro) of dengue virus type 2 is required for efficient in vitro processing and is mediated through the hydrophobic regions of NS2B. J Biol Chem 1997, 272: 30715–30723. doi:10.1074/jbc.272.49.30715
- Krishna Murthy HM, Judge K, DeLucas L, Clum S, Padmanabhan R: Crystallization, characterization and measurement of MAD data on crystals of dengue virus NS3 serine protease complexed with mung-bean Bowman-Birk inhibitor. Acta Crystallogr D Biol Crystallogr 1999, 55: 1370–1372. doi:10.1107/S0907444999007064
- Matusan AE, Kelley PG, Pryor MJ, Whisstock JC, Davidson AD, Wright PJ: Mutagenesis of the dengue virus type 2 NS3 proteinase and the production of growth-restricted virus. J Gen Virol 2001, 82: 1647–1656.
- Yao N, Reichert P, Taremi SS, Prosise WW, Weber PC: Molecular views of viral polyprotein processing revealed by the crystal structure of the hepatitis C virus bifunctional protease-helicase. Structure 1999, 7: 1353–1363. doi:10.1016/S0969-2126(00)80025-8
- Chappell KJ, Nall TA, Stoermer MJ, Fang NX, Tyndall JD, Fairlie DP, Young PR: Site-directed mutagenesis and kinetic studies of the West Nile Virus NS3 protease identify key enzyme-substrate interactions. J Biol Chem 2005, 280: 2896–2903. doi:10.1074/jbc.M409931200
- Bessaud M, Grard G, Peyrefitte CN, Pastorino B, Rolland D, Charrel RN, de Lamballerie X, Tolou HJ: Identification and enzymatic characterization of NS2B-NS3 protease of Alkhurma virus, a class-4 flavivirus. Virus Res 2005, 107: 57–62. doi:10.1016/j.virusres.2004.06.015
- Shivashankar Y, Satchidanandam V: Expression of the Japanese encephalitis virus NS3 and NS2b proteins as glutathione S-transferase fusions. Indian J Biochem Biophys 1995, 32: 356–360.
- Jan LR, Yang CS, Trent DW, Falgout B, Lai CJ: Processing of Japanese encephalitis virus non-structural proteins: NS2B-NS3 complex and heterologous proteases. J Gen Virol 1995, 76: 573–580.
- Yamshchikov VF, Trent DW, Compans RW: Upregulation of signalase processing and induction of prM-E secretion by the flavivirus NS2B-NS3 protease: roles of protease components. J Virol 1997, 71: 4364–4371.
- Murthy HM, Clum S, Padmanabhan R: Dengue virus NS3 serine protease. Crystal structure and insights into interaction of the active site with substrates by molecular modeling and structural analysis of mutational effects. J Biol Chem 1999, 274: 5573–5580. doi:10.1074/jbc.274.9.5573
- Yan Y, Li Y, Munshi S, Sardana V, Cole JL, Sardana M, Steinkuehler C, Tomei L, De Francesco R, Kuo LC, Chen Z: Complex of NS3 protease and NS4A peptide of BK strain hepatitis C virus: a 2.2 Å resolution structure in a hexagonal crystal form. Protein Sci 1998, 7: 837–847.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: Procheck: a program to check the stereochemical quality of protein structure. J Appl Cryst 1993, 26: 283–291. doi:10.1107/S0021889892009944
- Sippl MJ: Recognition of errors in three-dimensional structures of proteins. Proteins 1993, 17: 355–362. doi:10.1002/prot.340170404
- Ramachandran GN, Sasisekharan V: Conformation of polypeptides and proteins. Adv Protein Chem 1968, 23: 283–438.
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157: 105–132. doi:10.1016/0022-2836(82)90515-0
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. doi:10.1002/bip.360221211
- Perona JJ, Craik CS: Evolutionary divergence of substrate specificity within the chymotrypsin-like serine protease fold. J Biol Chem 1997, 272: 29987–29990. doi:10.1074/jbc.272.48.29987
- Riesenfeld CS, Schloss PD, Handelsman J: Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 2004, 38: 525–552. doi:10.1146/annurev.genet.38.072902.091216
- Seibel G, Singh UC, Weiner PK, Caldwell J, Kollman P: AMBER 3.0 revision. San Francisco: University of California at San Francisco; 1990.
- Brunger AT, Brooks CL, Karplus M: Stochastic boundary conditions for molecular dynamics simulations of ST2 water. Chem Phys Lett 1984, 105: 495–500. doi:10.1016/0009-2614(84)80098-6
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.