WSPMaker: a web tool for calculating selection pressure in proteins and domains using window-sliding
https://doi.org/10.1186/1471-2105-9-S12-S13
© Lee et al; licensee BioMed Central Ltd. 2008
Published: 12 December 2008
Abstract
Background
In the study of adaptive evolution, it is important to detect the protein coding sites where natural selection is acting. In general, the ratio of the rate of non-synonymous substitutions (Ka) to the rate of synonymous substitutions (Ks) is used to estimate either negative or positive selection for an entire gene region of interest. However, since each amino acid in a region has a different function and structure, the type and strength of natural selection may be different for each amino acid. Specifically, domain sites on the protein are indicative of structurally and functionally important sites. Therefore, Window-sliding tools can be used to detect evolutionary forces acting on mutation sites.
Results
This paper reports the development of a web-based tool, WSPMaker (Window-sliding Selection pressure Plot Maker), for calculating selection pressures (estimated by Ka/Ks) in the sub-regions of two protein-coding DNA sequences (CDSs). The program uses a sliding window on DNA with a user-defined window length. This enables the investigation of adaptive protein evolution and shows selective constraints of the overall/specific region(s) of two orthologous gene-coding DNA sequences. The method accommodates various evolutionary models and options such as the sliding window size. WSPmaker uses domain information from Pfam HMM models to detect highly conserved residues within orthologous proteins.
Conclusion
WSPMaker is a web tool for scanning and calculating selection pressures (estimated by Ka/Ks) in sub-regions of two protein-coding DNA sequences (CDSs).
Keywords
Background
In researching adaptive evolution processes, it is important to detect protein coding genes that are under an active natural selection process. Traditionally, this is described as the ratio of Ka/Ks. Ka is the rate of non-synonymous substitution and Ks is the rate of synonymous substitution. The Ka/Ks ratio can be used to estimate either negative or positive selection tendencies for genes of interest [1–3]. However, most Ka/Ks ratio values of complete coding genes have been found to be too low to be easily detected. This is because, in general, non-synonymous substitutions occur much less frequently than synonymous substitutions [4]. Therefore, regions that may have important protein function and structure showing Ka/Ks values over 1 are not readily detectable with classical methods. Conventional methods often use sliding windows for detecting high Ka/Ks values that may represent high selection pressure at a single amino acid site or gene region [5, 6]. However, these methods can be ineffective at detecting good candidate sites when natural selection is under way or when a small number of alignable sequences are provided.
Recently, the Selecton program [7] has been developed to detect the evolutionary effects on a single amino acid site within the three-dimensional (3D) protein structure. This tool is powerful at detecting active substitution regions if the 3D structures of genes are available and if at least five homologous sequences are provided. Unfortunately, not all genes have corresponding 3D structures, and it may be difficult to provide five homologous sequences from five different species. This paper presents WSPMaker (Window-Sliding Selection Pressure plot maker), a new web tool for assisting in the detection of gene regions under differential selection pressure when only two pair-aligned protein-coding sequences are provided.
The most important feature of the WSPMaker is that it compares the selection constraints of domain and non-domain regions. It provides a simple (or effective) method to detect for defining positive and negative selection pressure regions within protein-coding sequences.
Results
A schematic overview of the WSPMaker.
Implemented evolutionary models
WSPMaker implements several different evolutionary models widely used to estimate the level of selection operating on a gene or coding region. In the 'Single Size' option, there are three model pairs (M1a v. M2a: nearly neutral v. positive selection, M7 v. M8: beta v. beta & w, and M0 v. M3: one-ratio v. discrete). The first pair includes M1a (nearly neutral) and M2a (positive selection). The second pair includes M7 (beta) and M8 (beta & ω), while the third pair includes M0 (one ratio) and M3 (discrete). Comparison of these model pairs allows the for statistical testing of the hypothesis that there is positive selection on proteins (alternative hypothesis model) against the null hypothesis model. The output is the likelihood of each model, allowing for a comparison using a likelihood ratio test (LRT). The program predicts when a site is undergoing positive selection. In the second option, 'User Defined Size,' there are two models. The Yang and Nielsen model [9] and the Nei and Gojobori [2] model are commonly used in estimating synonymous and non-synonymous substitution rates and detecting positive selection in protein-coding sequences.
Input data format and parameters
WSPMaker takes a default input data type of user-curated orthologous DNA coding sequence pairs. WSPMaker allows users to submit various sequence formats, such as FASTA, EMBL, GCG, ClustalW, and Phylip/Phylip4, for the input data containing codon-based alignment of paired sequences. WSPMaker provides a choice of one of the 17 NCBI genetic codes [10] for automatically generated codon-based alignment after translating orthologous CDSs. There are two additional advanced options: 'Single Size' and 'User Defined Size.' If users select 'Single Size,' they can choose one of three models to detect the sites that are under significant selection pressure. The default window and sliding size is 3 bp for calculating the Ka/Ks ratio. The 'User Defined Size' parameter is for calculating Ka/Ks using a user-defined sliding window size, and allows the users to choose one of two models in common use (Yang and Nielsen & Nei and Gojobori models). The default window size is 6% of the given gene sequence. This value is set empirically and can be altered. In addition, users can set positive or negative threshold values (defaults are one). Hmmpfam 2.3.2 was used to analyze domain distribution among all orthologous genes and to extract pfam domain sequences [11]. A threshold E-value <= 0.5 gave the best quality results, and thus was used in the present system. This domain architecture of the input orthologous genes was visualized with functional domains using pfam in the resulting web interface.
Output
An example of the WSPMaker main output page. The calculated Ka/Ks ratios are plotted as slim bars within a default sliding window size. A) WSPMaker results for MCPH1. B) WSPMaker results for CHRM5.
Applying WSPMaker to biological cases
Here, we give an example illustrating WSPMaker's use for detecting positive selection (Figure 2).
An example of the WSPMaker main input page. Users can paste in or upload sequences to calculate the Ka/Ks values of sub-regions in a CDS.
Conclusion
WSPMaker is a useful web-based visualization tool that shows which DNA sub-regions of coding sequences are under strong selection pressure. It can assist researchers in easily detecting candidate genes and partial gene regions for fast base substitution in functional and evolutionary analyses, using its window-sliding visual graphs. The WSPMaker server can be accessed at http://wspmaker.kobic.kr.
Declarations
Acknowledgements
This research was supported by a grant from the Korean Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program and the Korean Ministry of Science and Technology (MOST) under grant number (M10757020001-07N5702-00110).
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 12, 2008: Asia Pacific Bioinformatics Network (APBioNet) Seventh International Conference on Bioinformatics (InCoB2008). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S12.
Authors’ Affiliations
References
- Goldman N, Yang Z: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 1994, 11: 725–736.PubMedGoogle Scholar
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 1986, 3: 418–426.PubMedGoogle Scholar
- Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 1998, 148: 929–936.PubMed CentralPubMedGoogle Scholar
- Nei M, Kumar S: Molecular evolution and phylogenetics. New York; Oxford: Oxford University Press; 2000.Google Scholar
- Zhang J, Rosenberg HF, Nei M: Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA 1998, 95: 3708–3713. 10.1073/pnas.95.7.3708PubMed CentralView ArticlePubMedGoogle Scholar
- Suzuki Y, Gojobori T, Nei M: ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 2001, 17: 660–661. 10.1093/bioinformatics/17.7.660View ArticlePubMedGoogle Scholar
- Doron-Faigenboim A, Stern A, Mayrose I, Bacharach E, Pupko T: Selecton: a server for detecting evolutionary forces at a single amino-acid site. Bioinformatics 2005, 21: 2101–2103. 10.1093/bioinformatics/bti259View ArticlePubMedGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673–4680. 10.1093/nar/22.22.4673PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Z, Nielsen R: Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol 1998, 46: 409–418. 10.1007/PL00006320View ArticlePubMedGoogle Scholar
- Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, Rapp BA: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2000, 28: 10–14. 10.1093/nar/28.1.10PubMed CentralView ArticlePubMedGoogle Scholar
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic Acids Res 2004, 32: D138–141. 10.1093/nar/gkh121PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997, 13: 555–556.PubMedGoogle Scholar
- Dorus S, Vallender EJ, Evans PD, Anderson JR, Gilbert SL, Mahowald M, Wyckoff GJ, Malcom CM, Lahn BT: Accelerated evolution of nervous system genes in the origin of Homo sapiens. Cell 2004, 119: 1027–1040. 10.1016/j.cell.2004.11.040View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.