Functionally informative tag SNP selection using a pareto-optimal approach: playing the game of life
© Lee et al; licensee BioMed Central Ltd. 2009
Published: 19 October 2009
Major interest in current epidemiology, medicine, and pharmacogenomics is focused on identifying single nucleotide polymorphisms (SNPs) that underlie the etiology of common and complex diseases. However, due to the tremendous number of SNPs in the human genome, there is a clear need to prioritize SNPs in order to reduce the genotyping and analysis overhead associated with disease-gene studies. Tag SNP selection and functional SNP selection are the two main approaches to the SNP selection problem. However, little has been done so far to effectively combine these distinct and possibly competing approaches. Here we present a new multi-objective optimization framework for identifying SNPs that are both informative as tags and functionally significant.
Our SNP selection algorithm is based on the notion of Pareto optimality, which has been extensively used to address multi-objective optimization problems in game theory, economics, and engineering. We describe its three main steps as follows.
STEP 1. Computing Linkage Disequilibrium of SNPs
To compute the tagging informativeness score efficiently, we calculate the LD between all pairs of candidate SNPs in advance. Following Carlson et al., we currently use the coefficient of determination, r², as the measure of pair-wise LD.
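As an illustration of the pair-wise LD measure, the following minimal sketch computes r² for biallelic SNPs from phased haplotypes, using the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The 0/1 allele encoding, the function names `r_squared` and `pairwise_r2`, and the choice to return 0 for monomorphic SNPs are assumptions made for this example, not details taken from the study.

```python
from itertools import combinations


def r_squared(a, b):
    """Return r^2 between two biallelic SNPs.

    a, b: equal-length sequences of 0/1 alleles, one entry per haplotype
    (hypothetical encoding chosen for this sketch).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("allele vectors must be non-empty and of equal length")
    p_a = sum(a) / n                      # frequency of allele 1 at the first SNP
    p_b = sum(b) / n                      # frequency of allele 1 at the second SNP
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined for a monomorphic SNP; returning 0 is an assumption.
        return 0.0
    return d * d / denom


def pairwise_r2(haplotypes):
    """Precompute r^2 for every unordered pair of SNPs.

    haplotypes: dict mapping a SNP identifier to its 0/1 allele vector.
    """
    return {
        (s, t): r_squared(haplotypes[s], haplotypes[t])
        for s, t in combinations(sorted(haplotypes), 2)
    }
```

Computing the table once up front means any later scoring step can look up a pair's LD instead of recomputing it.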
STEP 2. Retrieving Functional Significance of SNPs
We currently use the FS score of each SNP obtained from F-SNP, which uses 16 bioinformatics tools to assess the deleterious functional effects of SNPs with respect to protein translation, splicing regulation, transcriptional regulation, and post-translational modification.
STEP 3. Selecting Functionally Informative Tag SNPs
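The general notion of Pareto optimality over two objectives can be sketched as follows. This is a generic illustration, not the authors' selection procedure: it assumes each candidate already carries a hypothetical pair of scores (tagging informativeness, functional significance), both to be maximized, and keeps the candidates that no other candidate dominates. The names `dominates` and `pareto_front` and all score values are invented for the example.

```python
def dominates(u, v):
    """True if score tuple u is at least as good as v on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_front(scores):
    """Return the sorted names of the non-dominated candidates.

    scores: dict mapping a candidate name to a tuple of objective values,
    here (tagging_score, functional_score); both are hypothetical inputs.
    """
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Made-up scores for four hypothetical candidates.
example = {
    "snp_a": (0.9, 0.2),
    "snp_b": (0.5, 0.5),
    "snp_c": (0.4, 0.4),   # dominated by snp_b on both objectives
    "snp_d": (0.1, 0.8),
}
front = pareto_front(example)
```

The non-dominated candidates each represent a different trade-off between the two objectives, so none of them can be improved on one score without losing on the other.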
- Kirman AP: Pareto as an economist. The New Palgrave: A Dictionary of Economics 1987, 5: 804–808.
- Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 2004, 74(1): 106–120. 10.1086/381000
- Lee P, Shatkay H: F-SNP: computationally predicted functional SNPs for disease association studies. Nucleic Acids Res 2008, 36(Database issue): D820–D824.
- Zhu Y, Hoffman A, Wu X, Zhang H, Zhang Y, Leaderer D, Zheng T: Correlating observed odds ratios from lung cancer case-control studies to SNP functional scores predicted by bioinformatics tools. Mutation Research 2008, 639: 80–88.
This article is published under license to BioMed Central Ltd.