SNPAnalyzer 2.0: A web-based integrated workbench for linkage disequilibrium analysis and association analysis
© Yoo et al; licensee BioMed Central Ltd. 2008
Received: 23 February 2008
Accepted: 23 June 2008
Published: 23 June 2008
Since the completion of the HapMap project, large numbers of individual genotypes have been generated by many kinds of laboratories. Efforts to find and interpret genetic associations between disease and SNPs/haplotypes are widespread. As a result, the need to analyze very large datasets and to interpret the results in diverse ways is growing rapidly.
We have developed an advanced tool that performs linkage disequilibrium analysis and genetic association analysis between disease and SNPs/haplotypes in an integrated web interface. It comprises four main analysis modules: (i) data import and preprocessing, (ii) haplotype estimation, (iii) LD blocking and (iv) association analysis. A Hardy-Weinberg Equilibrium test is performed for each SNP during data preprocessing. Haplotypes are reconstructed from unphased diploid genotype data, and linkage disequilibrium between pairs of SNPs is computed and represented by D', r² and LOD score. Tagging SNPs are determined using the square of Pearson's correlation coefficient (r²). If genotypes from two different sample groups are available, genetic association analyses are performed using additive, codominant, dominant and recessive models. Multiple verified algorithms and statistics are implemented in parallel to improve the reliability of the analysis.
SNPAnalyzer 2.0 performs linkage disequilibrium analysis and genetic association analysis in an integrated web interface using multiple verified algorithms and statistics. Its diverse analysis methods, capacity for very large datasets and visual comparison of analysis results make it comprehensive and easy to use.
Since the completion of the HapMap project, large numbers of individual genotypes have been generated by many kinds of laboratories. Efforts to find and interpret genetic associations between disease and SNPs/haplotypes are widespread, and the need to analyze very large datasets and to interpret the results in diverse ways is growing rapidly. Recently developed software programs are well suited for constructing linkage disequilibrium blocks, estimating haplotypes or detecting genetic association between disease and SNPs [1–6]. However, some of these programs have drawbacks such as long computation time for the association analysis, limited dataset size [1, 2], inconvenient user interfaces [3–5] and a limited number of genetic models or statistics for the association analysis. We have developed an advanced analysis software program, SNPAnalyzer 2.0, which performs sample-specific linkage disequilibrium analysis and genetic association analysis using multiple genetic models in an integrated web interface. It can handle hundreds of thousands of SNPs and thousands of samples in a manageable time compared with other software programs.
The analysis engine was written in C and the interface in Java; the program runs as a Java applet launched from a web browser. Although the program is launched from a web browser, no information about the user's data is transmitted anywhere, because all analyses are performed locally by the Java applet. Raw data and all analysis results are stored only on the user's computer. If genotypes from two different samples are available, sample-specific and sample-merged analyses are performed simultaneously in data preprocessing, haplotype estimation and LD blocking. To support diverse interpretation of genetic effects, one allelic or haplotype association test and three genotypic or diplotype association tests are available. SNPAnalyzer 2.0 and a test dataset are freely available.
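As an illustration of how one allelic test and three genotypic tests can be derived from the same case and control genotype counts, the following Python sketch collapses the counts into the table each model implies and applies a Pearson chi-square test. It is a minimal sketch under stated assumptions, not the program's C implementation: the function names, the genotype ordering (AA, Aa, aa) and the choice of Pearson's statistic are all assumptions made for this example.

```python
import math


def chi2_pvalue(stat, df):
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 is supported in this sketch")


def pearson_chi2(table):
    """Pearson chi-square for a 2 x k table of counts; returns (stat, df, p)."""
    # Drop empty columns so that expected counts are never zero.
    cols = [c for c in zip(*table) if sum(c) > 0]
    row_totals = [sum(col[r] for col in cols) for r in range(2)]
    n = sum(row_totals)
    df = len(cols) - 1
    if df < 1 or min(row_totals) == 0:
        return 0.0, 0, 1.0
    stat = 0.0
    for col in cols:
        for r in range(2):
            expected = row_totals[r] * sum(col) / n
            stat += (col[r] - expected) ** 2 / expected
    return stat, df, chi2_pvalue(stat, df)


def association_tests(case, control):
    """case and control are (AA, Aa, aa) counts; 'a' is the tested allele.

    The model names follow the text; the exact tables are assumptions.
    """
    c0, c1, c2 = case
    k0, k1, k2 = control
    tables = {
        # Allele counts: a versus A.
        "allelic": [(2 * c2 + c1, 2 * c0 + c1), (2 * k2 + k1, 2 * k0 + k1)],
        # All three genotypes kept separate (2 df).
        "codominant": [(c0, c1, c2), (k0, k1, k2)],
        # Carriers of a (Aa + aa) versus AA.
        "dominant": [(c1 + c2, c0), (k1 + k2, k0)],
        # aa versus the rest (AA + Aa).
        "recessive": [(c2, c0 + c1), (k2, k0 + k1)],
    }
    return {model: pearson_chi2(t) for model, t in tables.items()}
```

Calling `association_tests((10, 40, 50), (30, 50, 20))` returns one `(statistic, df, p-value)` tuple per model, so the same SNP can be compared across models side by side.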
SNPAnalyzer 2.0 comprises four main analysis modules. All processes run sequentially, and results are displayed in comprehensive tables and graphs. The main features and functions are as follows.
2.1 Data import
2.2 Data preprocessing
Once the data are imported, a data quality check and preprocessing are performed automatically to drop erroneous SNPs such as monomorphic SNPs. SNPs that fail the specified thresholds for minor allele frequency or missing genotype frequency are also dropped. Missing genotypes can be replaced by heterozygous genotypes. A Hardy-Weinberg Equilibrium (HWE) test is then applied to each SNP, and a Bonferroni correction can be applied in the HWE test to avoid excluding SNPs by chance. Red in Figure 1 marks missing genotypes. Allele frequencies, genotype frequencies and the HWE test results are displayed in tables.
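The preprocessing filters described above can be sketched in Python as follows. This is a hedged illustration rather than the program's own code: the function names, the default thresholds (`maf_min`, `missing_max`, `alpha`) and the use of a chi-square goodness-of-fit test for HWE are assumptions, since the text does not state which values or which form of the test the program uses.

```python
import math


def hwe_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df the chi-square upper tail is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def keep_snp(counts, n_missing, n_snps,
             maf_min=0.05, missing_max=0.10, alpha=0.05):
    """Return True if a SNP passes all filters (thresholds are hypothetical)."""
    n_aa, n_ab, n_bb = counts
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    p = (2 * n_aa + n_ab) / (2.0 * n)
    maf = min(p, 1.0 - p)
    if maf == 0.0:  # monomorphic SNP
        return False
    if maf < maf_min:  # minor allele too rare
        return False
    if n_missing / (n + n_missing) > missing_max:  # too many missing calls
        return False
    # Bonferroni: divide the significance level by the number of SNPs tested,
    # so that fewer SNPs are excluded by chance alone.
    _, p_value = hwe_test(n_aa, n_ab, n_bb)
    return p_value >= alpha / n_snps
```

With the Bonferroni-adjusted level `alpha / n_snps`, a SNP is excluded for HWE deviation only when the evidence is strong relative to the number of SNPs tested.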
2.3 Haplotype estimation
2.4 Linkage disequilibrium (LD) blocking
2.5 Association analysis
In the association analysis with haplotypes, we applied a haplotype-specific test with one degree of freedom. Estimation of haplotype effects was not implemented because the current version handles only the haplotype frequencies previously reconstructed in the LD blocking analysis. Several algorithms for estimating haplotype effects have been developed [13–16]. Software programs such as THESIAS and Haplo Stats are freely available and widely used for the analysis of haplotype effects.
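One common way to build a haplotype-specific test with one degree of freedom is to compare each haplotype against all other haplotypes pooled, in cases versus controls. The Python sketch below illustrates that construction. It assumes the haplotype counts have already been estimated (they may be fractional), and the function name and input layout are hypothetical; the text does not specify the exact statistic used by the program.

```python
import math


def haplotype_specific_tests(case_counts, control_counts):
    """Test each haplotype against all others pooled (chi-square, 1 df).

    Inputs map haplotype labels to estimated counts in each sample group.
    Returns {haplotype: (statistic, p_value)}.
    """
    n_case = sum(case_counts.values())
    n_control = sum(control_counts.values())
    results = {}
    for hap in sorted(set(case_counts) | set(control_counts)):
        a = case_counts.get(hap, 0)        # this haplotype, cases
        c = control_counts.get(hap, 0)     # this haplotype, controls
        b, d = n_case - a, n_control - c   # all other haplotypes
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            # Degenerate table: no information about this haplotype.
            results[hap] = (0.0, 1.0)
            continue
        stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
        results[hap] = (stat, math.erfc(math.sqrt(stat / 2.0)))
    return results
```

Because every haplotype in a block is tested separately, the resulting p-values are subject to the same multiple-testing considerations as single-SNP tests.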
2.6 Data export
2.7 Accuracy measure
The numbers of individuals of each ethnic group and the numbers of SNPs used for redefining haplotypes
The accuracies of haplotype estimation produced by SNPAnalyzer 2.0
The computation time for association analysis
The maximum analyzable dataset size depends on the random access memory (RAM) of the user's computer. We confirmed that association analysis of genotype data with over 100,000 SNPs and 2,000 samples was possible. All the test datasets are downloadable.
In previous work, we developed a software program that calculates linkage disequilibrium between SNPs, reconstructs haplotypes and performs quantitative trait analysis. To meet the increasing demand for whole-genome association studies, we have developed SNPAnalyzer 2.0, which handles linkage disequilibrium analysis and genetic association analysis between disease and SNPs/haplotypes in an integrated web interface. For accuracy, it implements several verified algorithms and statistics. The accuracy of the haplotype estimation was very high, and the LD blocking results from SNPAnalyzer 2.0 and Haploview were similar. Some mismatches in LD block structure are due to differences in the detailed parameters or algorithms used by each program. For example, Haploview uses an accelerated EM algorithm, whereas SNPAnalyzer 2.0 uses both the EM-based algorithm and the PL-EM algorithm for haplotype estimation. Linkage disequilibrium can be compared among control, case and merged samples using several LD indices. False positives are controlled in the association analysis by multiple-test correction and the false discovery rate (FDR). All results are provided as tab-delimited text files for the user's convenience. We plan to implement more statistical analyses in future versions: stratification analysis, interaction analysis using multiple SNPs, haplotype effects analysis, and classification analysis for multiple samples.
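To make the two kinds of false positive control concrete, the Python sketch below adjusts a list of association p-values with a Bonferroni correction and with the Benjamini-Hochberg step-up procedure. The Benjamini-Hochberg procedure is used here only as a simple, widely known FDR method for illustration; it is an assumption of this example and may differ from the FDR estimate the program computes. The function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A SNP is then reported when its adjusted value falls below the chosen level; the Bonferroni adjustment controls the family-wise error rate and is the more conservative of the two.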
SNPAnalyzer 2.0 performs linkage disequilibrium analysis and genetic association analysis in an integrated web interface. It implements multiple verified algorithms and statistics to improve the reliability of the analysis. It supports comprehensive visual comparison and interpretation of the analysis results between two different sample groups. Allelic or haplotype association and genotypic or diplotype association can be analyzed using multiple genetic models. Hundreds of thousands of SNPs and thousands of samples can be analyzed in moderate time, and the analysis results are displayed in figures and tables for the user's convenience.
Availability and requirements
Project name: SNPAnalyzer 2.0
Project homepage: http://snp.istech21.com/snpanalyzer/2.0/
Operating systems: Windows
Programming language: C and Java
Web application: Internet Explorer 6.0 or higher (Internet connection required for program installation)
License: free non-commercial research use license
Any restrictions to use by non-academics: none
This work was supported by grant M10529000013-06N2900-01310 from the Korea Science and Engineering Foundation (KOSEF), Republic of Korea.
- Sole X, Guino E, Valls J, Iniesta R, Moreno V: SNPStats: a web tool for the analysis of association studies. Bioinformatics 2006, 22(15):1928–1929. 10.1093/bioinformatics/btl268
- Yoo J, Seo B, Kim Y: SNPAnalyzer: a web-based integrated workbench for single-nucleotide polymorphism analysis. Nucleic Acids Res 2005, (33 Web Server):W483–488. 10.1093/nar/gki428
- Browning BL, Browning SR: Efficient multilocus association mapping for whole genome association studies using localized haplotype clustering. Genet Epidemiol 2007, 31(5):365–375. 10.1002/gepi.20216
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, de Bakker PIW, Daly MJ, Sham PC: PLINK: a toolset for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007, 81(3):559–575. 10.1086/519795
- Zhang K, Qin Z, Chen T, Liu JS, Waterman MS, Sun F: HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms. Bioinformatics 2005, 21(1):131–134. 10.1093/bioinformatics/bth482
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005, 21(2):263–265. 10.1093/bioinformatics/bth457
- SNPAnalyzer 2.0 homepage [http://snp.istech21.com/snpanalyzer/2.0/]
- Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 1995, 12(5):921–927.
- Niu T, Qin ZS, Xu X, Liu JS: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 2002, 70(1):157–169. 10.1086/338446
- Devlin B, Risch N: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 1995, 29(2):311–322. 10.1006/geno.1995.9003
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The structure of haplotype blocks in the human genome. Science 2002, 296(5576):2225–2229. 10.1126/science.1069424
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003, 100(16):9440–9445. 10.1073/pnas.1530509100
- Epstein MP, Satten GA: Inference on haplotype effects in case-control studies using unphased genotype data. Am J Hum Genet 2003, 73(6):1316–1329. 10.1086/380204
- Purcell S, Daly MJ, Sham PC: WHAP: haplotype-based association analysis. Bioinformatics 2007, 23(2):255–256. 10.1093/bioinformatics/btl580
- Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 2002, 70(2):425–434. 10.1086/338688
- Tregouet DA, Escolano S, Tiret L, Mallet A, Golmard JL: A new algorithm for haplotype-based association analysis: the Stochastic-EM algorithm. Ann Hum Genet 2004, 68(Pt 2):165–177. 10.1046/j.1529-8817.2003.00085.x
- THESIAS software program [http://www.genecanvas.org]
- Haplo Stats software program [http://mayoresearch.mayo.edu/mayo/research/schaid_lab/software.cfm]
- Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001, 68(4):978–989. 10.1086/319501
- dbSNP database at NCBI [http://www.ncbi.nlm.nih.gov/SNP/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.