MethylPCA: a toolkit to control for confounders in methylome-wide association studies
© Chen et al.; licensee BioMed Central Ltd. 2013
Received: 10 October 2012
Accepted: 20 February 2013
Published: 2 March 2013
In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g. related to life style, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging due to the extremely large number of methylation sites in the human genome.
We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis including 1) an adaptive method that performs data reduction prior to PCA by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs a principal component analysis (PCA) on the ultra high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU and I/O intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders.
MethylPCA provides users a convenient tool to perform MWAS. The software effectively handles the challenge in memory and speed to perform tasks that would be impossible to accomplish using existing software when millions of sites are interrogated with the sample sizes required for MWAS.
Keywords: Principal component analysis; Methylome-wide association studies; Eigen-decomposition; Association test; MBD-seq
Methylation studies are a promising complement to genetic studies of DNA sequence variation. First, as methylation is typically associated with transcriptional repression [1, 2], it may account for additional variation in disease susceptibility. Second, methylation studies can shed a unique light on clinical phenomena such as sex differences [4, 5], genotype-environment interactions, and disease course over time. Finally, methylation sites are potential new drug targets and have good properties from a translational perspective, such as being stable and enabling cost-effective assays in biosamples that are relatively easy to collect.
Because detailed prior biological knowledge is lacking, it will be critical to perform methylome-wide association studies (MWAS) to detect disease relevant sites [10, 11]. The most comprehensive approach uses next-generation sequencing (NGS) to interrogate DNA methylation on a genome-wide basis after bisulfite conversion of unmethylated cytosines. The single base resolution afforded by bisulfite sequencing is attractive, but currently this approach is not economically feasible for the large sample sizes required for MWAS. In a cost-effective alternative, the genome is fragmented and the methylated fragments are bound to antibodies (e.g. MeDIP) or other proteins (e.g. MBD-seq) with high affinity for methylated DNA. The unmethylated genomic fraction is washed away, and then only the methylation-enriched portion is sequenced. A final option involves arrays. Examples are the commercially available Infinium system from Illumina that interrogates >450,000 loci, or genome-wide tiling arrays and the 45 million probe array set from Affymetrix that offers more comprehensive coverage of the methylome.
In addition to technical factors associated with processing samples, in MWAS there are many possible differences between cases and controls that may affect the methylome. Examples include differences in life style, diet, and medication use. As these confounding variables correlate with both the dependent (case-control status) and independent variable (methylation status) they will cause spurious associations that are not of direct substantive interest because they are unrelated to disease processes. Controlling for confounders in MWAS is critical to avoid a flood of false positive findings. If measured, such variables can be regressed out. However, the list of potential confounders is long, only a subset of these variables will have been measured, and many confounders may simply be unknown. Statistical methods that first capture the major sources of variation in the methylation data, and then regress out these components when performing the association analyses may provide an effective solution. However, because of the ultra high dimension of methylation data (e.g. the methylation of DNA cytosine residues at the carbon 5 position (5meC) occurs in the vast majority of cases at CpG sites of which already 27 million exist in the human reference genome), standard statistical packages or existing software for the analysis of large scale methylation data cannot be used [17-20]. To address the computational challenges we developed a toolkit called MethylPCA that is specifically designed to control for confounders in MWAS.
MethylPCA uses principal component analysis (PCA) to capture the major sources of variation in the methylation data. Although other options exist, PCA has the advantage of being well developed (e.g. algorithms exist that enable PCA on ultra-high dimensional data), is computationally efficient, and has already been successfully applied to high-dimensional biological data. EIGENSOFT [21, 22] also performs PCA, but MethylPCA differs from it in four respects. 1) MethylPCA provides an adaptive procedure designed to combine methylation data of neighboring sites into larger blocks prior to PCA. 2) Even after this data reduction step, calculation of the input matrix for the PCA would be prohibitive in terms of memory and CPU time for large sample sizes; MethylPCA therefore allows partitioning the data into a user-specified number of sets so that sub-matrices can be computed in parallel on a cluster and then assembled into the complete input matrix. 3) To enable a complete and flexible data analysis pipeline, MethylPCA provides options to perform PCA based on the covariance matrix and/or correlation matrix and includes an association testing procedure in which covariates such as the calculated principal component scores (PCs) can be regressed out. 4) EIGENSOFT is designed to process categorical SNP data, whereas MethylPCA works on quantitative methylation data.
Creating blocks. This procedure adaptively combines inter-correlated methylation data from adjacent sites.
PCA. It performs PCA on the methylation data and outputs the calculated PC scores, eigenvalues and loadings.
Association test. It performs association tests using multiple linear regression with optional supplied covariates (e.g., age, gender) and the PC scores calculated from the PCA procedure. It outputs the test statistics and p-values, as well as a QQ plot.
A user-friendly interface is provided in the form of a parameter file that controls which and how procedures are performed (see Additional file 1 for detailed description of software). For example, the above three procedures can be performed sequentially or individually by putting the parameters corresponding to the procedures in the parameter file. Each procedure has multiple parameters to be set in the parameter file in order to run it properly. The computational and I/O intensive parts of MethylPCA are implemented in C++ and the remainder in the R language.
In MWAS correlations often exist between adjacent sites. Rather than using a sliding window of arbitrary length, MethylPCA uses an adaptive algorithm that combines methylation data based on the observed inter-correlations. A benefit of creating “blocks” is that the data reduction speeds up subsequent analyses, e.g. the PCA procedure. The use of blocks may also prevent the PCA results from being dominated by a limited number of regions containing highly correlated sites, and may improve the signal-to-noise ratio because a sum of substantially inter-correlated measurements is known to be more reliable than the individual measurements separately. Because there may be regions in a chromosome where only one CpG site is differentially methylated, sites that are uncorrelated with neighboring sites are also kept by forming “blocks” that consist of a single CpG site.
Correlations between sites can occur for different reasons. For example, in MBD-seq neighboring CpGs will be highly correlated because they are largely covered by the same DNA fragments. Correlations can also occur because of biological phenomena. To account for these different causes, MethylPCA allows creating blocks in two stages. The first stage combines the sites that are largely covered by the same fragments to form the level 1 block data. Next, the level 1 block data are combined to capture the “biological” correlations and form the level 2 block data.
Sometimes excluding some sites in the analysis is useful, e.g. those sites with low coverage or that are in repeats. There is an option to provide files that specify which sites are included or excluded. The computing time for creating blocks is approximately proportional to n × p, where n is the number of subjects and p is the number of sites. Because a block merged from multiple sites is processed as a single unit in the following analysis, the word “site” in the following text may either refer to a single CpG site, or a block containing multiple neighboring CpG sites.
Three parameters control the block creation. The first, $t_1$, is a threshold for the average correlation inside a block. The second, $n_t$, is a threshold for the number of new sites added to the block whose mean correlation with the sites already in the block is below a third threshold, $t_2$. The merging process of a block stops if 1) the average correlation in the block is below $t_1$, or 2) $n_t$ new sites have been merged whose correlations with the sites already in the block are below $t_2$. The output block data uses the mean of all methylation values inside the block to represent each block. Related block information, such as the beginning of the block, the end of the block and the average correlation within the block, is stored in a separate file.
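The stopping rules above can be illustrated with a small greedy merge. This is only a sketch under stated assumptions, not the MethylPCA implementation: the function names (`pearson`, `create_blocks`, `block_means`) are hypothetical, and the text does not say whether the site that triggers a stop is kept in the block, so here such a site is assumed to start the next block instead.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def create_blocks(sites, t1, n_t, t2):
    """Greedily merge adjacent sites; returns blocks as lists of site indices.

    sites: per-site methylation vectors (one value per subject), in genomic order.
    Assumption: a candidate site is rejected, and opens a new block, if adding it
    would pull the average within-block correlation below t1 or would make it the
    n_t-th weakly correlated (mean correlation < t2) site in the block.
    """
    if not sites:
        return []
    blocks, current, weak = [], [0], 0
    for j in range(1, len(sites)):
        to_block = [pearson(sites[j], sites[i]) for i in current]
        candidate = current + [j]
        pairs = [pearson(sites[a], sites[b]) for a, b in combinations(candidate, 2)]
        average = sum(pairs) / len(pairs)
        is_weak = sum(to_block) / len(to_block) < t2
        if average < t1 or (is_weak and weak + 1 >= n_t):
            blocks.append(current)
            current, weak = [j], 0
        else:
            current = candidate
            weak += 1 if is_weak else 0
    blocks.append(current)
    return blocks


def block_means(sites, blocks):
    """Represent each block by the per-subject mean of its member sites."""
    n = len(sites[0])
    return [[sum(sites[i][s] for i in b) / len(b) for s in range(n)] for b in blocks]


# Toy data: three tightly correlated neighbors followed by an unrelated site.
sites = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [1.1, 2.1, 2.9, 4.2, 5.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
blocks = create_blocks(sites, t1=0.8, n_t=1, t2=0.8)
means = block_means(sites, blocks)
```

In this toy example the first three sites form one block and the fourth remains a single-site block, mirroring the point that uncorrelated sites are kept on their own. The sketch recomputes all pairwise correlations at every step for readability; a production implementation would update them incrementally.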
Principal component analysis (PCA) when p >> n
PCA is typically performed on the p × p sample covariance matrix $C = X^{T}X/(n-1)$, where X is the n × p data matrix (with each site centered at its sample mean), n the number of subjects and p the number of methylation sites. When p is much larger than n, direct eigen-decomposition of C is no longer computationally feasible. However, we can obtain the same PCA results through eigen-decomposition of the much smaller n × n matrix M, which is proportional to $XX^{T}$; this approach is sometimes called principal coordinate analysis.
Therefore, the loadings $v_i$ can also be calculated from $u_i$ and $\alpha_i$. EIGENSOFT employs a similar method to calculate principal components.
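The relation between the two eigen-decompositions can be written out with the standard argument below. It is a sketch under assumptions rather than a reproduction of the article's Equations 1-3: X is taken to be column-centered, M is taken to be $XX^{T}$ without further scaling (a constant factor would change only the eigenvalues), and $u_i$ and $\alpha_i$ are taken to be the unit eigenvectors and positive eigenvalues of M.

```latex
\begin{align}
M u_i &= \alpha_i u_i, \qquad u_i^{T} u_i = 1,\ \alpha_i > 0 \\
X^{T}X \,(X^{T}u_i) &= X^{T}(XX^{T}u_i) = \alpha_i \,(X^{T}u_i) \\
v_i &= \frac{X^{T}u_i}{\sqrt{\alpha_i}}, \qquad
v_i^{T}v_i = \frac{u_i^{T}XX^{T}u_i}{\alpha_i} = 1 \\
X v_i &= \frac{XX^{T}u_i}{\sqrt{\alpha_i}} = \sqrt{\alpha_i}\,u_i
\end{align}
```

Under these assumptions $v_i$ is a unit-length eigenvector of $C = X^{T}X/(n-1)$ with eigenvalue $\alpha_i/(n-1)$, and the PC scores $Xv_i$ are simply the eigenvectors of M rescaled by $\sqrt{\alpha_i}$, so the p × p matrix never has to be formed.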
In MethylPCA, we compute and store the n × n matrix $XX^{T}$, and the PCs are then calculated from the eigenvectors of M (see Equation 2). The loadings are calculated based on the original data matrix X and the PCs or the eigenvectors of M (see Equation 3). The main computing challenge (both in memory and time) is the calculation of the matrix $XX^{T}$, which becomes prohibitive for large samples using existing software. To handle this challenge, MethylPCA can calculate user-specified chunks of the matrix $XX^{T}$, after which the full matrix is assembled. Because each computing job loads only a specified number of samples into memory instead of the entire methylation data set, large data sets can be processed with limited memory. If a cluster is available, the computing jobs can be executed in parallel to speed up the process. Statistics that are used repeatedly (e.g. means of all sites in the entire sample) are calculated only once and stored to further increase efficiency.
PCA based on the correlation matrix is sometimes preferred because PCA on a covariance matrix can be dominated by variables with large variances. MethylPCA provides options to perform PCA based on the correlation matrix or the covariance matrix.
Even though it is possible to calculate the loadings for each PC, usually we are only interested in the loadings corresponding to the top PCs. To reduce the computing time, users can specify the number of top principal components for which loadings will be calculated. The computing time of the PCs is proportional to $n^2 \times p$ plus the time needed to read the data into memory. The computing time for one loading is approximately proportional to n × p.
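The chunking idea can be sketched in a few lines. This is an illustration under assumptions, not MethylPCA's C++ code: the function names are hypothetical, the chunks are computed in a plain loop rather than as separate cluster jobs, and power iteration stands in for a full eigen-decomposition so that the example needs only the standard library.

```python
from math import sqrt


def center_columns(X):
    """Subtract each site's whole-sample mean (computed once) from its column."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def gram_block(rows_a, rows_b):
    """One sub-matrix of XX^T: dot products between two sets of subjects."""
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in rows_b] for ra in rows_a]


def chunked_gram(X, chunk_size):
    """Assemble the n x n matrix XX^T from independently computable chunks."""
    n = len(X)
    starts = range(0, n, chunk_size)
    G = [[0.0] * n for _ in range(n)]
    for a in starts:
        for b in starts:
            if b < a:
                continue  # symmetric: the (b, a) chunk already filled this part
            block = gram_block(X[a:a + chunk_size], X[b:b + chunk_size])
            for i, row in enumerate(block):
                for j, value in enumerate(row):
                    G[a + i][b + j] = value
                    G[b + j][a + i] = value
    return G


def top_component(G, iterations=500):
    """Leading eigenpair of G by power iteration; returns (eigenvalue, PC scores)."""
    n = len(G)
    u = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(G[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in w))
        u = [v / norm for v in w]
    eigenvalue = sum(u[i] * sum(G[i][j] * u[j] for j in range(n)) for i in range(n))
    # PC scores equal the unit eigenvector scaled by the square root of its eigenvalue.
    return eigenvalue, [sqrt(eigenvalue) * v for v in u]


# Toy data: 4 subjects x 3 sites, processed in chunks of 3 subjects.
X = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.1], [3.0, 3.0, 3.0]]
Xc = center_columns(X)
G = chunked_gram(Xc, chunk_size=3)
eigenvalue, scores = top_component(G)
```

Each call to `gram_block` touches only the rows of two chunks, which is what lets the work be distributed and the memory footprint bounded. The sign of the returned scores is arbitrary, as with any eigenvector.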
Because covariates that have been measured can be regressed out directly in the MWAS, the motivation for using PCA is typically to control for the unmeasured confounders. To better capture unmeasured confounders and include those together with the measured covariates in the MWAS, it is possible to regress out measured covariates prior to performing the PCA. This could include, for example, technical factors associated with processing samples such as the quantity of genomic DNA starting material or sample batches. This option is implemented using the multiple regression functions from the GNU Scientific Library (GSL). The adjusted methylation data used in the PCA are the residuals after regressing out the measured covariates.
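Regressing out measured covariates amounts to replacing each site's values by ordinary least-squares residuals. The sketch below shows this for a single site; it is a pure-Python stand-in for the GSL routines mentioned above, with hypothetical function names, and it assumes the covariates are not collinear.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def residualize(y, covariates):
    """Residuals of y after least-squares regression on an intercept plus covariates.

    y: methylation values of one site, one per subject.
    covariates: one row of covariate values per subject.
    """
    Z = [[1.0] + list(row) for row in covariates]
    q = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(q)] for a in range(q)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(q)]
    beta = solve(ZtZ, Zty)
    return [yi - sum(b * za for b, za in zip(beta, z)) for z, yi in zip(Z, y)]


# Toy example: one site measured in two batches (covariate 0 or 1).
batch = [[0.0], [0.0], [1.0], [1.0]]
site = [1.0, 3.0, 6.0, 8.0]
adjusted = residualize(site, batch)
```

With a single batch indicator the residuals are simply deviations from each batch mean, so the batch shift of 5 units in the toy data disappears from `adjusted`. Applying the same function to every site yields the adjusted data matrix that would enter the PCA.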
To enable a complete data analysis pipeline, we also added the possibility to perform MWAS in MethylPCA through multiple linear regression analysis using functions in GSL. It tests the association between the phenotype and each methylation site while adjusting for covariates. Users can choose which covariates will be included in the association tests, such as age, gender and PCs. The test statistic and the p-value for each site are calculated and stored. Once all test statistics are generated, the genomic control inflation factor lambda is calculated, which is defined as the observed median test statistic value divided by the expected median of a chi-square distribution with 1 degree of freedom. Under the null hypothesis that there is no effect for any site, lambda is close to 1. Finally, a QQ (quantile-quantile) plot is produced based on the p-values, and the calculated lambda is also displayed. The association tests for different chromosomes can be computed in parallel to decrease CPU time.
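The inflation factor lambda defined above can be computed from a list of p-values with the standard library alone. This is a minimal sketch with hypothetical function names; it assumes two-sided p-values strictly between 0 and 1, and converts each one to the equivalent 1-degree-of-freedom chi-square statistic, which may differ from how MethylPCA handles its regression statistics internally.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df: the square of the
# 75th percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p):
    """1-df chi-square statistic whose upper-tail probability is p (0 < p <= 1)."""
    return _STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2


def genomic_inflation(p_values):
    """Observed median chi-square statistic divided by its expected median."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN


# Evenly spread p-values mimic the null; squaring them mimics inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p * p for p in null_p]
lambda_null = genomic_inflation(null_p)
lambda_inflated = genomic_inflation(inflated_p)
```

For the evenly spread p-values lambda comes out at essentially 1, while the artificially small p-values give a lambda well above 1, which is the pattern the QQ plot makes visible.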
Support for both a single computer and a cluster
An option is provided in the parameter file that controls whether to submit the computing jobs to a cluster or run it sequentially on a single computer. After analyzing the parameter file, all computing jobs will be arranged. Each computing job is written as a line of an executable command with corresponding parameters and is stored into batch files. For example, the block creating procedure can be performed per chromosome, with each command line processing one chromosome in the corresponding batch file.
where $p_1$ and $p_2$ control the correlation between the k-th factor and the case-control status $y_i$ (see Additional file 2). The inclusion of the case-control status in the above models ensures that there are correlations between the outcome and the confounding factors.
cases and 500 controls were simulated in each data set with 5 confounding factors. We did six simulations in which different combinations of continuous and dichotomous confounding factors were used (see Table 1). We set $b_k = 4$ and $\sigma = 10$ for continuous factors so that the correlation between the continuous confounding factor and the case-control status was about 0.2. We set $p_1 = 0.6$ and $p_2 = 0.4$ so that the correlation between the dichotomous confounding factor and the case-control status was also 0.2. The $a_j$ were uniformly sampled from 0 to 100, and $\delta$ was set to 40. We applied MethylPCA to the simulated data sets and extracted the top PCs after examining the scree plot, i.e., the plot of eigenvalues. For comparison, we performed association tests with and without the top PCs.
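The stated correlations of about 0.2 can be checked by hand. The calculation below is a sketch that rests on assumptions, because the model equations are not reproduced in this text: the case-control status y is coded 0/1 with equal numbers of cases and controls, a continuous factor is taken to be $f = b_k y + e$ with $e \sim N(0, \sigma^2)$, and for a dichotomous factor $p_1$ and $p_2$ are taken to be the probabilities that the factor equals 1 in cases and in controls.

```latex
\begin{align}
\operatorname{Var}(y) &= \tfrac{1}{2}\cdot\tfrac{1}{2} = 0.25 \\
\text{continuous:}\quad
\operatorname{Cov}(f,y) &= b_k \operatorname{Var}(y) = 4 \times 0.25 = 1 \\
\operatorname{Var}(f) &= b_k^{2} \operatorname{Var}(y) + \sigma^{2} = 4 + 100 = 104 \\
\operatorname{corr}(f,y) &= \frac{1}{\sqrt{0.25 \times 104}} \approx 0.196 \\
\text{dichotomous:}\quad
\operatorname{Cov}(f,y) &= \tfrac{1}{2}p_1 - \tfrac{1}{2}\cdot\tfrac{p_1 + p_2}{2} = 0.30 - 0.25 = 0.05 \\
\operatorname{Var}(f) &= 0.5 \times 0.5 = 0.25 \\
\operatorname{corr}(f,y) &= \frac{0.05}{\sqrt{0.25 \times 0.25}} = 0.2
\end{align}
```

Under these assumptions both settings reproduce the reported correlation of roughly 0.2 between each confounding factor and the case-control status.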
Table 1. Comparison of the genomic control inflation factor of association tests with and without the top PCs
Simulated combinations of confounding factors: 0c + 5d*, 1c + 4d, 2c + 3d, 3c + 2d, 4c + 1d, 5c + 0d
MBD-seq MWAS in 1,500 samples
This study includes 750 schizophrenia cases and 750 controls, as well as 75 technical duplicates. For a detailed description of this study and the data analysis pipeline see . In summary, this study is part of a large ongoing project entitled “A Large-Scale Schizophrenia Association Study in Sweden”. The project is supported by grants from NIMH and the Stanley Foundation and aims at improving our understanding of the etiology of schizophrenia and bipolar disorder plus their clinical and epidemiological correlations using high dimensional biological investigations and proper analysis. For details on the project see [30-32]. Cases with schizophrenia were identified via the Hospital Discharge Register. Population controls, who had never received a discharge diagnosis of schizophrenia, were selected at random from the national population registers and then group matched to the cases in terms of age, gender and county of residence. All procedures were approved by ethical committees in Sweden and in the US, and all subjects provided written informed consent (or legal guardian consent and subject assent). DNA was extracted from peripheral donated blood at the local medical facilities of the participants.
We obtained, on average, 68.0 million 50 bp reads per sample of which 70.8% could be mapped. After several QC steps we estimated the methylation status of about 27 million autosomal CpGs (all CpGs in the reference genome hg19/ GRCh37). We eliminated 10,483,766 CpGs (mostly located in repeats) showing alignment problems according to an in silico alignment experiment plus another 2,735,400 sites showing low read coverage.
MethylPCA performed data reduction in two stages. The first stage consists of combining CpG sites that are very highly correlated (r >0.9) because they are largely covered by the same 100-200 bp fragments. In the second stage, we combine the “blocks” from the first stage that are highly correlated (r > 0.6) typically due to biological processes.
MethylPCA could combine the remaining 15,558,200 CpGs after QC into 8,822,240 stage (level) 1 blocks, which in turn could be combined into 5,074,538 stage (level) 2 blocks. This represented a 67.3% data reduction. The stage 1 blocks were small (15.6 bp) with high inter-correlations (mean r = 0.95), indicating that they involved CpGs in close proximity that were largely covered by the same 100-200 bp fragments. The stage 2 blocks comprised an average of 3.1 CpGs, with the largest blocks consisting of >18 CpGs and spanning over 500 bp. This suggested that these regions were similarly methylated because of biological processes.
The computations were performed on a cluster. Processing the chromosomes in parallel, it took about 14 hours to create the stage 1 blocks and 4 hours to create the stage 2 blocks. We regressed out technical factors prior to the PCA, which took about 2 hours. The PCA was performed by partitioning the similarity matrix into chunks of 350 subjects. Using 16 processors with 16 GB of memory each, the PCA took about 26 hours. The MWAS association test took about 2 hours.
A wide variety of other existing methods can in principle be used to analyze MWAS data. For example, surrogate variable analysis (SVA), developed in the context of microarray experiments, can be used to identify and remove unknown latent noise, such as batch effects. However, direct application of these packages may not be practical because of the ultra high dimension of methylation data. Instead, efficient analysis of MWAS data is likely to require tailored computational tools that employ parallel computing, use a low level programming language for CPU intensive calculations, store intermediate results to avoid computing the same statistics multiple times or keeping results in memory, and use algorithms specifically designed for high dimensional data.
Our empirical data showed that the risk of false positives in MWAS is very high, likely because of the many differences between cases and controls (e.g. life style, diet, medication use) that affect the methylome. This stresses the need to control for confounders, which is the purpose for which MethylPCA was designed. It seems reasonable to assume that if confounders have such pervasive effects on the methylome, the pathogenic processes that cause the disease may also involve many methylation sites. A careful inspection of the PCs (e.g. using the loadings generated by MethylPCA) is therefore important to avoid regressing out disease processes in the MWAS.
As the input methylation data are quantitative values for a set of genomic locations, MethylPCA can be applied to methylation data generated by any assay. Furthermore, because the PCA components can be run independently, in principle it can also be applied to other ultra high dimensional data, such as genome-wide sequence data as long as the specific input format is followed.
Controlling for confounders in MWAS presents a major computational challenge because of the very large number of possible methylation sites. In this article we introduced MethylPCA, which is specifically designed to handle this problem. We tested and demonstrated MethylPCA using simulations and empirical MWAS data from 1,500 subjects. Results showed that MethylPCA effectively controlled for possible confounders.
Availability and requirements
Project name: MethylPCA
Project home page: http://www.biomarker.vcu.edu
Operating systems: LINUX, MAC OS X, and MICROSOFT WINDOWS
Programming language: C++ and R
Other requirements: None
License: GNU GPL
Any restrictions to use by nonacademics: None
PCA: Principal component analysis
MBD-seq: Methyl-CpG binding domain protein sequencing
MWAS: Methylome-wide association studies
Library construction and next generation sequencing of the empirical data sample was done by EdgeBio, Gaithersburg, MD.
This work was supported by the National Human Genome Research Institute (Grants R01 HG004240 and HG004240-02S1), the National Institute of Mental Health (Grant RC2 MH089996), and the National Institute of General Medical Sciences (grant GM073766). The research was also partly supported by the National Center for Advancing Translational Sciences (grant UL1TR000058).
1. Petronis A: Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature 2010, 465(7299):721-727. doi:10.1038/nature09230
2. Reik W, Dean W, Walter J: Epigenetic reprogramming in mammalian development. Science 2001, 293(5532):1089-1093. doi:10.1126/science.1063443
3. Waterland RA, Jirtle RL: Early nutrition, epigenetic changes at transposons and imprinted genes, and enhanced susceptibility to adult chronic diseases. Nutrition 2004, 20(1):63-68. doi:10.1016/j.nut.2003.09.011
4. Jost JP, Saluz HP, Pawlak A: Estradiol down regulates the binding activity of an avian vitellogenin gene repressor (MDBP-2) and triggers a gradual demethylation of the mCpG pair of its DNA binding site. Nucleic Acids Res 1991, 19(20):5771-5775. doi:10.1093/nar/19.20.5771
5. Yokomori N, Moore R, Negishi M: Sexually dimorphic DNA demethylation in the promoter of the Slp (sex-limited protein) gene in mouse liver. Proc Natl Acad Sci USA 1995, 92(5):1302-1306. doi:10.1073/pnas.92.5.1302
6. Sutherland JE, Costa M: Epigenetics and the environment. Ann NY Acad Sci 2003, 983:151-160. doi:10.1111/j.1749-6632.2003.tb05970.x
7. Cooney CA: Are somatic cells inherently deficient in methylation metabolism? A proposed mechanism for DNA methylation loss, senescence and aging. Growth Dev Aging 1993, 57(4):261-273.
8. Fuks F, Burgers WA, Brehm A, Hughes-Davies L, Kouzarides T: DNA methyltransferase Dnmt1 associates with histone deacetylase activity. Nat Genet 2000, 24(1):88-91. doi:10.1038/71750
9. Laird PW: The power and the promise of DNA methylation markers. Nat Rev Cancer 2003, 3:253-266.
10. Beck S, Rakyan VK: The methylome: approaches for global DNA methylation profiling. Trends Genet 2008, 24(5):231-237. doi:10.1016/j.tig.2008.01.006
11. Laird PW: Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 2010, 11(3):191-203.
12. Rakyan VK, Down TA, Balding DJ, Beck S: Epigenome-wide association studies for common human diseases. Nat Rev Genet 2011, 12(8):529-541. doi:10.1038/nrg3000
13. Mohn F, Weber M, Schubeler D, Roloff TC: Methylated DNA immunoprecipitation (MeDIP). Meth Mol Biol 2009, 507:55-64. doi:10.1007/978-1-59745-522-0_5
14. Serre D, Lee BH, Ting AH: MBD-isolated genome sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res 2010, 38(2):391-399. doi:10.1093/nar/gkp992
15. Bibikova M, Le J, Barnes B, Saedinia-Melnyk S, Zhou L, Shen R, Gunderson KL: Genome-wide DNA methylation profiling using Infinium(R) assay. Epigenomics 2009, 1(1):177-200. doi:10.2217/epi.09.14
16. Aberg K, Khachane AN, Rudolf G, Nerella S, Fugman DA, Tischfield JA, van den Oord EJ: Methylome-wide comparison of human genomic DNA extracted from whole blood and from EBV-transformed lymphocyte cell lines. Eur J Hum Genet 2012, 20(9):953-955. doi:10.1038/ejhg.2012.33
17. Trimarchi MP, Murphy M, Frankhouser D, Rodriguez BA, Curfman J, Marcucci G, Yan P, Bundschuh R: Enrichment-based DNA methylation analysis using next-generation sequencing: sample exclusion, estimating changes in global methylation, and the contribution of replicate lanes. BMC Genom 2012, 13(Suppl 8):S6.
18. Chavez L, Jozefczuk J, Grimm C, Dietrich J, Timmermann B, Lehrach H, Herwig R, Adjaye J: Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage. Genome Res 2010, 20(10):1441-1450. doi:10.1101/gr.110114.110
19. Lan X, Adams C, Landers M, Dudas M, Krissinger D, Marnellos G, Bonneville R, Xu M, Wang J, Huang TH: High resolution detection and analysis of CpG dinucleotides methylation using MBD-Seq technology. PLoS One 2011, 6(7):e22226. doi:10.1371/journal.pone.0022226
20. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Graf S, Johnson N, Herrero J, Tomazou EM: A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol 2008, 26(7):779-785. doi:10.1038/nbt1414
21. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006, 38(8):904-909. doi:10.1038/ng1847
22. Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet 2006, 2(12):e190. doi:10.1371/journal.pgen.0020190
23. Bock C, Walter J, Paulsen M, Lengauer T: Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping. Nucleic Acids Res 2008, 36(10):e55. doi:10.1093/nar/gkn122
24. Bollen KA: Structural equations with latent variables. New York: Wiley; 1989.
25. Gower JC: Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 1966, 53:325-338.
26. Rencher A: Methods of Multivariate Analysis. 2nd edition. New York, NY: John Wiley & Sons, Inc; 2002.
27. Galassi M, Davies J, Theiler J, Gough B, Jungman G, Alken P, Booth M, Rossi F: GNU Scientific Library Reference Manual. 3rd edition. Godalming, United Kingdom: Network Theory Ltd; 2009.
28. Devlin B, Roeder K: Genomic control for association studies. Biometrics 1999, 55:997-1004. doi:10.1111/j.0006-341X.1999.00997.x
29. Aberg KA, McClay JL, Nerella S, Xie LY, Clark SL, Hudson AD, Bukszar J, Adkins D, Consortium SS, Hultman CM: MBD-seq as a cost-effective approach for methylome-wide association studies: demonstration in 1500 case-control samples. Epigenomics 2012, 4(6):605-621. doi:10.2217/epi.12.59
30. Bergen SE, O'Dushlaine CT, Ripke S, Lee PH, Ruderfer D, Akterin S, Moran JL, Chambert KD, Handsaker RE, Backlund L: Genome-wide association study in a Swedish population yields support for greater CNV and MHC involvement in schizophrenia compared to bipolar disorder. Mol Psychiatr, in press.
31. International Schizophrenia Consortium: Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 2009, 460:748-752.
32. Schizophrenia Psychiatric Genome-Wide Association Study Consortium: Genome-wide association study of schizophrenia identifies five novel loci. Nat Genet 2011, 43:969-976. doi:10.1038/ng.940
33. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD: The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012, 28(6):882-883. doi:10.1093/bioinformatics/bts034
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.