Identification of pathogen genomic variants through an integrated pipeline
© Manary et al.; licensee BioMed Central Ltd. 2014
Received: 9 December 2013
Accepted: 6 February 2014
Published: 3 March 2014
Whole-genome sequencing represents a powerful experimental tool for pathogen research. We present methods for the analysis of small eukaryotic genomes, including a streamlined system (called Platypus) for finding single nucleotide and copy number variants as well as recombination events.
We have validated our pipeline using four sets of Plasmodium falciparum drug resistant data containing 26 clones from 3D7 and Dd2 background strains, identifying an average of 11 single nucleotide variants per clone. We also identify 8 copy number variants with contributions to resistance, and report for the first time that all analyzed amplification events are in tandem.
The Platypus pipeline provides malaria researchers with a powerful tool to analyze short read sequencing data. It provides an accurate way to detect SNVs using known software packages, and a novel methodology for detection of CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We bundle the methods into a freely available package.
Keywords: Malaria, Sequencing, Genome, Polymorphism, Variant
The detection of single nucleotide and copy number variants (SNVs and CNVs) conferring resistance to drug and vaccine candidates provides researchers with a powerful tool to choose the best combination of agents to treat infectious diseases such as malaria in specific regions, to study pathogen population dynamics and transmission, as well as to engineer new treatments that cannot be easily evaded. In addition, in organisms in which genetic complementation or backcrosses may be difficult or time consuming, whole genome sequencing (WGS) offers an opportunity to determine if second-site mutations may have been inadvertently introduced after transfection or transformation, and contribute to an observed phenotype.
With the reduction in price and increased power of current short-read high-throughput WGS methods and the wide dispersal of a variety of sequencing platforms and accompanying support, full genome sequence data is now relatively easy to generate. Recent advances in the algorithmic and programmatic analysis of WGS data have led to a number of standards, especially the use of the Genome Analysis Toolkit (GATK), being used in the analyses of human genomic data to detect SNVs and CNVs. However, there are opportunities for more comprehensive analyses of the genomes of simpler eukaryotes such as the ~23.5 Mb genome of Plasmodium falciparum, the apicomplexan parasite and etiological agent of human malaria, which has also served as a model for eukaryotic pathogen genomics since the completion and full assembly of its genome sequence in 2002. Full genome sequencing at 30-40X coverage is now readily achieved [3–6]. Such coverage allows for the identification of recombination events, the description of SNVs in sequences other than in the exomes, and the detection of small structural variants, including short-length insertion or deletion events. P. falciparum is responsible for up to a million deaths annually, and although its haploid genome is worthy of investigation for this reason alone, it also serves as an ideal test system because heterozygous calls generally do not need to be considered in sequence analysis validation (although mixed infections are a real concern) and a fully assembled reference genome is available. Furthermore, the parasite can be sub-cloned and readily cultured in vitro within white-cell depleted, anucleated human erythrocytes, mitigating host DNA contamination.
Table 1. Whole-genome sequencing statistics. Column headings: # of genomes; Gene conferring resistance. Remaining entries: Chloroquine, Mefloquine, Pyrimethamine; KAD707 and 458; this study.
Current genotyping programs are generally designed to be conservative and, as a consequence, return a large number of false positive variant calls. These programs, including GATK and the sequence/alignment map toolbox (SAMTools), typically allow the user to set a number of stringency filters, such as the quality of the read alignment or bias towards a specific strand, that can theoretically be used to separate false from true positives. However, the actual threshold values for each filter are not pre-determined, and as such, it is left to the researcher to decide how to best utilize each metric, creating barriers for the novice user. Thus, we set out to create a set of empirically-derived filters for Plasmodium WGS data that could be used as a reference point for future SNV analyses.
To identify a robust set of filtering parameters we began with a list of 15,145 known SNVs identified using traditional Sanger resequencing of Dd2 to 7X coverage and deposited in PlasmoDB (http://plasmodb.org). These distinguish the multidrug-resistant P. falciparum laboratory Indochina strain, Dd2, from the African drug-sensitive reference strain, 3D7. We then compared a P. falciparum Dd2 strain WGS short-read sequence obtained in our lab to the P. falciparum reference (3D7 strain) sequence. Our Dd2 sequence was generated with 70 bp paired-end reads on an Illumina Genome Analyzer II to a mean of 31X coverage with 96.4% of bases being covered by 5 reads or more. We considered the 15,145 curated SNVs to be true positives. All other SNVs detected were considered false positives, although it is likely that some of the novel SNVs are indeed true genetic differences (genetic diversity, especially in the subtelomeric regions, is extremely high, approaching 90% diversity in at least one base position between field samples). We then worked to identify a set of filtering parameters that would have the sensitivity to detect at least 90% of the known SNVs while eliminating as many ‘novel’ SNVs as possible.
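The benchmarking scheme described above can be illustrated with a minimal Python sketch. It assumes that each call carries a position and a few annotation values; the field names (`mq`, `dp`, `fs`), the threshold values, and the toy calls are hypothetical and are not the values used by Platypus. Following the text, calls at curated positions count as true positives and all other calls as false positives; sensitivity is computed here only over the curated SNVs that appear among the calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    pos: int    # genomic position of the SNV call
    mq: float   # mapping quality of supporting reads (hypothetical field)
    dp: int     # depth of coverage at the site (hypothetical field)
    fs: float   # strand-bias score, higher means more biased (hypothetical field)


def passes(call, min_mq, min_dp, max_fs):
    """A call is kept only if it clears every threshold."""
    return call.mq >= min_mq and call.dp >= min_dp and call.fs <= max_fs


def evaluate(calls, known_positions, min_mq, min_dp, max_fs):
    """Return (sensitivity, specificity) of one filter set.

    Sensitivity: fraction of calls at curated positions that are kept.
    Specificity: fraction of all other ('novel') calls that are removed.
    """
    known_kept = known_total = novel_removed = novel_total = 0
    for call in calls:
        kept = passes(call, min_mq, min_dp, max_fs)
        if call.pos in known_positions:
            known_total += 1
            known_kept += kept
        else:
            novel_total += 1
            novel_removed += not kept
    sensitivity = known_kept / known_total if known_total else 0.0
    specificity = novel_removed / novel_total if novel_total else 0.0
    return sensitivity, specificity


# Toy data: three calls at curated positions and two novel calls.
calls = [
    Call(pos=100, mq=60, dp=30, fs=1.0),
    Call(pos=200, mq=55, dp=25, fs=2.0),
    Call(pos=300, mq=20, dp=4, fs=40.0),
    Call(pos=400, mq=58, dp=28, fs=3.0),
    Call(pos=500, mq=10, dp=3, fs=50.0),
]
known = {100, 200, 300}

sens, spec = evaluate(calls, known, min_mq=30, min_dp=5, max_fs=10.0)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Scoring a candidate filter set this way gives the two numbers that the optimization trades off: the filter set must stay above the required sensitivity while removing as many novel calls as it can.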
Table 2. Optimized filtering parameters applied by Platypus

Filters tested and found to affect specificity and sensitivity:
- Alignment aggregate mapping quality
- Depth of coverage
- Strand bias Fisher’s exact test

Filters tested and found not to affect specificity and sensitivity:
- Count of nucleotide identity
- Clipped read significance
- Depth of coverage per allele
- Quality by depth
- Confidence of elimination of incorrect genotype
- Root mean square of mapping quality
- Reads with mapping quality of zero
- Reads with a mapping quality of zero
We chose a population of 10,000 parameter combinations to run through 100 evolutionary iterations. The algorithm, which we implemented using the MATLAB Global Optimization Toolbox, used a low crossover rate (0.5) and a high mutation rate (0.1), tournament selection of parents with a tournament size of 100, and 10 guaranteed elite children. These settings were dynamically determined to give consistency and robustness across a variety of sensitivity ranges. Iterating through a forced sensitivity level in 1% increments yielded a smooth progression along a similar combination of filtering parameters.
The list of 10,000 randomly chosen parameter combinations was assessed for both sensitivity and specificity. Each set of filtering parameters sorted the true positives into two categories (“called” or “not called”) and similarly sorted the false positives; these calls were then evaluated for accuracy. Filtering sets that provided high specificity for a given level of sensitivity were carried over to the next round. The filtering parameters were then varied slightly within all successful sets, and individual parameters swapped between sets. After 100 iterative cycles, the most successful sets of filters converged on a single result, a theoretically optimal filtering set. We then added a further set of criteria based on the quality of the sequencing reads. The final optimized set excluded all SNV calls that met any of the criteria listed in Table 2.
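The evolutionary search described above can be sketched in plain Python. This is an illustration only, not the MATLAB Global Optimization Toolbox implementation: the two thresholds, the simulated call set, the fitness penalty, and the reduced population and iteration counts are all assumptions chosen so that the example runs quickly, and the crossover rate is interpreted here as a per-parameter swap probability.

```python
import random


def make_calls(rng, n_known=200, n_novel=200):
    """Simulated calls as (is_known, mapping_quality, depth); values are made up."""
    calls = [(True, rng.gauss(55, 8), rng.gauss(30, 8)) for _ in range(n_known)]
    calls += [(False, rng.gauss(30, 12), rng.gauss(12, 8)) for _ in range(n_novel)]
    return calls


def fitness(thresholds, calls, min_sensitivity=0.90):
    """Specificity of a filter set, or a negative score if sensitivity is too low."""
    min_mq, min_dp = thresholds
    kept_known = kept_novel = n_known = n_novel = 0
    for is_known, mq, dp in calls:
        kept = mq >= min_mq and dp >= min_dp
        if is_known:
            n_known += 1
            kept_known += kept
        else:
            n_novel += 1
            kept_novel += kept
    sensitivity = kept_known / n_known
    specificity = 1.0 - kept_novel / n_novel
    return specificity if sensitivity >= min_sensitivity else sensitivity - 1.0


def tournament(pop, scores, size, rng):
    """Pick the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=scores.__getitem__)]


def evolve(calls, pop_size=60, generations=30, crossover=0.5, mutation=0.1,
           tournament_size=5, elites=2, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 60), rng.uniform(0, 40)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(t, calls) for t in pop]
        ranked = sorted(range(pop_size), key=scores.__getitem__, reverse=True)
        next_pop = [pop[i] for i in ranked[:elites]]  # elites survive unchanged
        while len(next_pop) < pop_size:
            a = tournament(pop, scores, tournament_size, rng)
            b = tournament(pop, scores, tournament_size, rng)
            # Crossover: individual parameters are swapped between parent sets.
            child = [y if rng.random() < crossover else x for x, y in zip(a, b)]
            # Mutation: a parameter is occasionally varied slightly.
            child = [max(0.0, g + rng.gauss(0, 2)) if rng.random() < mutation else g
                     for g in child]
            next_pop.append(tuple(child))
        pop = next_pop
    return max(pop, key=lambda t: fitness(t, calls))


toy_calls = make_calls(random.Random(1))
best = evolve(toy_calls)
print("best thresholds:", best, "score:", fitness(best, toy_calls))
```

Rerunning `evolve` with `min_sensitivity` stepped in 1% increments inside `fitness` would mimic the forced-sensitivity sweep mentioned in the text; the settings reported there (10,000 combinations, 100 iterations, tournament size 100, 10 elites) can be passed as arguments at the cost of a longer run.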
CNVs contribute substantially to drug resistance in Plasmodium and other eukaryotic pathogens [18–21]. The current methods for calling CNVs in Plasmodium spp. WGS data, as for most pathogenic eukaryotes, rely on smoothing the depth of coverage data (i.e. the number of reads aligned to the reference) [22–24]. Smoothing is needed because sequencing depends on multiple stochastic processes and there can be great variability in the actual coverage over a given stretch of genomic DNA. Users are thus required to guess the appropriate smoothing parameters, such as the number of base pairs to be averaged, meaning that the user already needs to know the approximate size of the CNV. Furthermore, it is known that there is also a non-stochastic bias in the depth of coverage due to the tendency of areas of high and low GC content to be sequenced less efficiently, and this must also be accounted for, especially as P. falciparum is extremely AT-rich (81%). Because we found that the current algorithms produced a large number of false calls when applied to our WGS data, we sought to address this problem by developing our own CNV calling algorithm.
An example of the output of this algorithm is demonstrated in Figure 2C.
To save computational time, we applied the convolution theorem to apply these operators in Fourier space and, as such, reduce all operations to point-wise multiplication. After each Weierstrass transform, edges are detected by the above formula. The total number of convolution iterations was set to be variable in the first in silico tests, ending only when no new edges had appeared in the last 10 iterations, but was eventually held constant at 5 because in practice no new edges appeared after the 2nd or 3rd iteration of the algorithm. We must treat the mitochondrial and apicoplast genomes separately, as their depth of coverage is usually very different from that of the other Plasmodium chromosomes, even by an order of magnitude. The depth of coverage in each region (i.e. between each pair of edges) is then compared to the sample mean, and regions that are statistically higher or lower are assigned an amplification number based on their increase (or decrease) relative to the mean.
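The following Python sketch, assuming NumPy is available, illustrates the general idea of the steps above on simulated data: Gaussian smoothing (a Weierstrass transform) carried out as a point-wise multiplication in Fourier space, edge detection, and comparison of each region to the sample mean. The edge rule used here (local maxima of the absolute gradient above a threshold) is a stand-in assumption, since the edge formula itself is not reproduced in this text; the kernel width, threshold, single smoothing pass, and simulated depths are likewise illustrative, and the statistical test on each region is omitted.

```python
import numpy as np


def gaussian_smooth_fft(signal, sigma):
    """Gaussian smoothing via the convolution theorem (multiplication in Fourier space)."""
    n = len(signal)
    pad = int(4 * sigma) + 1
    # Reflect-pad so the circular convolution does not wrap one end onto the other.
    padded = np.pad(signal, pad, mode="reflect")
    freqs = np.fft.rfftfreq(len(padded))
    # Fourier transform of a unit-area Gaussian with standard deviation sigma.
    kernel_ft = np.exp(-2.0 * (np.pi * freqs * sigma) ** 2)
    smoothed = np.fft.irfft(np.fft.rfft(padded) * kernel_ft, n=len(padded))
    return smoothed[pad:pad + n]


def find_edges(smoothed, threshold):
    """Stand-in edge rule: local maxima of |gradient| that exceed a threshold."""
    grad = np.abs(np.gradient(smoothed))
    mid = grad[1:-1]
    is_peak = (mid > grad[:-2]) & (mid >= grad[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


def segment_ratios(depth, edges):
    """Mean depth of each region between edges, relative to the sample mean."""
    bounds = [0, *(int(e) for e in edges), len(depth)]
    sample_mean = depth.mean()
    return [(a, b, float(depth[a:b].mean() / sample_mean))
            for a, b in zip(bounds[:-1], bounds[1:])]


# Simulated depth of coverage: 30X background with a 500-position duplication.
rng = np.random.default_rng(0)
expected = np.concatenate([np.full(1000, 30.0), np.full(500, 60.0), np.full(1500, 30.0)])
depth = rng.poisson(expected).astype(float)

smoothed = gaussian_smooth_fft(depth, sigma=20.0)
edges = find_edges(smoothed, threshold=0.2)
ratios = segment_ratios(depth, edges)
for start, end, ratio in ratios:
    print(f"{start}-{end}: {ratio:.2f}x sample mean")
```

In this toy case the amplified region stands out at well above the sample mean while the flanking regions fall slightly below it, because the amplification itself raises the mean. Organellar genomes would be processed as separate signals, in line with the text.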
Recombination contributes substantially to the virulence of many eukaryotic pathogens such as P. falciparum and T. brucei, where genome-encoded virulence factors are located in hyper-recombinogenic sections of the genome. In addition, such rearrangements could contribute to a phenotype if no causative SNV or clear dosage effect in a likely target is found. We thus sought to implement a program to find these recombination events.
Results and discussion
Table 3. Total number of SNVs detected using Platypus compared to simple filtering. Column headings: # Raw SNVs; # SNVs from Q30; # SNVs from Platypus; Amino acid change conferring resistance; Genome total-fold coverage. Remaining entry: I398F, P990R, CNV.
We note that Platypus reduces the total number of SNVs from the raw data by a factor of approximately 10^3 to 10^4 (Table 3). While we cannot comprehensively genotype the tens of thousands of SNVs called initially by GATK or SAMTools, we have verified in atovaquone, spiroindolone, and cladosporin resistant lines that 63 of the SNVs called by Platypus are true genetic variants, and none of the 52 sites from the atovaquone resistant samples were excluded erroneously.
We also see (Table 3) that Platypus offers significant gains over the simpler Q30 metric, reducing the number of SNVs called by a factor of approximately 1.6. The sites called by the Q30 metric and excluded by Platypus constituted 48 of the 52 sites that were Sanger sequenced and subsequently found not to be true variants, validating their exclusion by Platypus.
There is no standard set of filtering parameters to use with GATK, but we can compare to a set of published filter values for a comparable project. Using Bright et al. as a comparison point, we can adapt their filters into our current pipeline. Doing so yields a 91% sensitivity level with a specificity of 45%. We can see that these heuristically chosen values have a reasonable sensitivity threshold but do not hold up to empirically designed filters in terms of specificity.
The assessment of false positive and false negative rates can of course never be perfected, but in all cases we have detected plausible drug resistance genes. Comparison with known values and with extensive Sanger sequencing data confirms our calls, and even indicates that these sets of filters may be too lenient: we may be detecting nonexistent SNVs rather than missing true ones.
Table 4. CNVs detected using WGS and genomic microarrays. Column headings: Presumed relevant gene; Copy number (Seq.). Remaining entries: all 3D7 derived lines; all Dd2 derived lines; all Dd2 derived lines.
Altogether, Platypus identified all 8 unique CNVs that were known to exist in our strains. Our algorithm identified the large ~100 kb CNV surrounding the P. falciparum multidrug resistance protein-1 gene (pfmdr1, PF3D7_0523000) in the 13 Dd2 derived strains [20, 24], the 5 kb GTP cyclohydrolase (pfgch, PF3D7_1224000) amplification in 13 Dd2 derived strains, and the smaller 1.6 kb GTP cyclohydrolase amplification in 13 3D7 derived strains. We were also able to identify several independent larger amplifications that included lysyl tRNA synthetase (pfkrs1, PF3D7_1350100) in 3 strains that are resistant to cladosporin (a drug which targets lysyl-tRNA synthetase), and an amplification on chromosome 1 in the EvoR5 strain that was grown in the presence of atovaquone; both were also confirmed by microarray. We were also able to detect an amplification on chromosome 12 (containing pfatp4) in 3 of the spiroindolone resistant samples [6, 11]. Although there was some ambiguity as to the number of copies (i.e. duplication or triplication), Platypus also reported a SNV in one copy of pfatp4 but not in the other copies of the gene. Furthermore, we discovered no spurious or novel amplification or deletion events, i.e. CNVs that were not detected by tiling microarray.
We compared our CNV calling algorithm to BreakDancer, a similar program used to detect both copy number and recombination events, with its default settings. Using a set of parameters equivalent to those published in Chen et al. (4 standard deviation threshold, Q>39, MQ>35), we see that BreakDancer is able to detect all CNVs present in our samples (those detected by microarray and/or whole genome sequencing), but it also identifies 73 other CNVs ranging from 434 bp to 11,639 bp that we do not detect by any method. Indeed, qPCR amplification of these putative regions indicates no change in copy number in any of the regions not detected by other high-throughput methods.
A problem with using WGS is that it may be inaccessible to laboratories that are not strong in bioinformatics. To address this issue we integrated these modules into a program that we call Platypus (Figure 3). The pipeline integrates a number of other software programs, and these are referenced in full in this manuscript and in the software documentation. Platypus takes as input either unaligned FastA/FastQ sequencing data, or aligned data in the BAM format. SNVs, CNVs, and potential recombination events are output as annotated text files which can be cross-referenced with PlasmoDB or similar databases.
The Platypus pipeline provides malaria researchers with a powerful tool to analyze short read sequencing data. It provides an accurate way to detect SNVs using known software packages, and a novel methodology for detection of CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We have also tested it against both computational samples and actual data with known CNVs (both deletions and amplifications as verified by microarray) and it can detect the size and boundaries of these CNVs with a high degree of accuracy. The success of the Platypus software in both detecting real genetic variants and avoiding the reporting of false positives over a number of parasite lines can be attributed to its grounding in first principles. The SNV detection was specifically designed to use only filters that accurately segregated true and false positives, and the robustness of this approach is evident, as there is a completely smooth transition between sensitivity/specificity levels when varying over the ideal filter set. The CNV detection was based on the fundamental theorem of digital signal processing, and indeed the assumptions of this field apply directly to the signals coming off a next-generation whole genome sequencer, complete with random and systematic biases. This streamlined package offers an initial starting point for the field to analyze and report these data in a consistent manner.
Availability and requirements
The program is platform independent and can be run on ordinary desktop computers. In our case, all analysis and computer programming were done using Mac OS X 10.7.3 on a Mac Pro with 12 multi-threaded processors on 2 cores and 32 GB of 1066 MHz DDR2 RAM. Altogether, 24 GB of RAM was made available to Java while Platypus was running. We have made Platypus freely available as an open-source package at <http://sourceforge.net/projects/platypusmga/>.
Abbreviations
CNV: Copy number variant
FastA: Fast (all) format
FastQ: Fast (all) quality format
GATK: Genome analysis toolkit
Mb: Mega-base pair
Platypus: Pathogen lovers automated type uncovering software
SAM: Sequence alignment map
SNV: Single nucleotide variant
spp: Species (plural)
WGS: Whole genome sequencing.
The authors would like to acknowledge Stephan Meister and Shinji Okitsu for advice on the manuscript. This work was supported by 7 R01 AI090141-02 and by the Bill and Melinda Gates Foundation (OPP1054480), and MJM was supported by funding from the University of California San Diego Medical Scientist Training Program, the University of California San Diego Genetics Training Grant, and a fellowship from the Hertz Foundation.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20: 1297-1303. 10.1101/gr.107524.110.
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419: 498-511. 10.1038/nature01097.
- Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O’Brien J, Djimde A, Doumbo O, Zongo I: Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature. 2012, 487 (7407): 375-379. 10.1038/nature11174.
- Miotto O, Almagro-Garcia J, Manske M, Macinnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S, et al: Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet. 2013, 45 (6): 648-655. 10.1038/ng.2624.
- Samarakoon U, Regier A, Tan A, Desany BA, Collins B, Tan JC, Emrich SJ, Ferdig MT: High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum. BMC Genomics. 2011, 12: 116. 10.1186/1471-2164-12-116.
- Bopp SER, Manary MJ, Bright AT, Johnston GL, Dharia NV, Luna FL, McCormack S, Plouffe D, McNamara CW, Walker JR, et al: Mitotic evolution of Plasmodium falciparum shows a stable core Genome but recombination in antigen families. PLoS Genet. 2013, 9: e1003293. 10.1371/journal.pgen.1003293.
- Murray CJ, Rosenfeld LC, Lim SS, Andrews KG, Foreman KJ, Haring D, Fullman N, Naghavi M, Lozano R, Lopez AD: Global malaria mortality between 1980 and 2010: a systematic analysis. Lancet. 2012, 379: 413-431. 10.1016/S0140-6736(12)60034-8.
- Trager W, Jensen JB: Human malaria parasites in continuous culture. Science. 1976, 193: 673-675. 10.1126/science.781840.
- Benjamini Y, Speed T: Estimation and correction for GC-content bias in high throughput sequencing. 2011, Berkeley, CA USA: Tech Rep.
- Hoepfner D, McNamara CW, Lim CS, Studer C, Riedl R, Aust T, McCormack SL, Plouffe DM, Meister S, Schuierer S, et al: Selective and specific inhibition of the plasmodium falciparum lysyl-tRNA synthetase by the fungal secondary metabolite cladosporin. Cell Host Microbe. 2012, 11: 654-663. 10.1016/j.chom.2012.04.015.
- Rottmann M, McNamara C, Yeung BK, Lee MC, Zou B, Russell B, Seitz P, Plouffe DM, Dharia NV, Tan J, et al: Spiroindolones, a potent compound class for the treatment of malaria. Science. 2010, 329: 1175-1180. 10.1126/science.1193225.
- Meister S, Plouffe DM, Kuhen KL, Bonamy GM, Wu T, Barnes SW, Bopp SE, Borboa R, Bright AT, Che J, et al: Imaging of Plasmodium liver stages to drive next-generation antimalarial drug discovery. Science. 2011, 334: 1372-1377. 10.1126/science.1211936.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Subgroup GPDP: The sequence alignment/map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner DA, Daily JP, Sarr O, Ndiaye D, Ndir O, et al: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2007, 39: 113-119. 10.1038/ng1930.
- Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS: PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009, 37: D539-D543. 10.1093/nar/gkn814.
- Barry AE, Leliwa-Sytek A, Tavul L, Imrie H, Migot-Nabias F, Brown SM, McVean GA, Day KP: Population genomics of the immune evasion (var) genes of Plasmodium falciparum. PLoS Pathog. 2007, 3: e34. 10.1371/journal.ppat.0030034.
- Weise T: Global optimization algorithms–theory and application. 2009, La Jolla, CA USA: Self-Published.
- Nair S, Miller B, Barends M, Jaidee A, Patel J, Mayxay M, Newton P, Nosten F, Ferdig MT, Anderson TJ: Adaptive copy number evolution in malaria parasites. PLoS Genet. 2008, 4: e1000243. 10.1371/journal.pgen.1000243.
- Kidgell C, Volkman SK, Daily J, Borevitz JO, Plouffe D, Zhou Y, Johnson JR, Le Roch K, Sarr O, Ndir O, et al: A systematic map of genetic variation in Plasmodium falciparum. PLoS Pathog. 2006, 2: e57. 10.1371/journal.ppat.0020057.
- Wilson CM, Serrano AE, Wasley A, Bogenschutz MP, Shankar AH, Wirth DF: Amplification of a gene related to mammalian mdr genes in drug-resistant Plasmodium falciparum. Science. 1989, 244: 1184-1186. 10.1126/science.2658061.
- Singh A, Rosenthal PJ: Selection of cysteine protease inhibitor-resistant malaria parasites is accompanied by amplification of falcipain genes and alteration in inhibitor transport. J Biol Chem. 2004, 279: 35236-35241. 10.1074/jbc.M404235200.
- Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M: Detecting copy number variation with mated short reads. Genome Res. 2010, 20: 1613-1622. 10.1101/gr.106344.110.
- Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009, 19: 1586-1592. 10.1101/gr.092981.109.
- Robinson T, Campino SG, Auburn S, Assefa SA, Polley SD, Manske M, MacInnis B, Rockett KA, Maslen GL, Sanders M: Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients. PLoS One. 2011, 6: e23204. 10.1371/journal.pone.0023204.
- Smith SW: The Scientist and Engineer’s Guide to Digital Signal Processing. 1999. 2009, PO Box: California Technical Publishing.
- Ruby JG, Bellare P, DeRisi JL: PRICE: Software for the targeted assembly of components of (Meta) Genomic sequence data. G3 (Bethesda). 2013, 3: 865-880.
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-1858. 10.1101/gr.078212.108.
- Dharia NV, Sidhu AB, Cassera MB, Westenberger SJ, Bopp SE, Eastman RT, Plouffe D, Batalov S, Park DJ, Volkman SK, et al: Use of high-density tiling microarrays to identify mutations globally and elucidate mechanisms of drug resistance in Plasmodium falciparum. Genome Biol. 2009, 10: R21. 10.1186/gb-2009-10-2-r21.
- Neafsey DE, Galinsky K, Jiang RH, Young L, Sykes SM, Saif S, Gujja S, Goldberg JM, Young S, Zeng Q, Chapman SB, Dash AP, Anvikar AR, Sutton PL, Birren BW, Escalante AA, Barnwell JW, Carlton JM: The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 2012, 44 (9): 1046-50. 10.1038/ng.2373.
- Dharia NV, Bright AT, Westenberger SJ, Barnes SW, Batalov S, Kuhen K, Borboa R, Federe GC, McClean CM, Vinetz JM, Neyra V, Llanos-Cuentas A, Barnwell JW, Walker JR, Winzeler EA: Whole-genome sequencing and microarray analysis of ex vivo Plasmodium vivax reveal selective pressure on putative drug resistance genes. Proc Natl Acad Sci U S A. 2010, 107 (46): 20045-50. 10.1073/pnas.1003776107.
- Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang Q, Locke DP: BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods. 2009, 6: 677-681. 10.1038/nmeth.1363.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.