XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data
© Frank; licensee BioMed Central Ltd. 2008
Received: 10 July 2008
Accepted: 07 October 2008
Published: 07 October 2008
Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects.
XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool; [1–3]) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file.
XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.
The recent explosions in culture-independent studies of environmental DNA sequences ("metagenomics") and automated DNA sequencing capabilities have prompted the creation of numerous software applications designed to aid the analysis of an avalanche of sequence data. However, many of the commonly used, freely available applications require some facility with the UNIX/Linux operating system and/or specialized scripting languages to either manipulate files in batch or pipe data between applications. As automated DNA sequencing and sequence analysis have become commonplace in laboratories that do not specialize in bioinformatics, need has arisen for the development of powerful, yet simple-to-use, software.
XplorSeq was developed for rapid compilation and analysis of rDNA clone libraries, but should be applicable to any sequencing project (computer hardware may, however, limit the scale of projects). Although several commercial and non-commercial software packages implement some of the same basic functionalities as XplorSeq, the development of XplorSeq was motivated by the absence of GUI-based software designed specifically for high-throughput, batch analysis of rDNA sequences, such as arise from culture-independent metagenomic studies. Specifically, the extant software could not accommodate the phylogenetic orientation of analyses and sequence annotations that are most useful for metagenomics. In contrast, XplorSeq implements several domain-specific software tools (e.g. for state of the art phylogenetic tree inference, OTU clustering, biodiversity estimates) that are not available in general-purpose DNA analysis packages. Many published studies, from a variety of laboratories engaged in metagenomics, have used XplorSeq, and thereby established its stability, ease-of-use, and capabilities [4–29]. The software is freely available for non-commercial use at http://vent.colorado.edu/phyloware.
XplorSeq is written in Objective-C using the Cocoa application framework (Apple Inc.). Releases are compiled for the OS X operating system (current versions require OS 10.4.x or 10.5.x) as universal binaries, which run natively on Macintosh computers with Intel or PowerPC microprocessors. Similar to Cocoa, the architecture of XplorSeq is based on the Model-View-Controller (MVC) design pattern. XplorSeq is multi-threaded and can adjust its operation to accommodate multiple shared-memory microprocessors.
The rationale for implementing XplorSeq as a standalone Macintosh application involved 1) desire for a highly responsive, feature-rich graphical output, thus precluding web-based applications; 2) recognition that the BSD-Unix operating system underpinning OS X would allow leveraging of existing open-source software; 3) observation that many computer novices (an intended audience for this software) were more comfortable with OS X than other operating systems; and 4) the maturity, stability, and support inherent in the Cocoa application framework.
Third-party software packages and plugin executables (sortx and biodiv) were written in C and C++. When possible, compiled executables are incorporated directly into the XplorSeq application bundle (essentially, a hidden directory structure) so that users can install and operate XplorSeq without the need for local compilation or extensive configuration. Full implementation of XplorSeq requires separate installation of phred and phrap (obtained at http://www.phrap.org).
Results and discussion
The following sections outline the data structures and analytic tools that form the basis of the XplorSeq workflow.
Data organization and GUI architecture
XplorSeq uses a document-based approach for project data management in which multiple sequences and their associated data are stored and accessed in a single file. As a project evolves, sequences may be added, deleted, amended and analyzed as needed. XplorSeq does not enforce a highly constrained analytic schema and thereby grants the user more autonomy in designing and implementing an analysis plan than typically is possible in a hard-wired software pipeline.
The top layer data object is the "Project", which stores all other data and is synonymous with the document as a whole. Hence, the main XplorSeq window (Fig 1) is the Project Inspector window. Projects organize and manage lists of "Clones", which represent individual cloned genes or PCR amplicons. Clones, in turn, manage groups of "Sequences" which map to unique DNA sequences. Sequences can be imported directly (e.g. as polished GenBank sequences), read from DNA sequencer traces, or assembled from other sequence objects ("contigs"). For each sequence analyzed by BLAST, XplorSeq creates a "BlastInfo" object that summarizes pertinent blast output data: identity and phylogenetic lineage of the sequence's closest BLAST hit, BLAST statistics, etc. Each Clone ranks its constituent Sequence objects based on BLAST bit-score and the "Best Sequence" (i.e. that with the highest bit-score) serves as a proxy for the entire Clone. "Oligo" objects encapsulate data that describe oligonucleotide sequences used in construction of clone libraries.
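The nesting described above can be pictured with a small sketch. XplorSeq itself is written in Objective-C, so the Python below is only an illustration; the class and field names are assumptions chosen to mirror the object names in the text, and Oligo objects are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlastInfo:
    """Summary of the closest BLAST hit for one sequence (hypothetical fields)."""
    hit_name: str
    percent_identity: float
    bit_score: float
    lineage: str = ""


@dataclass
class Sequence:
    name: str
    residues: str
    blast: Optional[BlastInfo] = None  # stays None until the sequence is BLASTed


@dataclass
class Clone:
    name: str
    sequences: List[Sequence] = field(default_factory=list)

    def best_sequence(self) -> Optional[Sequence]:
        """Return the sequence with the highest BLAST bit-score, or None."""
        scored = [s for s in self.sequences if s.blast is not None]
        if not scored:
            return None
        return max(scored, key=lambda s: s.blast.bit_score)


@dataclass
class Project:
    """Top-layer object: owns every Clone in the document."""
    name: str
    clones: List[Clone] = field(default_factory=list)
```

In this sketch a Clone with no BLASTed sequences has no "Best Sequence"; how XplorSeq handles that case is not stated in the text.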
Data display and control of data processing
A project's Clone, Sequence, and BlastInfo objects are displayed in the Project Inspector window (Fig. 1A), which functions as the main XplorSeq window. Data are arranged hierarchically to reflect nesting of data structures. For each Clone, a summary of its best BLAST hit, which includes the taxon name, percent sequence identity, and bit-score, is displayed in the main XplorSeq window. The phylogenetic lineage of the top BLAST hit can be imported (through either an Entrez idfetch query or import of tab-delimited data) to provide information about the taxonomic placement of a clone.
The user controls all steps of data processing by selecting objects to be acted upon and then choosing a function from menu items presented in the tool panel that extends from the main XplorSeq window (Fig 1A). Methods to import, export, and analyze data are accessible through these menus. Below the menus lie controls through which oligonucleotides used to generate PCR libraries can be designated, if relevant to the project; entries in the oligo menus can be modified through a preferences dialog.
Project specific meta-data can be recorded in several text fields presented in the Project window, under the "Project Info" tab. An editable text box is presented in which the user can enter comments, for instance details specific to a project (Fig 1B).
By double-clicking on an entry in the Project window, the user can display and edit more detailed information associated with that entry. For example, the Clone Inspector window (Fig. 3) summarizes the content of the Clone Object, including its top BLAST hit sequence and corresponding BlastInfo Object. The phylogenetic lineage and domain of the Clone Object can be set through the controls at the bottom of the window. A panel extending from the Clone Inspector window presents user-specified meta-data associated with the Clone Object.
DNA sequences, including contig sequences, are displayed in a Sequence Inspector window (Fig. 4). Individual nucleotides are color-coded to represent quality scores generated by the base-calling software (shades of blue) or trimmed sequences (red). Basic sequence information, such as primer sequences and trimmed sequence length, is displayed in a set of text fields at the bottom of the Sequence Inspector window. Similar to Clone Objects, sequence specific meta-data can be viewed through a panel that extends from the inspector window.
Tools for data import and analysis
Summary of XplorSeq functionality
Import DNA chromatograms (.esd, .scf, .abi etc.): phred
Import DNA sequences in Phd format
Import DNA sequences and quality scores in FastA format
Parse Blast records
Import DNA sequences in FastA format
Import XplorSeq document
Import phylogenetic lineage information from entrez
Import metadata in key-value format
Export DNA sequences in variety of formats
FastA + Qual...: Export DNA sequences and quality scores
Export summary of Blast records
Enumerate OTUs belonging to groups of sequences
Calculate OTU richness for set of sequences
Export summary of quality scores
Blast Accession #'s...: Export accession numbers of top Blast hits
Export data in format for Genbank submission (sequin)
Create a Blast database (formatdb)
Export data in XML format
Summarize and export metadata
List selected sequences in Newick format
Pipe data from chromatogram through Blast analysis
Pipe data from contig assembly to Blast analysis
Perform base calling (phred or ttuner)
Perform contig assembly (phrap or TIGR_Assembler)
Blast query of Genbank
Blast query of local blast database
Get Entrez Lineage Info.: Download entrez phylogenetic lineage information (idfetch)
Perform multiple sequence alignment (clustal)
Calculates biodiversity indices with random resampling (biodiv)
XplorSeq Doc Difference...: Generate differences between two XplorSeq documents
Edit Sequence Names...: Alter names of sequences
Edit Lineage Names...: Edit phylogenetic lineage information
Edit metadata associated with sequence
Edit Metadata Keys...: Edit all metadata keys in document
Group sequences and contigs
Ungroup sequences and contigs
Delete blast information, contigs
Sort records in document
Associate primer sequences with sequence objects
Trim sequences based on quality score and primer
Remove trimming information
Reverse complement sequence
DNA -> RNA: Convert DNA sequence to RNA sequence
RNA -> DNA: Convert RNA sequence to DNA sequence
Convert sequence to upper case
Convert sequence to lower case
Cluster Operational Taxonomic Units (sortx)
Clearcut NJ Tree...: Fast neighbor joining trees (clearcut)
Phylip distance matrix...: Calculate distance matrix (dnadist)
Phylip NJ Tree...: Calculate Neighbor joining or UPGMA trees (neighbor)
Generate bootstrap replicates of alignment (seqboot)
Generate consensus of multiple trees (consense)
Generate Maximum Likelihood tree (raxmlHPC)
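Several of the sequence-editing functions listed above (reverse complement, DNA -> RNA, RNA -> DNA) amount to simple string transformations. The following is a minimal sketch of what such operations do, not XplorSeq's own code; the function names are hypothetical, and the sketch assumes that characters other than A, C, G, T (for example N or alignment gaps) are left unchanged.

```python
# Translation table for complementing DNA; case is preserved and any
# character not listed here (N, gaps, ambiguity codes) passes through as-is.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def dna_to_rna(seq: str) -> str:
    """Replace thymine with uracil, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq: str) -> str:
    """Replace uracil with thymine, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```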
XplorSeq facilitates batch BLAST [1–3] analyses of DNA sequences through both networked and local searches of nucleotide databases. Local BLAST searches require properly formatted sequence databases, which may be downloaded from NCBI (http://ncbi.nlm.nih.gov) or created by use of the formatdb executable (through XplorSeq or the command line). XplorSeq dispatches sequences to the appropriate client software (blastcl3 or blastall; [1–3]) and then parses the resulting output file into BlastInfo Objects.
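To make the parsing step concrete, here is a minimal sketch of reducing BLAST output to one best hit per query. The text does not say which BLAST output format XplorSeq parses; this example assumes the 12-column tab-delimited format (query, subject, percent identity, ..., e-value, bit score), and the names `Hit` and `best_hits` are hypothetical.

```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    bit_score: float


def best_hits(tabular_text: str) -> Dict[str, Hit]:
    """Keep the highest bit-score hit per query from 12-column tabular output."""
    best: Dict[str, Hit] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # skip rows that do not have the assumed 12 columns
        hit = Hit(cols[0], cols[1], float(cols[2]), float(cols[11]))
        if hit.query not in best or hit.bit_score > best[hit.query].bit_score:
            best[hit.query] = hit
    return best
```

Each resulting `Hit` plays the role of the BlastInfo summary described above: the identity of the closest hit plus its statistics.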
OTU clustering is implemented through the program sortx, which was written in tandem with XplorSeq (Fig. 6F). Sortx uses a fast radial clustering algorithm to bin aligned sequences based on uncorrected pairwise sequence distances (%ID). Clusters can be assembled based on furthest-, mean-, or nearest-neighbor rules. Following cluster formation, sortx selects a representative sequence for each cluster, which maximizes both pairwise similarity to other cluster members and sequence length (simply choosing the sequence with minimum pairwise distance could select for short, but well-conserved sequences, which would not necessarily be representative of the cluster). Finally, the user can select a range of pairwise sequence distance thresholds by which to assemble OTUs in order to create multiple data sets at different phylogenetic depths.
Estimates of biodiversity indices (species richness, diversity, evenness) can be reported through either of two modes. First, the export function OTU Diversity... reports basic calculations of commonly used indices (Sobs, Schao1, CACE, Good's coverage, Shannon diversity) for a set of selected sequences. Alternatively, the same biodiversity estimates can be made in a more thorough manner through execution of the analysis function Biodiversity (biodiv)..., which invokes the program biodiv, a standalone command-line tool built in conjunction with XplorSeq. As shown in Fig. 6G, the user selects OTU definitions for the selected sequences through choice of meta-data options. To compare indices between different groups of sequences, the user can also select multiple "environments" by which to differentiate the sequence subsets; biodiv then performs separate analyses for each designated environment. Biodiv performs random resampling of OTUs and calculates collector's curves and associated biodiversity indices as a function of sampling effort. Biodiv also reports rarefied biodiversity indices, based on resampling, with 95% confidence intervals for each type of environment.
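For readers unfamiliar with these indices, the sketch below computes a few of them from a list of OTU labels (one label per sequence) and builds a resampled collector's curve. It is an illustration under stated assumptions, not the biodiv implementation: the function names are hypothetical, the bias-corrected form of Chao1 is assumed, Shannon diversity uses natural logarithms, resampling is done without replacement, and ACE and confidence intervals are not computed.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def diversity_indices(otu_labels: Sequence[str]) -> Dict[str, float]:
    """Observed richness, Chao1 (bias-corrected), Good's coverage, Shannon diversity."""
    if not otu_labels:
        raise ValueError("at least one sequence is required")
    counts = Counter(otu_labels)
    n = sum(counts.values())
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # OTUs seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)  # OTUs seen exactly twice
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    goods = 1 - f1 / n
    shannon = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {"S_obs": s_obs, "S_chao1": chao1,
            "goods_coverage": goods, "shannon": shannon}


def collectors_curve(otu_labels: Sequence[str], step: int = 1,
                     replicates: int = 100, seed: int = 0) -> List[Tuple[int, float]]:
    """Mean observed richness at each sampling effort, from random subsamples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    labels = list(otu_labels)
    curve = []
    for size in range(step, len(labels) + 1, step):
        richness = [len(set(rng.sample(labels, size))) for _ in range(replicates)]
        curve.append((size, sum(richness) / replicates))
    return curve
```

To compare "environments" in the sense used above, the same functions could be applied separately to the OTU labels of each group of sequences.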
Tools for data export and transformation
Execution times of commonly used software: comparison of XplorSeq with command line implementation
[Table body not recoverable. Surviving column header: Execution Time (sec.). Surviving row labels: Open XplorSeq file; Save XplorSeq file. Surviving footnotes: 768 .esd files; 384 pairs of reads.]
The XplorSeq file format, implemented using the Cocoa software framework, does not significantly add to the size of data files. For example, an XplorSeq file containing an alignment of 250,000 rRNA sequences (1585 nucleotides per sequence) requires 396 megabytes (MB) of storage, compared to 381 MB for a FASTA-formatted file storing the same alignment. The addition of metadata, such as BLAST results, increases file size in roughly linear proportion to the size of parsed input text.
Although the use of a flat file format by XplorSeq greatly simplifies data storage and transfer, it does require that all data be read into memory before being manipulated. For relatively large files, bottlenecks are apparent primarily in tasks requiring import and export of data. Thus, sequence analysis projects are likely to be limited as much by computer hardware (e.g., quantity of random access memory and bus speeds) and the performance of underlying third-party software tools as by XplorSeq. However, the XplorSeq environment is scalable from laptops to more advanced workstations (e.g., 8-core/32 GB systems currently available), so its capabilities can be expanded as need arises and hardware evolves.
As a rough guide to system requirements, Table 2 presents benchmark comparisons of common XplorSeq tasks performed on a laptop and a workstation. To open the 250,000 sequence XplorSeq alignment file described above, for example, requires ~130 or ~12 seconds, on the laptop or workstation, respectively. Test datasets of 1 × 10^6 25-nucleotide sequences can be manipulated with similar lag times. More fully annotated files containing ~50,000 16S rRNA sequences, each of length ~1000 nucleotides, have been used routinely on a laptop with little performance degradation [17, 23, 27]. However, projects with > 50,000 annotated sequences likely will require higher performance workstations to ensure the responsiveness customary to users of GUI-based software. Alternatively, sequence data can be spread across multiple files, each of which maintains information generated from a particular experiment (e.g. a PCR amplicon library). More complex data storage strategies, such as the use of application-specific databases, may be implemented in the future if they do not compromise the XplorSeq philosophy of ease of software installation and use.
Although XplorSeq was developed to expedite the phylogenetic analysis of ribosomal RNA (rRNA) gene libraries, it should prove useful in any sequencing project, particularly ones facilitated by batch analysis of multiple clones. Moreover, any UNIX-based DNA sequence analysis tool that can be ported to Mac OS X can be readily incorporated into XplorSeq. Suggestions for the addition of other modules to the XplorSeq package are most welcome.
Availability and requirements
Project name: XplorSeq
Project home page: http://vent.colorado.edu/phyloware
Operating system: Macintosh OS X (currently requires 10.4.x or 10.5.x)
Programming language: Cocoa/Objective-C, C, C++
Other requirements: phred and phrap are available at http://www.phrap.org
License: Daniel N. Frank. Free for non-commercial use
Any restrictions to use by non-academics: Contact corresponding author. Users are requested to notify the corresponding author when XplorSeq is cited.
The author wishes to thank Prof. Norman R. Pace for encouragement and support, Charles E. Robertson for invaluable mentorship in software engineering and web-site support and Laura Baumgartner, J. Kirk Harris, Jeffrey Walker and members of the Pace laboratory for extensive software testing and feedback. Both anonymous reviewers are thanked for their constructive feedback.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2007, (35 Database):D5–12.
- Ley RE, Backhed F, Turnbaugh P, Lozupone CA, Knight RD, Gordon JI: Obesity alters gut microbial ecology. Proc Natl Acad Sci USA 2005, 102(31):11070–11075.
- McManus CJ, Kelley ST: Molecular survey of aeroplane bacterial contamination. J Appl Microbiol 2005, 99(3):502–508.
- Papineau D, Walker JJ, Mojzsis SJ, Pace NR: Composition and structure of microbial communities from stromatolites of Hamelin Pool in Shark Bay, Western Australia. Appl Environ Microbiol 2005, 71(8):4822–4832.
- Spear JR, Walker JJ, McCollom TM, Pace NR: Hydrogen and bioenergetics in the Yellowstone geothermal ecosystem. Proc Natl Acad Sci USA 2005, 102(7):2555–2560.
- Spear JR, Walker JJ, Pace NR: Hydrogen and primary productivity: Inference of biogeochemistry from phylogeny in a geothermal ecosystem. In Geothermal Biology and Geochemistry in Yellowstone National Park. Edited by: Inskeep WP, McDermott TR. Bozeman, MT: Thermal Biology Institute, Montana State University; 2005:113–128.
- Walker JJ, Spear JR, Pace NR: Geobiology of a microbial endolithic community in the Yellowstone geothermal environment. Nature 2005, 434: 1011–1014.
- Baumgartner LK, Reid RP, Dupraz C, Decho AW, Buckley DH, Spear JR, Przekop KM, Visscher PT: Sulfate reducing bacteria in microbial mats: changing paradigms, new discoveries. Sedimentary Geology 2006, 185: 131–145.
- Dalby AB, Frank DN, St Amand AL, Bendele AM, Pace NR: Culture-independent analysis of indomethacin-induced alterations in the rat gastrointestinal microbiota. Appl Environ Microbiol 2006, 72(10):6707–6715.
- Ley RE, Harris JK, Wilcox J, Spear JR, Miller SR, Bebout BM, Maresca JA, Bryant DA, Sogin ML, Pace NR: Unexpected diversity and complexity of the Guerrero Negro hypersaline microbial mat. Appl Environ Microbiol 2006, 72(5):3685–3695.
- Rawls JF, Mahowald MA, Ley RE, Gordon JI: Reciprocal gut microbiota transplants from zebrafish and mice to germ-free recipients reveal host habitat selection. Cell 2006, 127(2):423–433.
- Salmassi TM, Walker JJ, Newman DK, Leadbetter JR, Pace NR, Hering JG: Community and cultivation analysis of arsenite oxidizing biofilms at Hot Creek. Environ Microbiol 2006, 8(1):50–59.
- Spear JR, Walker JJ, Pace NR: Microbial ecology and energetics in yellowstone hotsprings. Yellowstone Science 2006, 14(1):17–24.
- Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI: An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 2006, 444(7122):1027–1031.
- Frank DN, St Amand AL, Feldman RA, Boedeker EC, Harpaz N, Pace NR: Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc Natl Acad Sci USA 2007, 104(34):13780–13785.
- Harris JK, De Groote MA, Sagel SD, Zemanick ET, Kapsner R, Penvari C, Kaess H, Deterding RR, Accurson FJ, Pace NR: Molecular identification of bacteria in bronchoalveolar lavage fluid from children with cystic fibrosis. Proc Natl Acad Sci USA 2007, 104(51):20529–20533.
- Lee L, Tin S, Kelley ST: Culture-independent analysis of bacterial diversity in a child-care facility. BMC Microbiol 2007, 7(1):27.
- Spear JR, Barton HA, Robertson CE, Francis CA, Pace NR: Microbial community biofabrics in a geothermal mine adit. Appl Environ Microbiol 2007, 73(19):6172–6180.
- Walker JJ, Pace NR: Phylogenetic Composition of Rocky Mountain Endolithic Microbial Ecosystems. Appl Environ Microbiol 2007, 73(11):3497–3504.
- Feazel LM, Spear JR, Berger AB, Harris JK, Frank DN, Ley RE, Pace NR: Eucaryotic Diversity in a Hypersaline Microbial Mat. Appl Environ Microbiol 2008, 74(1):329–332.
- Frank DN, Pace NR: Gastrointestinal microbiology enters the metagenomics era. Curr Opin Gastroenterol 2008, 24(1):4–10.
- Frank DN, Wysocki A, Specht-Glick DD, Rooney A, Feldman RA, St Amand AL, Pace NR, Trent JD: Microbial diversity in chronic open wounds determined by culture-independent molecular methods. Wound Repair and Regeneration 2008, in press.
- Isenbarger TA, Finney M, Rios-Velazquez C, Handelsman J, Ruvkun G: Miniprimer PCR, a new lens for viewing the microbial world. Appl Environ Microbiol 2008, 74(3):840–849.
- Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, Schlegel ML, Tucker TA, Schrenzel MD, Knight R, et al.: Evolution of Mammals and Their Gut Microbes. Science 2008.
- Peterson DA, Frank DN, Pace NR, Gordon JI: Metagenomic approaches for defining the pathogenesis of inflammatory bowel diseases. Cell Host Microbe 2008, 3(6):417–427.
- Sahl JW, Schmidt R, Swanner ED, Mandernack KW, Templeton AS, Kieft TL, Smith RL, Sanford WE, Callaghan RL, Mitton JB, et al.: Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado. Appl Environ Microbiol 2008, 74(1):143–152.
- Turnbaugh PJ, Backhed F, Fulton L, Gordon JI: Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe 2008, 3(4):213–223.
- Denisov GA, Arehart AB, Curtin MD: A system and method for improving the accuracy of DNA sequencing and error probability estimation through application of a mathematical model to the analysis of electropherograms. US Patent 6681186; 2004.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8(3):186–194.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998, 8(3):175–185.
- TIGR Assembler [http://www.jcvi.org/cms/research/software/]
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31(13):3497–3500.
- Higgins DG, Thompson JD, Gibson TJ: Using CLUSTAL for multiple sequence alignments. Methods Enzymol 1996, 266: 383–402.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680.
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle. 2005.
- Sheneman L, Evans J, Foster JA: Clearcut: a fast implementation of relaxed neighbor joining. Bioinformatics 2006, 22(22):2823–2824.
- Stamatakis A, Ludwig T, Meier H: RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 2005, 21(4):456–463.
- Schloss PD, Handelsman J: The last word: books as a statistical metaphor for microbial communities. Annu Rev Microbiol 2007, 61: 23–34.
- Magurran AE: Measuring Biological Diversity. Malden, USA: Blackwell Publishing; 2003.
- Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, et al.: ARB: a software environment for sequence data. Nucleic Acids Res 2004, 32(4):1363–1371.
- Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 2005, 71(3):1501–1506.
- DeSantis TZ Jr, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, Andersen GL: NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 2006, (34 Web Server):W394–399.
- Pruesse E, Quast C, Knittel K, Fuchs B, Ludwig W, Peplies J, Glockner FO: SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 2007, 35(21):7188–7196.
- Lozupone C, Knight R: UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 2005, 71(12):8228–8235.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.