FANTOM: Functional and taxonomic analysis of metagenomes
© Sanli et al.; licensee BioMed Central Ltd. 2013
Received: 27 September 2012
Accepted: 29 January 2013
Published: 1 February 2013
Interpretation of quantitative metagenomics data is important for understanding ecosystem functioning and for assessing differences between environmental samples. There is a need for an easy-to-use tool to explore the often complex metagenomics data in a taxonomic and functional context.
Here we introduce FANTOM, a tool that allows for exploratory and comparative analysis of metagenomics abundance data integrated with metadata information and biological databases. Importantly, FANTOM can make use of any hierarchical database and it comes supplied with NCBI taxonomic hierarchies as well as KEGG Orthology, COG, PFAM and TIGRFAM databases.
The software is implemented in Python, is platform independent, and is available at http://www.sysbio.se/Fantom
Metagenomics is the culture-independent study of an environmental sample by sequencing the recovered genetic material, either targeted ribosomal RNA genes (16S) through amplicon sequencing or whole genomic DNA. This allows the ecosystem's taxonomic diversity, functional capacity and dynamics to be determined and compared with other environments. Typically, for whole-genome-based metagenomics, DNA extracted from an environmental sample is the starting material from which next generation sequencing (NGS) technologies generate short reads that represent the microbiota of the sample. The raw sequence reads typically contain errors, which need to be eliminated before further steps by trimming and filtering based on a base-calling quality score (Phred) [2, 3]. The high quality reads can be annotated to reference taxonomic and functional features using sequence-similarity-based alignment methods, e.g. BLAST or HMMER, against reference databases. Another approach is to map high quality reads onto reference genomes or well annotated genes with short read aligners. Web services such as CAMERA, IMG/M and MG-RAST are available for performing the above-mentioned pipeline of NGS processing and annotation in an automated fashion. Depending on user-given parameters such as percentage similarity or e-value thresholds, each of these software tools or web services can report the annotated sequences as abundance data for each feature in the database searched. Further analysis of the quantitative abundance data obtained in this way, in particular together with sample metadata, is important for biological interpretation [10, 11].
Although the above-mentioned web services can to some extent provide tools for the comparative analysis of metagenomes, they have some limitations: 1) statistical and visual analysis capabilities are limited, 2) functional annotation sources might not satisfy the user's demands, and 3) users may simply not want to upload their sequencing data to an online service. Several standalone software tools are available for statistical analysis and visualization of annotated metagenomics data, e.g. MEGAN, SmashCommunity, STAMP, shotgunFunctionalizeR, VEGAN, QIIME and Mothur.
We identified the need for a user-friendly comparative analysis and data visualization tool in which annotated metagenomics data can be combined with sample metadata and analyzed at different hierarchy levels using a built-in or user-provided biological database. This tool, FANTOM (Functional ANnotation and Taxonomic analysis Of Metagenomes), is an easily installed, standalone software tool accessed through a graphical user interface. It analyzes the abundance of metagenomics features, which are easily integrated with NCBI taxonomy, KEGG, COG and the protein family databases PFAM and TIGRFAM, together with their hierarchy information. We believe that this tool will be highly useful for a broad community of scientists who wish to analyze metagenomics data.
The software installer, user manual and demonstration videos can be found and downloaded at the website http://www.sysbio.se/Fantom
FANTOM was implemented in Python, which allows it to run independently of the platform, and uses core scientific packages including numpy, scipy and matplotlib to implement statistical functions and various plotting options. wxPython provides the graphical user interface components, and the storm package is used for object-relational mapping of data from the local SQLite database. The software was tested successfully on Windows, Linux and OS X, and installers are provided for each platform.
FANTOM requires two input files: a metagenomics abundance file, which can be derived from either taxonomic or functional annotation of metagenomics data, and a file containing the samples' metadata (see user manual and demonstration videos). Web services such as CAMERA, IMG/M and MG-RAST allow users to easily obtain abundance data from their metagenomes. Metadata can be either numerical or categorical; the software automatically recognizes the format and displays options for selecting and filtering samples. Functional hierarchy information was downloaded from the KEGG Orthology, COG, PFAM and TIGRFAM databases and taxonomic lineage information from the NCBI taxonomy database; together these constitute the standard feature databases in the software package. Moreover, FANTOM allows the user to create and use a custom-made hierarchical database, which can be imported as a tabular input file to analyze the abundances at the corresponding database levels.
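The two-file input described above can be illustrated with a minimal Python sketch. The tab-separated layout, the column names and the helper functions (`read_table`, `classify_metadata`, `select_samples`) are assumptions made for illustration only; they are not FANTOM's actual file format or API, which is documented in the user manual.

```python
import csv
import io


def read_table(handle):
    """Read a tab-separated table into {row_id: {column: value}} (values stay strings)."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]  # first column is assumed to hold the row identifier
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_metadata(metadata):
    """Label each metadata column as 'numerical' or 'categorical'."""
    columns = next(iter(metadata.values())).keys()
    kinds = {}
    for column in columns:
        values = [fields[column] for fields in metadata.values()]
        all_numeric = all(is_number(v) for v in values)
        kinds[column] = "numerical" if all_numeric else "categorical"
    return kinds


def select_samples(metadata, column, wanted):
    """Return the sample identifiers whose metadata column equals the wanted value."""
    return [s for s, fields in metadata.items() if fields[column] == wanted]


# Hypothetical example inputs (features as rows, samples as columns).
abundance_text = "feature\tS1\tS2\tS3\nK00001\t10\t0\t4\nK00002\t3\t7\t1\n"
metadata_text = "sample\tBMI\tstatus\nS1\t22.5\thealthy\nS2\t31.0\tobese\nS3\t29.4\tobese\n"

abundance = read_table(io.StringIO(abundance_text))
counts = {f: {s: float(v) for s, v in row.items()} for f, row in abundance.items()}
metadata = read_table(io.StringIO(metadata_text))

kinds = classify_metadata(metadata)
obese = select_samples(metadata, "status", "obese")
print(kinds)
print(obese)
```

Treating a column as numerical only when every value parses as a number is one simple rule for automatic type recognition; the source does not state which rule FANTOM applies.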
In FANTOM, the abundance can be specified at different levels of a hierarchical database, which are called nodes (e.g. pathways or genera). The abundance of a higher node in the hierarchy is calculated by summing the abundances of all member nodes further down in the hierarchy (e.g. orthologs or species). The abundance of a node that is a member of more than one higher-level node is split equally between those higher nodes.
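The summation rule, including the equal split for nodes with several parents, can be sketched as follows. This is an illustrative reading of the rule as stated in the text, not FANTOM's source code; the function name `aggregate` and the KO and pathway identifiers are hypothetical.

```python
from collections import defaultdict


def aggregate(child_abundance, parents_of):
    """Sum child abundances into their parent nodes.

    A child that belongs to k parent nodes contributes 1/k of its abundance
    to each of them, so the total abundance is preserved. Children without
    any parent in the mapping are ignored.
    """
    parent_abundance = defaultdict(float)
    for child, value in child_abundance.items():
        parents = parents_of.get(child, [])
        for parent in parents:
            parent_abundance[parent] += value / len(parents)
    return dict(parent_abundance)


# Hypothetical ortholog abundances and ortholog-to-pathway membership.
ko_abundance = {"K00001": 10.0, "K00002": 6.0, "K00003": 4.0}
ko_to_pathway = {
    "K00001": ["pathway_A"],
    "K00002": ["pathway_A", "pathway_B"],  # shared node: split 3.0 / 3.0
    "K00003": ["pathway_B"],
}

pathway_abundance = aggregate(ko_abundance, ko_to_pathway)
print(pathway_abundance)  # pathway_A: 10 + 3, pathway_B: 3 + 4
```

Applying the same function again with a pathway-to-category mapping would propagate the abundances one level further up the hierarchy.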
The metadata file can include both categorical and numerical properties of each sample, which can then be used in FANTOM to filter and select sample groups of interest for comparative analysis. Numerical variables can further be used for correlation analysis with the annotated features. Taxonomic or functional feature abundances can be displayed and processed either as absolute counts or as normalized relative values. After selecting relevant subsets of metagenomics data, principal component analysis can be applied to reduce the dimensionality. Furthermore, hierarchical clustering, another multivariate analysis method, is implemented to evaluate high-dimensional metagenomics data by drawing dendrograms for features and samples as well as a heatmap with two-dimensional clustering that reflects the abundance values.
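Two of the steps above, relative normalization and correlation with a numerical metadata variable, can be sketched in plain Python. The sketch assumes that "normalized relative values" means dividing each count by the sample total and uses Pearson's coefficient for illustration; the source specifies neither the normalization nor the correlation measure FANTOM applies. The feature names and BMI values are hypothetical.

```python
import math


def to_relative(counts):
    """Convert {feature: {sample: count}} into per-sample relative abundances."""
    samples = list(next(iter(counts.values())).keys())
    totals = {s: sum(row[s] for row in counts.values()) for s in samples}
    return {
        feature: {s: (row[s] / totals[s] if totals[s] else 0.0) for s in samples}
        for feature, row in counts.items()
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return cov / math.sqrt(var_x * var_y)


# Hypothetical absolute counts and a numerical metadata variable.
counts = {
    "genus_A": {"S1": 10, "S2": 40, "S3": 90},
    "genus_B": {"S1": 90, "S2": 60, "S3": 10},
}
bmi = {"S1": 21.0, "S2": 27.0, "S3": 33.0}

relative = to_relative(counts)
samples = sorted(bmi)
r = pearson([relative["genus_A"][s] for s in samples], [bmi[s] for s in samples])
print(relative["genus_A"], round(r, 3))
```

The resulting relative abundance matrix is also the kind of input on which principal component analysis and hierarchical clustering would then operate.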
Results and discussion
The software was evaluated using metagenomics data from the gut microbiome of 124 subjects in the MetaHIT project. Sequences were quality trimmed (SolexaQA -p 0.05) and sequences shorter than 35 bp were filtered out. High quality reads were aligned to a reference catalogue of 440 genomes to obtain taxonomic abundance. Moreover, the reads were aligned to the MetaHIT gene catalogue of 3.3 million genes to obtain gene abundance. The genes were annotated against the KEGG and COG databases, and this information was used to transform gene abundance into KEGG KO and COG abundances. These data, together with the metadata, are bundled with the software as example files.
The MetaHIT study focused on two human diseases, obesity and inflammatory bowel disease (Crohn’s disease and ulcerative colitis), which we use here to illustrate the capabilities of FANTOM.
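The trimming and length filtering step can be sketched as follows. This is not SolexaQA's implementation: it assumes that a probability cutoff of 0.05 means keeping the longest contiguous stretch of bases whose Phred-derived error probability, P = 10^(-Q/10), stays below 0.05, and that reads shorter than 35 bp after trimming are discarded. The function names and example reads are hypothetical.

```python
def error_probability(phred):
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def longest_good_segment(qualities, max_error=0.05):
    """Return (start, end) of the longest run of bases with error probability below max_error."""
    best = (0, 0)
    start = None
    # A trailing None acts as a sentinel that closes a run reaching the read end.
    for i, q in enumerate(list(qualities) + [None]):
        good = q is not None and error_probability(q) < max_error
        if good and start is None:
            start = i
        elif not good and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def trim_and_filter(reads, max_error=0.05, min_length=35):
    """Trim each (sequence, qualities) read and drop reads shorter than min_length."""
    kept = []
    for sequence, qualities in reads:
        start, end = longest_good_segment(qualities, max_error)
        if end - start >= min_length:
            kept.append(sequence[start:end])
    return kept


# Hypothetical reads: the first has a low-quality tail, the second a bad base mid-read.
reads = [
    ("A" * 40 + "C" * 5, [30] * 40 + [5] * 5),
    ("G" * 41, [30] * 20 + [2] + [30] * 20),
]
kept = trim_and_filter(reads)
print([len(read) for read in kept])
```

Under these assumptions the first read is trimmed to its 40 high-quality bases and retained, while the second is discarded because its longest high-quality stretch is only 20 bases.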
We provide an open source, standalone, user-friendly software tool, FANTOM, for data analysis and data mining of read counts from whole shotgun metagenomics or amplicon sequencing studies. FANTOM allows the user to integrate sample metadata, taxonomy and gene functional profiling in the analysis, and is supplied with access to biological databases as well as the possibility to upload custom-made databases.
Availability and requirements
Project name: FANTOM: Functional and taxonomic analysis of metagenomes
Project home page: http://www.sysbio.se/Fantom
Operating system(s): Windows, Linux, Mac OS X
Programming language: Python
Other requirements: -
License: GNU-GPL version 3 software license
Any restrictions to use by non-academics: No
We would like to thank the Chalmers Foundation, the Knut and Alice Wallenberg Foundation and the Bioinformatics Infrastructure for Life Sciences (BILS) for financial support. The open access charge is funded by Chalmers Library.
- The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. Washington (DC); 2007. http://www.ncbi.nlm.nih.gov/books/NBK54006
- Cox MP, Peterson DA, Biggs PJ: SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinforma 2010, 11:485. 10.1186/1471-2105-11-485
- Schmieder R, Edwards R: Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011, 27(6):863-864. 10.1093/bioinformatics/btr026
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.
- Eddy SR: Accelerated Profile HMM Searches. PLoS Comput Biol 2011, 7(10):e1002195. 10.1371/journal.pcbi.1002195
- Li H, Homer N: A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 2010, 11(5):473-483. 10.1093/bib/bbq015
- Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M: CAMERA: a community resource for metagenomics. PLoS Biol 2007, 5(3):e75. 10.1371/journal.pbio.0050075
- Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D, Chen IM, Grechkin Y, Dubchak I, Anderson I, et al.: IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res 2008, 36(Database issue):D534-D538.
- Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, et al.: The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinforma 2008, 9:386. 10.1186/1471-2105-9-386
- Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, et al.: Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 2011, 29(5):415-420. 10.1038/nbt.1823
- Yilmaz P, Gilbert JA, Knight R, Amaral-Zettler L, Karsch-Mizrachi I, Cochrane G, Nakamura Y, Sansone SA, Glockner FO, Field D: The genomic standards consortium: bringing standards to life for microbial ecology. ISME J 2011, 5(10):1565-1567. 10.1038/ismej.2011.39
- Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data. Genome Res 2007, 17(3):377-386. 10.1101/gr.5969107
- Arumugam M, Harrington ED, Foerstner KU, Raes J, Bork P: SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 2010, 26(23):2977-2978. 10.1093/bioinformatics/btq536
- Parks DH, Beiko RG: Identifying biologically relevant differences between metagenomic communities. Bioinformatics 2010, 26(6):715-721. 10.1093/bioinformatics/btq041
- Kembel SW, Cowan PD, Helmus MR, Cornwell WK, Morlon H, Ackerly DD, Blomberg SP, Webb CO: Picante: R tools for integrating phylogenies and ecology. Bioinformatics 2010, 26(11):1463-1464. 10.1093/bioinformatics/btq166
- Dixon P: VEGAN, a package of R functions for community ecology. J Veg Sci 2003, 14(6):927-930. 10.1111/j.1654-1103.2003.tb02228.x
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, et al.: QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010, 7(5):335-336. 10.1038/nmeth.f.303
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009, 75(23):7537-7541. 10.1128/AEM.01541-09
- Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28(1):27-30. 10.1093/nar/28.1.27
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28(1):33-36. 10.1093/nar/28.1.33
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al.: The Pfam protein families database. Nucleic Acids Res 2010, 38:D211-D222. 10.1093/nar/gkp985
- Haft DH, Selengut JD, White O: The TIGRFAMs database of protein families. Nucleic Acids Res 2003, 31(1):371-373. 10.1093/nar/gkg128
- Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, et al.: A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464(7285):59-65. 10.1038/nature08821
- Ley RE, Backhed F, Turnbaugh P, Lozupone CA, Knight RD, Gordon JI: Obesity alters gut microbial ecology. Proc Natl Acad Sci USA 2005, 102(31):11070-11075. 10.1073/pnas.0504978102
- Schwiertz A, Taras D, Schafer K, Beijer S, Bos NA, Donus C, Hardt PD: Microbiota and SCFA in lean and overweight healthy subjects. Obesity (Silver Spring) 2010, 18(1):190-195. 10.1038/oby.2009.167
- Duncan SH, Lobley GE, Holtrop G, Ince J, Johnstone AM, Louis P, Flint HJ: Human colonic microbiota associated with diet, obesity and weight loss. Int J Obes (Lond) 2008, 32(11):1720-1724. 10.1038/ijo.2008.155
- Sokol H, Pigneur B, Watterlot L, Lakhdari O, Bermudez-Humaran LG, Gratadoux JJ, Blugeon S, Bridonneau C, Furet JP, Corthier G, et al.: Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc Natl Acad Sci USA 2008, 105(43):16731-16736. 10.1073/pnas.0804812105
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.