- Methodology article
- Open Access
SAMSA: a comprehensive metatranscriptome analysis pipeline
© The Author(s). 2016
- Received: 9 April 2016
- Accepted: 21 September 2016
- Published: 29 September 2016
Although metatranscriptomics—the study of diverse microbial population activity based on RNA-seq data—is rapidly growing in popularity, there are limited options for biologists to analyze this type of data. Current approaches for processing metatranscriptomes rely on restricted databases and a dedicated computing cluster, or metagenome-based approaches that have not been fully evaluated for processing metatranscriptomic datasets. We created a new bioinformatics pipeline, designed specifically for metatranscriptome dataset analysis, which runs in conjunction with Metagenome-RAST (MG-RAST) servers. Designed for use by researchers with relatively little bioinformatics experience, SAMSA offers a breakdown of metatranscriptome transcription activity levels by organism or transcript function, and is fully open source. We used this new tool to evaluate best practices for sequencing stool metatranscriptomes.
Working with the MG-RAST annotation server, we constructed the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) software package, a complete pipeline for the analysis of gut microbiome data. SAMSA can summarize and evaluate raw annotation results, identifying abundant species and significant functional differences between metatranscriptomes.
Using pilot data and simulated subsets, we determined experimental requirements for fecal gut metatranscriptomes. Sequences need to be either long reads (longer than 100 bp) or joined paired-end reads. Each sample needs 40–50 million raw sequences, which can be expected to yield the 5–10 million annotated reads necessary for accurate abundance measures. We also demonstrated that ribosomal RNA depletion does not equally deplete ribosomes from all species within a sample, and remaining rRNA sequences should be discarded. Using publicly available metatranscriptome data in which rRNA was not depleted, we were able to demonstrate that overall organism transcriptional activity can be measured using mRNA counts. We were also able to detect significant differences between control and experimental groups in both organism transcriptional activity and specific cellular functions.
By making this new pipeline publicly available, we have created a powerful new tool for metatranscriptomics research, offering a new method for greater insight into the activity of diverse microbial communities. We further recommend that stool metatranscriptomes be ribodepleted and sequenced in a 100 bp paired end format with a minimum of 40 million reads per sample.
- Big data
- Software package
- Best practices
Metatranscriptomics, the large-scale sequencing of mRNAs from complex microbial communities, allows for the observation of gene expression patterns [1–4]. Metatranscriptomics is a relatively new field, with the first mention around 2008 , but it is growing quickly. Using high-throughput techniques developed in conjunction with “big data” computer-automated analysis approaches, metatranscriptomics offers a novel and complete method for looking at not just the organisms present, but also the transcriptional activity occurring within a complex and diverse population at any chosen specific point in time [2, 3]. Metatranscriptomics is especially useful for analyzing complex populations in flux, such as the gut microbiome, which can be impacted and altered by a large number of transitory factors [6–8].
However, because of the complexity of metatranscriptomic data, extensive analysis is needed to convert raw data into simplified and easily understood results. The raw data comprise tens of millions of individual reads per sample . Simplifying and condensing this very large data set requires multiple steps in a software pipeline and generally requires a dedicated bioinformatician to perform the analysis. Current pipelines or in-house methods often require significant computing power or use several different tools, many of which were not originally intended for metatranscriptome analysis [2, 3, 9]. For researchers who want to utilize metatranscriptomic analysis but may not have the necessary bioinformatics experience, there is a strong need for a complete pipeline, designed to analyze this data from beginning to end without requiring extensive technical expertise.
In addition, as metatranscriptomics is a new area of genomic exploration, there are few established guidelines or standard protocols. Standardized protocols exist for mRNA collection  and for stool-specific extraction [10, 11], but there is also a pressing need for an investigation of necessary sequencing parameters and depth, minimum read quality standards, and reliable reference databases against which metatranscriptome reads can be aligned. Any researcher planing to sequence metatranscriptomic data must know the target sequencing depth and minimum recommended read length, whether to filter out ribosomal sequences, and the relative confidence of their predicted results based on these parameters.
In this paper, to develop a standard metatranscriptomic analysis pipeline that can be applied to a wide range of microbiome samples, we extracted and sequenced RNA from human fecal material using experimentally validated protocols . By testing various parameters and modifications, we determined the best practices at each step in obtaining and analyzing the metatranscriptomic data. In addition, we incorporated a publicly available metatranscriptomic data set examining the fecal gut microbiome in tyrosine kinase two knockout (Tyk2−/−) mice compared to wild-type controls over time during the outbreak of dextran sodium sulfate (DSS) induced colitis . Examining this data, we provide new insights into differences in functional expression between the two different microbiomes over the course of the colitis development. By applying the approaches detailed here, researchers wishing to include metatranscriptomic analysis in their experiments can obtain accurate results without an investment in the development of bioinformatics tools.
Creation of SAMSA, a new pipeline for metatranscriptome analysis
During preprocessing, raw sequences are trimmed to remove reads containing low-quality bases and eliminate adaptor contamination using Trimmomatic, a flexible read trimming tool for Illumina NGS data . Next, each pair of paired-end reads are aligned to each other using FLASh, a short read aligning program . In our pilot samples, approximately 32–54 % of the raw reads in each sample were successfully aligned, with an average aligned read length of 178 base pairs.
Next, these sequences are submitted for annotation to Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) . MG-RAST includes several steps, including an initial sequence quality control check through SolexaQA, gene calling through FragGeneScan, clustering of amino acid sequences at 90 % identity through the uclust implementation of QIIME, and then using sBLAT on each protein sequence cluster to locate the best match reference. For each sequence cluster, MG-RAST selects the best match through the sBLAT similarity search. If multiple reference database matches tie for best matching score, they are both included in the final results. If the read does not achieve a match score above the minimum e-value cutoff, it is discarded. Each match is linked to MG-RAST’s internal identifier system and assigned an M5nr ID, correlating with linked matches in all subsystems databases. The annotated output can be provided on a per-match basis, using the M5nr ID to link each read to its best match from the subsystems database of choice. To create sorted abundance measures of the metatranscriptome using the SAMSA pipeline, all annotations with an acceptable best-match to the NCBI Reference Database (RefSeq)  are downloaded from MG-RAST. Annotations are downloaded for the best match to both organism and individual transcript. Annotations are downloaded directly in tab-delimited form using MG-RAST’s RESTful API interface  and a custom Python program to assemble the API call command. In addition, the annotated output is also downloaded from the SEED Subsystems reference database  to provide ontology annotations.
A custom Python program parses each annotated output, storing each unique annotation match in a dictionary and maintaining counts of the number of occurrences of each unique annotation. After the annotation file is processed, unique annotations are sorted by abundance and exported as output.
Annotation and abundance information from the Aggregation phase are inputs to custom R scripts, which generate barplots and dendograms (see Fig. 1). To compare experimental versus control metatranscriptomes and determine significantly differentially expressed transcripts, R’s DESeq2 package is used to test the output files for differential expression . DESeq2’s testing method adjusts for multiple hypothesis testing, performing pairwise comparisons in the scope of a larger overall data set. The pipeline’s R scripts export sorted lists of log2fold change in both organism and functional category activity, sorted by adjusted p-value. Results are also stored in the local R environment and can be used for graphical output.
This complete pipeline, coded in Python and R, is fully open source, is set up for streamlined use from the command line, does not require a local server for intense computation, and is freely available for download through GitHub.
Measurement of host transcripts in fecal metatranscriptomes
Given that metatranscriptomes theoretically contain the transcriptional profile of all organisms present, we asked whether the transcriptomes of host cells could still be elucidated from fecal metatranscriptomes for which the primary goal was bacterial metatranscriptome analysis. We tried two methods of RNA extraction followed by either ribosomal depletion or poly(A) enrichment (see Methods). Neither method yielded many reads aligning to the human genome. We found 417 and 340 ribodepleted reads (0.0019 and 0.0085 % of all ribodepleted reads) matching the human genome, and 2273 and 2100 poly(A) enriched reads (0.116 and 0.097 % of all poly(A) enriched reads) matching the human genome. Thus, we conclude that host transcriptomes cannot be reliably extracted from stool metatranscriptomes that have been isolated and sequenced using standard methods.
Use of ribosomal depletion methods in metatranscriptome preparation
Comparison of rRNA abundance in control vs. ribodepleted samples
Optimal sequencing depth for gut microbiome metatranscriptomics
An important consideration when creating a metatranscriptome is sequencing depth: how many reads must be obtained in order to provide proper representation of all reads within a sample? As the proportionality of different reads within the sample is of crucial importance, sequencing depth must be large enough to ensure a balanced representation of each read’s relative abundance within the total sample. Too little depth can result in inaccurate abundance and activity measurements.
Paired or single end sequencing for metatranscriptome analysis
Another important consideration when sequencing a metatranscriptome is whether paired-end sequencing is necessary for proper read annotation. Paired-end sequencing is more expensive on a per-read basis but the average output read length is significantly increased, allowing for a more accurate best-match annotation in the pipeline.
Read count comparison between single and paired end files
Paired assembled reads
The paired-end sequencing data matched to a higher number of unique transcripts than the single read data (270,384 unique transcripts versus 215,599 transcripts), suggesting that increased read length leads to greater specificity in transcript annotation. Correlations between paired-end and single-read processed identical metatranscriptomes averaged only 0.72 between both pilot samples. Given that these paired-end and single read metatranscriptomes were originally identical before the trimming of excess bases beyond 100 bases per read, these results indicate significant mislabeling or lost information in the single read approach when compared to the more accurate paired-end sample.
Taken with the organism matching results, these numbers suggest that the increased read length generated either through paired-end sequencing or through 150-bp single-end sequencing is necessary to ensure accurate metatranscriptome annotation.
Determining organisms present within fecal metatranscriptomes
Applying this pipeline to a public metatranscriptomic data set: organism and functional analysis
In the original paper , which focused primarily on the response of mouse gut epithelial cells to dextran sodium sulfate (DSS) induced colitis, the authors used primarily only the rRNA sequences from 16S amplicon sequencing of the samples to examine shifts in mouse gut microbiomes. They did report an increase in mRNA matching Enterobacteriaceae species in Tyrosine Kinase 2 (Tyk2) deficient mice, rising on day 3 of treatment, but provided no other mRNA analysis of these samples.
Examining the results of our pipeline, we observed a 1.558-fold increase in Enterobacteriaceae mRNA expression in day 3 samples (p = 0.0055). Indeed, looking across all bacterial species with greater than 100 average expressed transcripts, we found significant increases in transcriptional activity in Escherichia and Providencia species, both of which are associated with ulcerative colitis . In addition, we noted a significant decrease in Butyrivibrio species (p = 0.03), when comparing between wild-type and Tyk2-deficient mice over the course of DSS-induced colitis. Butyrivibrio is a probiotic-associated strain that may normally offer protection against colitis . We also observed several significant changes in functional activity between the two groups, identifying more than 300 protein-coding transcripts differentially expressed between wild-type and Tyk2-deficient mice (adjusted p < 0.05, Additional file 1).
Although still an emerging field, metatranscriptomics offers a powerful approach for analyzing complex functional activity within a heterogeneous microbial population, such as that found within the gut microbiome. Previous approaches for the study of the gut microbiome provide an incomplete picture, limited to only identifying variations in organism population levels and not including data on the gene expression of different species within the environment.
Our pipeline fulfills the need for four main steps in metatranscriptomic data analysis: preprocessing, annotation, aggregation, and analysis. By leveraging MG-RAST’s annotation server, we can provide comprehensive annotation and analysis of a metatranscriptome without requiring a dedicated private server. SAMSA generates outputs at each step, creating a streamlined pipeline where each stage in data analysis can be independently examined.
Using the SAMSA pipeline, we have established a set of “best practices” for metatranscriptome sequencing. Our work demonstrates that the increased specificity provided by a paired-end sequencing approach significantly increases specificity of read annotations. In addition, to provide a complete and accurate measure of read abundance within a metatranscriptome, approximately 40 million raw reads must be sequenced, providing an estimated 10 million mRNA annotations.
Due to the vast majority of extracted total RNA originating from ribosomes, ribodepletion is strongly recommended for all metatranscriptome sample processing. We demonstrate that although ribodepletion is not successful at removing all rRNA, and skews the proportions of remaining rRNA within a sample, it greatly increases the number of mRNA annotations obtained per metatranscriptome. Although we only tested one commercially available ribodepletion kit, we believe that all ribodepletion methods based on hybridization with complementary rRNA oligonucleotide probes will carry some intrinsic bias, skewing the proportions of remaining rRNA within the sample. Therefore, we recommend discarding all remaining ribosomal reads before performing further analysis on these samples.
Although ribodepletion results in a loss of information regarding organism abundance within a metatranscriptomic sample, we show that total mRNA can be used as an alternate method of evaluating overall organism presence within a sample. Compared to mRNA results, we found a higher number of false annotations among the rRNA reads, particularly for eukaryotic organisms which were not present in the original fecal samples, including plant species such as Sisymbrium (cabbage), Arabidopsis, and Populus (cottonwood), Bos taurus, human, and Coptotermes (termite). This increased mismatch rate may be due to the fact that MG-RAST is not tailored to work with eukaryotic organisms. Another reason that mRNA and rRNA results were not perfectly correlated is that there were insufficient mRNA reads in the data set used for direct comparison. Our data (Fig. 3) suggested that a minimum of 10 million annotated reads are required for accurate annotations, while the non-ribosomal depleted data set averaged only 1–2 million annotated mRNA reads per sample. While total mRNA may not be a perfect proxy of organism population size, the extent of mismatches in the rRNA data suggest that mRNA is at least better than total rRNA for judging overall activity-based abundance.
Analyzing a public data set, we confirmed the rRNA patterns previously described by the producers of that data , and also identified significant changes in other, more specific genera of microbial species, including increases in Escherichia and Providencia and decreases in Butyrivibrio. Both the increase in Escherichia and Providencia and the decrease in Butyrivibrio species were only notable at the genus level and would not be identifiable in 16S data, demonstrating the value of mRNA-based metatranscriptomics analysis. In addition, we successfully identified a variety of functional expressed activities of gut microbes that significantly differed between wild-type and Tyk2−/− strains of mice strains.
As the field of metatranscriptomics continues to grow and expand, we expect that metatranscriptome analysis will become increasingly important to understand the functional responses of gut microbiome communities. By using mRNA transcripts to identify both the overall transcriptional activity levels of organisms within a sample and changes in specific gene or functional expression, we can gain a better understanding of the capabilities and actions within an active gut microbiome at any chosen point in time. Because all parts of SAMSA are open source and publicly accessible, this tool can be used even by researchers with little previous experience in working with metatranscriptomes. We hope to encourage the more widespread use of metatranscriptomics as the next “big data” tool for determining activity within complex microbiome populations.
Although the SAMSA pipeline successfully annotated and summarized gut microbiome metatranscriptome data without the need for large server resources, several limitations still exist. MG-RAST’s annotation servers require waiting in a queue, and processing may be slowed by days to weeks. Additionally, although SAMSA can reveal shifts in functional expression patterns, shifts in microbial population sizes cannot be measured as accurately as using shotgun metagenomics, and future approaches may focus on incorporating both techniques to ensure completeness.
Future versions of this pipeline will focus on a more rapid in-house annotation step, using custom built reference databases and allowing for greater speed and additional analysis options, drawing from multiple reference databases as data progresses through the pipeline.
We have created a new pipeline for metatranscriptome analysis, functioning in conjunction with the MG-RAST annotation pipeline. This pipeline is capable of determining functional expression activity within a sample at the transcript level, and provides measures of total transcription, differentiated either by organism or by functional category of transcript. This pipeline will enable more rapid adoption of metatranscriptomics methods. Finally, we recommend that stool metatranscriptomes be ribodepleted and sequenced in a 100 bp paired end format with a minimum of 40 million reads per sample.
Stool samples were collected from two normal healthy adults. Stool was added to 50 mL tubes containing 25 mL RNAlater until the total volume reached 30 mL. Samples were then sealed and shaked vigourously until the contents appeared to be uniform in consistency. The samples were then temporarily stored at −20 C before being moved to −80 C.
Two different RNA extraction protocols were applied to the two human subject stool samples, adapted from Giannoukos et al.. Samples were initially frozen at −80° C in RNAlater. Samples were partially thawed and mixed with bacterial lysis buffer, incubated, and then homogenized through both bead-beating and QIAshredder treatments. Extraction was performed using the Qiagen RNeasy isolation kit, with additional rigorous Turbo DNAse treatment to remove DNA contamination. Ribodepletion was performed on two of the samples (designated P1) using the RiboZero Magnetic GOLD kit. Poly(A) selection was performed on two samples (designated P4) using the Illumina TruSeq kit protocol.
For each sample, RNA-Seq libraries were prepared from 20uL of >2000 ng/uL at the DNA Sequencing Core of the UC Davis Genome Center. RNA extracted using the bacterial metatranscriptome protocol (P1) was first ribodepleted using the RiboZero Magnetic Gold Kit (Epidemiology), catalog number MRZE706. The Illumina TruSeq protocol, without poly(A) selection, was then used to prepare RNA-Seq libraries. For the TRIzol-extract RNA (P4), the Illumina TruSeq protocol with poly(A) selection was used to prepare RNA-Seq libaries. All four samples were run on a single lane of Illumina HiSeq 2000 with indexing to allocate ~40 % of the lane to each bacterial metatranscriptomes and ~10 % of the lane to each poly(A)-selected metatranscriptome. The four metatranscriptomes have been deposited in the NCBI SRA repository, in BioProject PRJNA313102, SRA study SRP071017.
Preprocessing and annotation of metatranscriptome reads
Raw sequences were obtained for two pilot samples, labeled as 4012 and 4015. Cleaning of the raw sequences to remove reads containing low-quality bases and eliminate adaptor contamination was performed using Trimmomatic, a flexible read trimming tool for Illumina NGS data. At default parameters, Trimmomatic removed low-quality reads to meet the minimum threshold of acceptability for MG-RAST submission.
Paired end raw sequence files were aligned using FLASh, a short read aligning program. Approximately 32–54 % of the raw reads in each sample were successfully aligned, with an average aligned read length of 178 base pairs.
The trimmed and aligned sequences were submitted for annotation to Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) . MG-RAST includes several steps, including an initial sequence quality control check through SolexaQA, gene calling through FragGeneScan, clustering of amino acid sequences at 90 % identity through the uclust implementation of QIIME, and then using sBLAT on each protein sequence cluster to locate the best match reference.
For each sequence cluster, MG-RAST selects the best match through the sBLAT similarity search. If multiple reference database matches tie for best matching score, they are both included in the final results. If the read does not achieve a match score above the minimum e-value cutoff, it is discarded. Each match is linked to MG-RAST’s internal identifier system and assigned an M5nr ID, correlating with linked matches in all subsystems databases. The annotated output can be provided on a per-match basis, using the M5nr ID to link each read to its best match from the subsystems database of choice.
Post-annotation processing and analysis
To create sorted abundance measures of the metatranscriptome, all annotations with an acceptable best-match to the NCBI Reference Database (RefSeq) were downloaded from MG-RAST. Annotations were downloaded for the best match to both organism and individual transcript. Annotations were downloaded directly in tab-delimited form using MG-RAST’s RESTful API interface and a custom Python program to assemble the API call command. In addition, the annotated output was also downloaded from the SEED Subsystems reference database to provide ontology annotations.
A custom Python program parsed through each annotated output, storing each unique annotation match in a dictionary and maintaining counts of the number of occurrences of each unique annotation. After the annotation file was processed, the unique annotations were sorted by abundance and exported as output.
Evaluating minimum viable metatranscriptome read counts
To determine minimum viable metatranscriptome size, the original samples, which were very deeply sequenced, were digitally broken down and reshuffled to create smaller, randomly generated subset metatranscriptomes. 100 smaller subset metatranscriptomes were generated for each subset size, with the subset sizes consisting of 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, and 90% of the final metatranscriptome size.
Each of these smaller subsets was subjected to identical analysis using the same programs and pipeline to determine variation in annotation abundance. By comparing relative variation in the abundance of different transcripts, the minimum necessary metatranscriptome size needed for stable abundance percentage estimates could be computed.
To determine whether paired-end sequencing was necessary for ensuring accuracy of the annotation process, the paired-end reads were digitally trimmed to 100 base pairs, creating a simulated single-read metatranscriptome containing the same number of identical sequences to the paired-end file. This digitally created single-read metatranscriptome was analyzed using the same pipeline as the paired-end original file, and the results were compared for to determine level of variation using statistical testing in R via the DESeq2 package.
Comparison of tyrosine kinase 2 knockout versus wild-type mice in colitis
The constructed pipeline was tested using publicly available RNA-seq data from Hainzl et al. . The original authors obtained Illumina-sequenced metatranscriptomes from 15 gnotobiotic mice, 9 with a tyrosine kinase 2 knockout genotype (Tyk2−/−) and 6 wild-type controls, at different stages of dextran sodium sulfate (DSS) induced colitis. The original authors used only the rRNA gene copies from the metatranscriptomes, demonstrating through PCA analysis that both the wild-type and Tyk2−/− mice showed similar shifts in organism population based on stage of DSS-induced inflammation.
Version control & access to the SAMSA pipeline
The version of SAMSA used in this paper was version 1.0.0. The version of MG-RAST used was version 3.6, current as of May 2016. All components and tools used in the SAMSA pipeline, as well as documentation files, are freely available from GitHub at http://github.com/transcript/SAMSA.
The authors would like to thank Joyce Lee and Justin Fontaine for their technical help with RNA extraction from stool samples. We also thank Dr. Jennifer Smilowitz and Prof. Daniela Barile of University of California, Davis for their donation of the human stool samples. We also thank Folker Meyer and the team at MG-RAST for providing assistance with correcting API issues and inconsistencies. We thank Hainzl 2015 et al. from the Institute of Animal Breeding & Genetics for making their data publically available through NCBI.
This research was partially supported by an industry/campus supported fellowship of STW under the Training Program in Biomolecular Technology (T32-GM008799) at the University of California, Davis, and also by the Peter J. Shields Endowed Chair in Dairy Food Science. The sequencing was carried out at the DNA Technologies and Expression Analysis Cores at the UC Davis Genome Center, supported by the NIH Shared Instrumentation Grant 1S10OD010786-01.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the NCBI Short Read Archive (SRA) repository, accession numbers SRP026343 and SRP026649 for public data and SRP071017 for pilot data, at https://www.ncbi.nlm.nih.gov/Traces/sra/.
DGL conceived of the study. DGL and STW designed experiments. STW developed the pipeline. STW and DGL conducted experiments. IK reviewed code. STW and DGL drafted the manuscript. STW, DGL, DAM, and IK interpreted data and contributed to the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Pilot metatranscriptome samples were derived from a study that is registered on clinicaltrials.gov (NCT01814540). The UC Davis Institutional Review Board approved all aspects of the study and written informed consent was obtained from all participants prior to study procedures.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Moran M. Metatranscriptomics: Eavesdropping on Complex Microbial Communities. Microbe. 2009;4(7):7.Google Scholar
- Leimena MM, et al. A comprehensive metatranscriptome analysis pipeline and its validation using human small intestine microbiota datasets. BMC Genomics. 2013;14(530):14.Google Scholar
- Embree M, et al. Single-cell genome and metatranscriptome sequencing reveal metabolic interactions of an alkane-degrading methanogenic community. ISME J. 2014;8(4):757–67.View ArticlePubMedGoogle Scholar
- Franzosa EA, et al. Relating the metatranscriptome and metagenome of the human gut. PNAS. 2014;10. doi: 10.1073/pnas.1319284111.
- McGrath KC, et al. Isolation and analysis of mRNA from environmental microbial communities. J Microbiol Methods. 2008;75(2):172–6.View ArticlePubMedGoogle Scholar
- Dimitrov DV. The human gutome: nutrigenomics of the host-microbiome interactions. OMICS. 2011;15(7–8):419–30.View ArticlePubMedGoogle Scholar
- Cho I, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet. 2012;13(4):260–70.PubMedPubMed CentralGoogle Scholar
- Round JL, Mazmanian SK. The gut microbiota shapes intestinal immune responses during health and disease. Nat Rev Immunol. 2009;9(5):313–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Gosalbes MJ, et al. Metatranscriptomic approach to analyze the functional human gut microbiota. PLoS One. 2011;6(3), e17447.View ArticlePubMedPubMed CentralGoogle Scholar
- Giannoukos G, et al. Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes. Genome Biol. 2012;13(23):13.Google Scholar
- Reck M, et al. Stool metatranscriptomics: A technical guideline for mRNA stabilisation and isolation. BMC Genomics. 2015;16:494.View ArticlePubMedPubMed CentralGoogle Scholar
- Hainzl E, et al. Intestinal Epithelial Cell Tyrosine Kinase 2 Transduces IL-22 Signals To Protect from Acute Colitis. J Immunol. 2015;195(10):5011–24.View ArticlePubMedPubMed CentralGoogle Scholar
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.View ArticlePubMedPubMed CentralGoogle Scholar
- Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–63.View ArticlePubMedPubMed CentralGoogle Scholar
- Meyer F, et al. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9:386.View ArticlePubMedPubMed CentralGoogle Scholar
- Tatusova T, et al. RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Res. 2014;42(Database issue):D553–9.View ArticlePubMedGoogle Scholar
- Wilke A, et al. A RESTful API for accessing microbial community data for MG-RAST. PLoS Comput Biol. 2015;11(1), e1004008.View ArticlePubMedPubMed CentralGoogle Scholar
- Overbeek R, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42(Database issue):D206–14.View ArticlePubMedGoogle Scholar
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.View ArticlePubMedPubMed CentralGoogle Scholar
- Costa V, et al. Massive-scale RNA-Seq analysis of non ribosomal transcriptome in human trisomy 21. PLoS One. 2011;6(4), e18493.View ArticlePubMedPubMed CentralGoogle Scholar
- Neut C, Guillemot F, Colombel JF. Nitrate-Reducing Bacteria in Diversion Colitis: A Clue to Inflammation? Dig Dis Sci. 1997;42(12):4.View ArticleGoogle Scholar
- Ohkawara S, et al. Effect of oral administration of Butyrivibrio fibrisolvens MDT-1 on experimental enterocolitis in mice. Clin Vaccine Immunol. 2006;13(11):1231–6.View ArticlePubMedPubMed CentralGoogle Scholar