Volume 14 Supplement 1
SymGRASS: a database of sugarcane orthologous genes involved in arbuscular mycorrhiza and root nodule symbiosis
© Belarmino et al.; licensee BioMed Central Ltd. 2013
Published: 14 January 2013
The rationale for gathering information on plants that acquire nitrogen through symbiotic interactions controlled by a common genetic program is that the application of synthetic nitrogen fertilizers is highly energy demanding, which works against sustainable biofuel production. We curated the publicly available sequence information for the biofuel plant sugarcane, analyzed the common SYM pathway known to control symbiosis in other plants, and provide the results, sequences and literature links as an online database.
Sugarcane sequences and associated information were downloaded from the nucEST database, cleaned and trimmed with seqclean, assembled with TGICL followed by the translation mapping method, and annotated. The annotation is based on BLAST searches against a locally formatted plant Uniprot90 generated with CD-HIT for functional assignment, rpsBLAST searches against the CDD database for conserved domain analysis, and a BLAST search against sorghum for Gene Ontology (GO) assignment. Gene expression was normalized according to the Unigene standard and is presented as ESTs/100 kb. Protein sequences known from the SYM pathway were used as queries to search the SymGRASS sequence database. Additionally, antimicrobial peptides described in the PhytAMP database served as queries to retrieve these defense genes and to compare their expression profiles across all libraries with those in the libraries obtained under symbiotic interactions.
We describe SymGRASS, a database of sugarcane orthologous genes involved in arbuscular mycorrhiza (AM) and root nodule (RN) symbiosis. The database aggregates knowledge about sequences, tissues, organs, developmental stages and experimental conditions, and provides annotation and gene expression levels for sugarcane transcripts and SYM orthologous genes in sugarcane through a web interface. Several candidate genes were found for all nodes in the pathway and, interestingly, a set of symbiosis-specific genes was found.
The knowledge integrated in SymGRASS may guide studies on molecular, cellular and physiological mechanisms by which sugarcane controls the establishment and efficiency of endophytic associations. We believe that the candidate sequences for the SYM pathway together with the pool of exclusively expressed tentative consensus (TC) sequences are crucial for the design of molecular studies to unravel the mechanisms controlling the establishment of symbioses in sugarcane, ultimately serving as a basis for the improvement of grass crops.
Among the most ancient symbiotic associations of plants is the intracellular arbuscular mycorrhiza (AM) with fungi of the phylum Glomeromycota, which emerged approximately 475 million years (Myr) ago. Approximately 400 Myr later, the nitrogen-fixing root nodule symbiosis (RNS) with rhizobacteria (rhizobia) evolved in association with a subset of the dicotyledonous angiosperms (mostly legumes). The understanding of the mechanisms involved in the evolution of AM and RNS has advanced dramatically in recent years through genetic analysis of the plant host. AM symbiosis requires several of the genes identified in RNS, which are therefore referred to as common symbiosis (SYM) genes. It is evident that the common SYM pathway evolved in the context of AM and became secondarily involved in RNS. Besides being capable of AM symbiosis, sugarcane (Saccharum officinarum) associates with rhizospheric, associative and endophytic nitrogen-fixing bacteria that possess unique features yet to be characterized. As occurred with RNS, it is possible that this system of beneficial plant-microbe association also evolved in the context of AM symbiosis.
Sugarcane is an important crop around the world for the production of sucrose. Sugarcane derivatives and byproducts have attracted interest, with a focus on ethanol production, which may replace up to 10% of the world's gasoline consumption in this decade, leading to a reduction of 50 tons of carbon emission per year. Sugarcane is considered the most suitable tropical crop for biofuel production, but surprisingly high N fertilizer applications in the main producer countries raise doubts about the sustainability of production and are at odds with a carbon-based crop. Interestingly, the amounts of N fertilizer applied in Brazil's sugarcane plantations are very low, although neither the yields nor the soil N reserves appear to diminish. This is believed to be a consequence of the association of sugarcane with soil-borne fungi and nitrogen-fixing bacteria, both of which play a critical role in nutritional and plant growth processes [5, 6]. Molecular studies on these associations in sugarcane lag far behind studies in other plants. Thus, the SymGRASS database is considered a first step towards linking the information already available for other plants to sugarcane, and it also represents a platform for the development and design of molecular experiments to study arbuscular mycorrhiza and nitrogen-fixing symbiosis in sugarcane.
The SymGRASS relational database was implemented in MySQL with Perl CGI scripting. The sugarcane sequences used in this work were downloaded from NCBI's dbEST. Prior to assembly, the sequences were cleaned using Seqclean: vector contamination and poly A/T tails were trimmed, and sequences containing more than 30% of Ns were removed. The data were assembled using TGICL v2.1. A super assembly was generated from the first one using a scaffolding step with the translation mapping method and the proteome of Sorghum bicolor as reference, in order to join contigs belonging to the same coding region. Functional annotation was accomplished through a BLASTx run against UniprotKB, downloaded from the Uniprot protein database. Gene Ontology terms were assigned to the TCs based on their similarity to UniprotKB proteins identified through BLASTx and mapped to the Gene Ontology association (GOA) files. Conserved domain annotation was performed with rpsBLAST run against a locally formatted CDD database.
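The N-content criterion used during cleaning can be illustrated with a short Python sketch. This is not the Seqclean implementation, only a minimal illustration of the stated rule (discard sequences with more than 30% of Ns); the function names and the toy reads are hypothetical.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of ambiguous 'N' bases in a nucleotide sequence."""
    if not seq:
        return 1.0  # assumption: an empty read is treated as uninformative
    return seq.upper().count("N") / len(seq)


def filter_reads(reads: dict[str, str], max_n_fraction: float = 0.30) -> dict[str, str]:
    """Keep only reads whose N content does not exceed max_n_fraction."""
    return {name: seq for name, seq in reads.items()
            if n_fraction(seq) <= max_n_fraction}


# Toy reads (hypothetical, for illustration only)
reads = {
    "est_1": "ACGTACGTAC",  # no N, kept
    "est_2": "ACGNNNNTAC",  # 4 of 10 bases are N, removed
    "est_3": "ACGTNNNTAC",  # 3 of 10 bases are N, kept (not more than 30%)
}
kept = filter_reads(reads)
print(sorted(kept))
```

The threshold is exposed as a parameter so that the same check can be reused with a different tolerance.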
EST information files for the S. officinarum complex were downloaded from the NCBI dbEST. A Perl script was designed to collect the experimental information about tissue, organ and developmental stage sampling contained therein. After assembly, the number of reads building each contig and their sample of origin were retrieved with a second Perl script. The expression abundance was normalized following the formula r.10e5/n, where r is the number of reads and n is the total number of reads in the particular library, since this is the presentation standard used by Unigene. After normalization, the TCs were ranked according to their percentage of ESTs in the libraries. The statistical support for the differential expression of a TC is given as the R-value proposed by Stekel and colleagues.
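The normalization and ranking steps can be sketched as follows. This is an illustrative Python sketch, not the Perl code used for SymGRASS. The scaling constant is kept as a parameter because the notation "10e5" can be read either as 10^5 or as 10 x 10^5; the default of 1e5 is an assumption, and the TC counts and library size are made-up toy values.

```python
def normalize_count(reads_in_tc: int, library_size: int, scale: float = 1e5) -> float:
    """Scale the read count of a TC by library size: r * scale / n."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return reads_in_tc * scale / library_size


def rank_tcs(counts: dict[str, int], library_size: int) -> list[tuple[str, float]]:
    """Rank TCs by their percentage of the ESTs in one library, highest first."""
    percentages = {tc: 100.0 * r / library_size for tc, r in counts.items()}
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


# Toy library with 20,000 reads (hypothetical numbers)
counts = {"TC00001": 4, "TC00002": 10, "TC00003": 1}
library_size = 20000

print(normalize_count(counts["TC00001"], library_size))
ranking = rank_tcs(counts, library_size)
print([tc for tc, _ in ranking])
```

Scaling by library size makes counts comparable between libraries that were sequenced to different depths.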
The SYM pathway sequences used as queries were obtained from the NCBI non-redundant protein databank according to the related literature. Accession numbers and links to the literature are available at http://symgrass.dyndns.org. The sequences were compared to the SymGRASS sugarcane sequence database through a tBLASTn search with a cut-off e-value of 10e-5. The original annotation of the TCs was adopted for the SYM gene candidates.
For the search of antimicrobial peptides, the entire PhytAMP database was downloaded and used as query in a tBLASTn search against the SymGRASS sugarcane sequence database with default parameters. Here too, the original annotation of the TCs was adopted for the antimicrobial peptide candidates. The electronic differential display was generated with the Bioconductor R statistical package. All procedures were conducted on a server with 48 cores and 128 GB of random access memory.
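Selecting candidate TCs from the search results amounts to filtering hits by e-value. The sketch below assumes that the tBLASTn results were written in the standard 12-column tabular format of BLAST+ (-outfmt 6); the query names, TC hits and scores are invented for illustration, and the cut-off is passed explicitly because "10e-5" may be read as 1e-5 or 1e-4.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_hits(tabular_text: str, max_evalue: float) -> dict[str, list[str]]:
    """Map each query protein to the subject TCs that pass the e-value cut-off."""
    hits: dict[str, list[str]] = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= max_evalue:
            hits.setdefault(row["qseqid"], []).append(row["sseqid"])
    return hits


# Toy tabular output (hypothetical queries and hits)
rows = [
    ("CCaMK_query", "TC00012", "78.5", "300", "60", "2", "1", "300", "10", "909", "3e-40", "410"),
    ("CCaMK_query", "TC04567", "31.0", "90", "58", "3", "5", "94", "40", "309", "0.002", "38.1"),
    ("CASTOR_query", "TC00345", "65.2", "250", "85", "1", "20", "269", "100", "849", "1e-6", "55.0"),
]
toy_output = "\n".join("\t".join(row) for row in rows)

hits = candidate_hits(toy_output, max_evalue=1e-5)
print(hits)
```

In this toy input the weak hit TC04567 is discarded, while the two remaining TCs are retained as candidates for their respective queries.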
Results and discussion
EST libraries and sequence data available at SymGRASS.
Apical meristem and neighboring tissue (immature plants)
Apical meristem and neighboring tissue (mature plants)
Developing inflorescence 1 cm long
Developing inflorescence base 5 cm long
Developing inflorescence and rhachis 10 cm long
Developed inflorescence and rhachis 20 cm long
Developed inflorescence 20 cm long without rhachis
Etiolated leaves from in vitro grown seedlings
First apical stalk internode of adult plants
First to third meristem internodes
Fourth apical stalk internode of adult plants
Sixth to eleventh internodes
Germinating sett roots
Lateral buds from field grown adult plants
Lateral buds from greenhouse grown adult plants
Leaf roll from field grown adult plants
Leaf roll including apex
Mature leaf tissue
Pool of calli exposed to low (4 °C) and high (37 °C) temperatures
Root apex from adult plants
Root tips 0.3 cm long from adult plants
Shoot-root transition zone from young plants
Shoot-root transition zone from adult plants
Seedlings inoculated with Gluconacetobacter diazotrophicus
Seedlings inoculated with Herbaspirillum rubrisubalbicans
Stalk bark from adult plants
Pool of tissues
De novo assembly
To obtain the best assembly from the publicly available sugarcane data, we combined the TGICL package with a subsequent scaffolding step that uses the translation mapping method and the proteome of S. bicolor as reference to join contigs belonging to the same coding region. This procedure generated 124,533 tentative consensus (TC) transcripts, designated TC00001 to TC124533. The majority of these are singletons that may represent untranslated regions, but they could also contain reads originating from sugarcane-specific genes for which no homologous sequences are present in sorghum. Putative functions were assigned to the transcripts based on their homology to known proteins in UniprotKB. GO terms were assigned to the transcripts by mapping the best BLASTx match in UniprotKB to the Gene Ontology association files. This information can be accessed online at SymGRASS for each TC or for a group of TCs.
EST information files for the S. officinarum complex were downloaded from the NCBI dbEST. The experimental information showed that several stages, tissues and organs of sugarcane were sampled and sequenced in the last decade. This collection enabled the delineation of an expression profile for the sugarcane TCs, which is presented online as the percentage of a particular TC amongst all TCs. The R-value indicates the significance of the observed differential expression, that is, whether it is likely to be due to biological mechanisms. The user can access the expression profile of a TC across the libraries, of a group of TCs, or of a particular library.
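For readers unfamiliar with the R statistic of Stekel and colleagues, a minimal Python sketch is given here. It follows the published definition, R = sum over libraries of x_i * log(x_i / (N_i * f)), where x_i is the read count of the TC in library i, N_i is the size of library i, and f is the pooled frequency of the TC over all libraries. The use of the natural logarithm, the function name and the toy counts are assumptions of this sketch, which is not the code behind the SymGRASS website.

```python
import math


def stekel_r(counts: list[int], library_sizes: list[int]) -> float:
    """Log-likelihood ratio statistic R for one TC across several libraries."""
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    total = sum(counts)
    if total == 0:
        return 0.0  # a TC without reads carries no evidence
    pooled_frequency = total / sum(library_sizes)
    r_value = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # libraries with zero reads contribute nothing to the sum
            r_value += x * math.log(x / (n * pooled_frequency))
    return r_value


# Same relative abundance in both libraries: R is (numerically) zero
print(stekel_r([5, 10], [1000, 2000]))
# All reads in one of two equally sized libraries: R equals 10 * ln(2)
print(stekel_r([10, 0], [1000, 1000]))
```

Larger R-values indicate that the distribution of reads over the libraries deviates more strongly from what the library sizes alone would predict.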
Search of symbiosis related genes in the SymGRASS
Out of the 214 TCs exclusively expressed in the SC11 library, 134 were annotated as uncharacterized proteins, and 50 had no match to any protein and are considered sugarcane-specific transcripts. The remainder were annotated as defense genes, histones, phosphate transporters, actins and actin cap proteins. Most interesting was the expression of a calcium/calmodulin-dependent kinase, a protein responsible for decoding the Ca spike in the SYM pathway, together with two transcription factors and one Mutator-like transposon. Among the TCs specific to library SC45, 171 out of 272 were annotated as uncharacterized proteins, and 54 had no hit to any known protein and likely represent sugarcane-specific transcripts. It is also interesting to note that two other transcription factors and one retrotransposon were exclusively present in this library. Detailed annotation is available at http://symgrass.dyndns.org.
The annotation of the exclusively expressed TCs common to both interactions follows the same pattern as observed for the interactions specific to each symbiont. Of these 64 TCs, 41 were annotated as uncharacterized proteins, 8 had no hit to known proteins, one was annotated as a transcription factor, and two matched retrotransposon proteins. To our knowledge, this is the first time that transposons and retrotransposons have been found exclusively expressed in symbiotic interactions. How transposable elements contribute to the establishment of the symbiotic interaction remains to be addressed.
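The notion of "exclusively expressed" TCs can be made concrete with a small Python sketch: a TC is exclusive to a set of libraries if its reads originate from all of those libraries and from no other. The library identifiers SC11 and SC45 are taken from the text; the function name, the placeholder library "OTHER" and the TC-to-library mapping are hypothetical toy data.

```python
def exclusive_tcs(tc_libraries: dict[str, set[str]], targets: set[str]) -> set[str]:
    """Return TCs whose reads come from every target library and from no other."""
    return {tc for tc, libraries in tc_libraries.items() if libraries == targets}


# Toy mapping of each TC to the libraries contributing its reads
tc_libraries = {
    "TC00001": {"SC11"},
    "TC00002": {"SC45"},
    "TC00003": {"SC11", "SC45"},
    "TC00004": {"SC11", "OTHER"},  # also found outside the symbiotic libraries
}

only_sc11 = exclusive_tcs(tc_libraries, {"SC11"})
only_sc45 = exclusive_tcs(tc_libraries, {"SC45"})
shared = exclusive_tcs(tc_libraries, {"SC11", "SC45"})
print(only_sc11, only_sc45, shared)
```

Under this definition a TC such as TC00004, which also has reads in a non-symbiotic library, is excluded from all three groups.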
Present status and prospects
The present data curation status for the collected sequences in the SymGRASS database is summarized in Table 1. Links to research articles on the common SYM pathway are provided in the database. We believe that the candidate sequences for the SYM pathway, together with the pool of exclusively expressed TCs, provide a good background for the design of molecular studies to unravel the mechanisms controlling the establishment of symbioses in sugarcane. Since SymGRASS is an ongoing effort, we will continue to curate and update the database as new or updated data and tools become available. We also plan to include other grass species such as Sorghum bicolor, Zea mays, Oryza sativa, Setaria italica and Brachypodium distachyon. In addition to more plants, we intend to incorporate bacteria and fungi in the database and to provide a routine for comparing genes and grass/microorganism interactions between different species. As species are added and the number of publications concerning symbiosis increases, data curation will become more challenging. We expect SymGRASS to become a community-based database.
SymGRASS represents the first web-based data repository of molecular, cellular and physiological information on grasses and their symbionts, and provides a user-friendly resource for plant biologists, microbiologists and breeders. The available assembled and annotated candidate sequences for the SYM pathway will help in the recognition and manipulation of the components of the mechanisms by which sugarcane (and, in the future, other grasses) controls the establishment and efficiency of endophytic associations.
The publication costs for this article were funded by the corresponding author's institution.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 1, 2013: Computational Intelligence in Bioinformatics and Biostatistics: new trends from the CIBB conference series. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S1.
List of abbreviations used
Myr: million years ago
RNS: root nodule symbiosis
SYM: common symbiosis pathway
The authors thank CNPq (Brazilian Council of Scientific and Technological Development), CAPES (Brazilian Coordination of Improvement of Higher Education Personnel) and DAAD (German Academic Exchange Service) for financial support and fellowships.
1. Ercolin F, Reinhardt D: Successful joint ventures of plants: arbuscular mycorrhiza and beyond. Trends Plant Sci. 2011, 16 (7): 356-362. 10.1016/j.tplants.2011.03.006.
2. Baldani JI, Baldani VLD: History on the biological nitrogen fixation research in graminaceous plants: special emphasis on the Brazilian experience. An Acad Bras Cienc. 2005, 77 (3): 549-579. 10.1590/S0001-37652005000300014.
3. Goldemberg J: Ethanol for a sustainable energy future. Science. 2007, 315 (5813): 808-810. 10.1126/science.1137013.
4. Döbereiner J: Biological nitrogen fixation in the tropics: social and economic contributions. Soil Biol Biochem. 1997, 29 (5-6): 771-774. 10.1016/S0038-0717(96)00226-X.
5. Baldani JI, Caruso L, Baldani VLD, Silvia R, Döbereiner J: Recent advances in BNF with non-legume plants. Soil Biol Biochem. 1997, 29 (5-6): 911-922. 10.1016/S0038-0717(96)00218-0.
6. Saravanan VS, Madhaiyan M, Osborne J, Thangaraju M, Sa TM: Ecological occurrence of Gluconacetobacter diazotrophicus and nitrogen-fixing Acetobacteraceae members: their possible role in plant growth promotion. Microb Ecol. 2008, 55 (1): 130-140. 10.1007/s00248-007-9258-6.
7. EST database (National Center for Biotechnology Information). [http://www.ncbi.nlm.nih.gov/nucest]
8. Sequence Cleaner (Seqclean). [http://sourceforge.net/projects/seqclean/]
9. TGI Clustering Tool (TGICL v2.1). [http://sourceforge.net/projects/tgicl/]
10. Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010, 20 (10): 1432-1440. 10.1101/gr.103846.109.
11. Protein Knowledgebase (UniprotKB). [http://www.uniprot.org]
12. Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009, 37 (Database): D396-D403. 10.1093/nar/gkn803.
13. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH: CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011, 39 (Database issue): D225-D229.
14. Stekel DJ, Git Y, Falciani F: The comparison of gene expression from multiple cDNA libraries. Genome Res. 2000, 10 (12): 2055-2061. 10.1101/gr.GR-1325RR.
15. Hammami R, Hamida JB, Vergoten G, Fliss I: PhytAMP: a database dedicated to antimicrobial plant peptides. Nucleic Acids Res. 2009, 37 (Database): D963-D968. 10.1093/nar/gkn655.
16. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
17. Priyam A, Woodcroft BJ, Wurm Y: SequenceServer: BLAST searching made easy. Unpublished data. 2012.
18. Banba M, Gutjahr C, Miyao A, Hirochika H, Paszkowski U, Kouchi H, Imaizumi-Anraku H: Divergence of evolutionary ways among common sym genes: CASTOR and CCaMK show functional conservation between two symbiosis systems and constitute the root of a common signaling pathway. Plant Cell Physiol. 2008, 49 (11): 1659-1671. 10.1093/pcp/pcn153.
19. Mithöfer A: Suppression of plant defence in rhizobia-legume symbiosis. Trends Plant Sci. 2002, 7 (10): 440-444. 10.1016/S1360-1385(02)02336-1.
20. De Hoff PL, Brill LM, Hirsch AM: Plant lectins: the ties that bind in root symbiosis and plant defense. Mol Genet Genomics. 2009, 282 (1): 1-15. 10.1007/s00438-009-0460-8.
21. Graham MA, Silverstein KAT, Cannon SB, VandenBosch KA: Computational identification and characterization of novel genes from legumes. Plant Physiol. 2004, 135 (3): 1179-1197. 10.1104/pp.104.037531.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.