Crowdsourcing the nodulation gene network discovery environment
© Li and Jackson. 2016
Received: 31 October 2015
Accepted: 21 May 2016
Published: 26 May 2016
Legumes (Fabaceae) are an economically and ecologically important group of plant species with the distinctive capacity for symbiotic nitrogen fixation in root nodules, specialized plant organs that house symbiotic microbes. To understand the molecular mechanisms underlying nodulation, many efforts are underway to identify nodulation-related genes and to determine how these genes interact with each other. To reconstruct the nodulation gene network accurately and efficiently, we created a crowdsourcing platform, CrowdNodNet.
Using the continuously updated, collaboratively written, and community-reviewed Wikipedia model, the platform could quickly become a comprehensive knowledge base of nodulation-related pathways. The platform could also be adapted to other biological processes, and thus has great potential for integrating and advancing our understanding of the functional genomics and systems biology of other processes and species. The platform is available at http://crowd.bioops.info/, and the source code is openly available at https://github.com/bioops/crowdnodnet under the MIT License.
Symbiosis between legumes and rhizobia leads to the development of specialized root organs, called nodules, in which nitrogen-fixing bacteria are accommodated intracellularly, efficiently convert atmospheric nitrogen into ammonia, and transfer it to the host plant. Legumes (Fabaceae or Leguminosae) have the distinctive ability to carry out symbiotic nitrogen fixation with rhizobial bacteria in root nodules, and the legume nodule symbiosis is one of the most efficient systems for plants to acquire nitrogen from the atmosphere. Many efforts are underway to identify nodulation-related genes and their functions in Lotus japonicus, Medicago truncatula, and Glycine max, long-established models for the study of legume biology [2–10]. However, owing to the complexity of the transcriptional regulation of root nodule symbiosis, not all nodulation-related genes have been identified. More importantly, the functions of nodulation-related genes and how they interact with each other are even less well understood. Gene network analysis is therefore needed, because it can help to reveal many, or even most, of the underlying gene interactions and functions.
Gene networks or pathways can be computationally predicted from biological data, e.g. transcriptomes, protein-protein interactions, Gene Ontology (GO) similarities and text mining. However, these computational methods often suffer from high false discovery rates. First, very few systematic or genome-wide studies of nodulation gene interactions, other than co-expression studies, have been done [5, 12, 13], and it is difficult to accurately identify direct interactions from gene co-expression alone, because co-expressed genes may be linked only indirectly, e.g. through a shared regulator. Second, most nodulation genes have orthologs in other plants, including Arabidopsis thaliana, and GO annotations of legume genes are often based on the GO terms of their A. thaliana orthologs. Since nodulation is restricted to legumes and very few other species, many nodulation genes may function differently from their orthologs in non-nodulating species such as Arabidopsis, so GO annotations transferred from orthologs may be misleading. Third, text mining can automatically search the scientific literature and extract potential gene interactions, and many tools have been developed to reconstruct gene networks from text mining [16–24]. However, the complexity of natural language and the inconsistent use of gene names make text mining error-prone.
We would like to create a user-curated, highly accurate database that would serve as a knowledge base for experimentally verified nodulation genes and their interactions. This resource can also provide highly accurate prior information to increase the power of gene network prediction. Manual curation of the peer-reviewed literature is the most accurate approach, but it is time-consuming. To make manual curation more efficient, we propose an alternative way to build a nodulation gene association database: crowdsourcing. Crowdsourcing is an online activity in which an undefined, but generally large, group of people voluntarily undertake a task via an open call. Probably the most remarkable example of crowdsourcing is Wikipedia (https://www.wikipedia.org/), an online encyclopedia that anyone can edit. The number of Wikipedia articles quickly increased to over four million in the first five years after it was established, and the accuracy of its science articles was close to that of Encyclopaedia Britannica. The continuously updated, collaboratively written, and community-reviewed wiki model has been applied in both biology and bioinformatics. For example, WikiGenes is a wiki-based platform for the scientific community to collect, communicate and evaluate knowledge about genes, chemicals, diseases and other biomedical concepts. Gene Wiki and LncRNAWiki are similar platforms that focus on human genes and long non-coding RNAs, respectively. WikiPathways is a wiki-based pathway curation resource coupled with a graphical pathway editing tool.
We present here an online platform for crowdsourcing nodulation gene network reconstruction with three goals: (i) to comprehensively integrate knowledge of all known nodulation-related genes and gene interactions in legumes, (ii) to provide a user-friendly tool for interactively visualizing gene interactions and gene annotations, and (iii) to use the wiki model for collaborative editing.
The platform is open access and open source under the MIT License (https://opensource.org/licenses/MIT). The web server is an Ubuntu 14.04.1 virtual machine image pre-built by Bitnami (https://aws.amazon.com/marketplace/pp/B0062NF3ME), hosted on a t1.micro instance on Amazon Elastic Compute Cloud (EC2). The image (version 5.5.25-0) comes with software packages for basic web development pre-installed, including Apache 2.4.12, MySQL 5.6.23 and PHP 5.5.25. The platform has been tested on Firefox 38, Chrome 43 and Safari 8, and all features should work in these or later versions. The platform was written in JavaScript and PHP, and the source code is hosted on GitHub (https://github.com/bioops/crowdnodnet). With only minor configuration changes, the source code can be used to build similar platforms for other biological processes. Researchers are welcome and encouraged to modify and improve the source code via GitHub pull requests.
Results and discussion
Users can hover over a gene to see its full name; the gene and its edges are then highlighted. Clicking the gene opens a document containing its annotation information, retrieved from the gene’s MediaWiki page, in the left part of the same page, so users can access a gene’s annotation without leaving the network view.
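The hover behaviour described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the platform’s code (CrowdNodNet itself is written in JavaScript and PHP): the network is assumed to be held in memory as a list of directed edges, and the names Edge and hover, as well as the placeholder genes A to D, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A hypothetical in-memory edge between two gene symbols."""
    source: str
    target: str


def hover(gene: str, full_names: dict[str, str], edges: list[Edge]) -> tuple[str, list[Edge]]:
    """Return the tooltip text and the edges to highlight for a hovered gene.

    The tooltip falls back to the symbol itself when no full name is stored.
    """
    tooltip = full_names.get(gene, gene)
    # An edge is highlighted when the hovered gene is either of its endpoints.
    incident = [e for e in edges if gene in (e.source, e.target)]
    return tooltip, incident


if __name__ == "__main__":
    # Placeholder genes and edges for illustration only, not curated interactions.
    toy_edges = [Edge("A", "B"), Edge("B", "C"), Edge("C", "D")]
    print(hover("B", {"B": "Gene B"}, toy_edges))
```

Scanning the whole edge list is fine for a single-pathway network of modest size; a larger network would more likely keep a per-gene adjacency index.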
New nodes and edges can be added by point-and-click on the main page. In editing mode, users click a blank region to add a new node. A dialog window then appears with a form for entering the new gene’s information, including its symbol, full name and ID. To ensure unique gene IDs while keeping editing flexible, we implemented autosuggestion and autocompletion in the input form: users can type a gene symbol, full name or UniProt ID, and a list of possible genes appears for them to choose from. A gene selected from this list is automatically assigned its UniProt ID. Genes without UniProt IDs can be added manually by entering a gene ID, symbol, and full name.
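The autosuggestion step can be illustrated with a minimal Python sketch. It assumes a local catalog of gene records; the GeneRecord type, the suggest and node_from_choice functions, and the placeholder IDs are hypothetical, and the platform’s actual lookup source and matching rules are not described here. The sketch ranks case-insensitive prefix matches on symbol, full name or UniProt ID ahead of other substring matches, and a chosen suggestion becomes a node whose ID is its UniProt ID.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One entry of a hypothetical gene catalog used for suggestions."""
    symbol: str
    full_name: str
    uniprot_id: str


def suggest(query: str, catalog: list[GeneRecord], limit: int = 10) -> list[GeneRecord]:
    """Return catalog entries matching the typed text, prefix matches first.

    Matching is case-insensitive and checks the symbol, full name and UniProt ID.
    """
    q = query.strip().lower()
    if not q:
        return []
    prefix_hits: list[GeneRecord] = []
    substring_hits: list[GeneRecord] = []
    for rec in catalog:
        fields = (rec.symbol.lower(), rec.full_name.lower(), rec.uniprot_id.lower())
        if any(f.startswith(q) for f in fields):
            prefix_hits.append(rec)
        elif any(q in f for f in fields):
            substring_hits.append(rec)
    return (prefix_hits + substring_hits)[:limit]


def node_from_choice(rec: GeneRecord) -> dict[str, str]:
    """Build a new node from a chosen suggestion; its ID is the UniProt ID."""
    return {"id": rec.uniprot_id, "symbol": rec.symbol, "full_name": rec.full_name}


if __name__ == "__main__":
    # Symbols and full names follow the abbreviation list; IDs are placeholders.
    catalog = [
        GeneRecord("SYMRK", "Symbiosis receptor-like kinase", "PLACEHOLDER_1"),
        GeneRecord("NFR1", "Nod factor receptor 1", "PLACEHOLDER_2"),
        GeneRecord("NFR5", "Nod factor receptor 5", "PLACEHOLDER_3"),
    ]
    for rec in suggest("nfr", catalog):
        print(node_from_choice(rec))
```

Tying node IDs to a chosen catalog entry, rather than to free text, is what keeps IDs unique; manually entered genes would need their own uniqueness check.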
A new edge is added in the same way as a new gene: in editing mode, users manually connect two genes and then, in the dialog window, select the direction (from, to or unknown) and the type (activation or inhibition) of the interaction. Line styles represent interaction types: a solid line with an arrow indicates activation, a dashed line with an arrow indicates inhibition, and a solid line without an arrow indicates other or unknown interaction types (chosen by selecting the “unknown” direction). Users can hover over an edge to see the interaction type (“Activates” or “Inhibits”).
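The edge options and line styles described above map onto a small data model. The Python sketch below is an illustration only: the Interaction type and the function names are hypothetical, and so are two details the text does not specify, namely reading “to” as “the first gene acts on the second” and “from” as the reverse, and the tooltip label shown for unknown interactions.

```python
from __future__ import annotations

from dataclasses import dataclass

KINDS = ("activation", "inhibition")


@dataclass(frozen=True)
class Interaction:
    """A hypothetical stored edge, normalised to a source -> target orientation."""
    source: str
    target: str
    kind: str        # "activation", "inhibition", or "unknown"
    directed: bool


def make_interaction(gene_a: str, gene_b: str, direction: str, kind: str) -> Interaction:
    """Normalise the dialog choices for an edge drawn between gene_a and gene_b.

    Assumption: "to" means gene_a acts on gene_b and "from" means the reverse.
    Choosing the "unknown" direction records an undirected, unknown-type edge.
    """
    if direction == "unknown":
        return Interaction(gene_a, gene_b, "unknown", directed=False)
    if kind not in KINDS:
        raise ValueError(f"unsupported interaction type: {kind!r}")
    if direction == "to":
        return Interaction(gene_a, gene_b, kind, directed=True)
    if direction == "from":
        return Interaction(gene_b, gene_a, kind, directed=True)
    raise ValueError(f"unsupported direction: {direction!r}")


def line_style(edge: Interaction) -> dict[str, object]:
    """Solid + arrow = activation, dashed + arrow = inhibition, solid = unknown."""
    if not edge.directed:
        return {"dash": "solid", "arrow": False}
    if edge.kind == "activation":
        return {"dash": "solid", "arrow": True}
    return {"dash": "dashed", "arrow": True}


def edge_tooltip(edge: Interaction) -> str:
    """Hover text; the label for unknown edges is an assumption."""
    return {"activation": "Activates", "inhibition": "Inhibits"}.get(edge.kind, "Unknown interaction")
```

Normalising every directed edge to source and target at creation time means the drawing code never needs to know which way the user dragged.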
MediaWiki (https://www.mediawiki.org/), the open-source software originally developed for Wikipedia, makes it easy to create wiki-like websites. CrowdNodNet stores information about the gene network, namely genes (nodes), gene interactions (edges) and gene annotations, in a set of MediaWiki pages. A PHP script parses these pages and displays the annotations on the main page, so whenever the MediaWiki pages are modified, the gene annotations shown on the main page are updated accordingly. Users can therefore edit gene annotations by modifying MediaWiki pages, in the same way as editing Wikipedia pages (https://www.mediawiki.org/wiki/Help:Editing_pages). Each gene usually has its own MediaWiki page containing its annotation information, and the gene’s document appears on the main page when a user clicks that gene’s node. An “edit” link at the top of the document allows the user to edit the gene’s MediaWiki page. For a newly added gene, a user can create an annotation page using the “Create annotation” link in the gene’s document. Editing is currently open to everyone; users only need to register an account.
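The page-to-panel flow described above can be sketched against the standard MediaWiki Action API. This is a swapped-in illustration in Python, not the authors’ PHP script, whose parsing logic is not described here: it builds an action=parse request that renders a gene’s page as HTML, and maps a missingtitle error to a “Create annotation” link that opens the page editor. The wiki URLs and function names are hypothetical, and the response is assumed to be the decoded JSON returned with formatversion=2.

```python
from __future__ import annotations

from urllib.parse import urlencode


def parse_request_url(api_url: str, page_title: str) -> str:
    """URL asking MediaWiki's api.php to render one page's wikitext as HTML."""
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"


def edit_url(index_url: str, page_title: str) -> str:
    """Standard index.php edit link; editing a missing page creates it."""
    return f"{index_url}?{urlencode({'title': page_title, 'action': 'edit'})}"


def annotation_panel(response: dict, index_url: str, page_title: str) -> dict:
    """Turn a decoded action=parse response into content for the side panel."""
    link = edit_url(index_url, page_title)
    error = response.get("error")
    if error is not None:
        if error.get("code") == "missingtitle":
            # No annotation page yet: offer to create one.
            return {"html": None, "link_label": "Create annotation", "link": link}
        raise RuntimeError(error.get("info", "MediaWiki API error"))
    # With formatversion=2 the rendered HTML is a plain string.
    return {"html": response["parse"]["text"], "link_label": "edit", "link": link}


if __name__ == "__main__":
    api = "https://wiki.example.org/w/api.php"      # hypothetical wiki location
    index = "https://wiki.example.org/w/index.php"  # hypothetical wiki location
    print(parse_request_url(api, "SYMRK"))
    fake = {"parse": {"title": "SYMRK", "text": "<p>Annotation text</p>"}}
    print(annotation_panel(fake, index, "SYMRK"))
```

The URL would be fetched with any HTTP client and the decoded JSON passed to annotation_panel; keeping the network call outside the function makes the page-to-panel mapping easy to test.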
Gene annotation information in CrowdNodNet
mRNA and protein sequences: FASTA sequences or links to UniProt or GenBank; if available, the probeset mapping of the Affymetrix Lotus GeneChip® is also shown in this section
Genotypes and phenotypes: the detailed gene function during nodulation and/or arbuscular mycorrhizal (AM) symbiosis
Expression patterns: expression during nodulation and whether the gene is expressed in other organs, e.g. root, leaf, and stem
Interactions with other genes: experimentally verified gene interactions, e.g. receptor binding, phosphorylation reactions and protein complexes
Orthologs: orthologs in legumes and non-legumes, and their functions in AM, actinorhizal and legume-rhizobial symbiosis
CrowdNodNet is an online platform for crowdsourcing nodulation gene networks. Researchers can use it to interactively visualize and easily edit the nodulation network or, in principle, almost any other gene network. We expect it to become a comprehensive, collaborative knowledge base of nodulation-related genes and pathways that helps researchers access integrated information and share new discoveries in legume-rhizobial symbiosis.
The platform allows users to visualize and edit the gene network interactively and to access information about the network easily. All gene annotation information is written on MediaWiki pages. The network can be edited by point-and-click on the main page, and editing gene information is the same as editing a Wikipedia page, so the learning curve is short. The platform focuses on a single pathway. Compared with existing pathway and network databases that cover all biological processes and many species, a decentralized, single-pathway database is easier to circulate within a relatively small community. These features should make it easier to attract experts to contribute to and develop the platform over time.
CCaMK, calcium/calmodulin-dependent protein kinase; NFR1, Nod factor receptor 1; NFR5, Nod factor receptor 5; NIN, Nodule inception; SIP2, SYMRK-interacting protein 2; SYMRK, Symbiosis receptor-like kinase.
We would like to acknowledge the anonymous reviewers for their constructive feedback.
National Science Foundation (MCB 1339194).
Availability of data and materials
Project name: CrowdNodNet.
Project home page: http://crowd.bioops.info/.
Source code repository: https://github.com/bioops/crowdnodnet.
Operating system(s): Platform independent.
Other requirements: Firefox 38, Chrome 43 or Safari 8, or later versions.
Any restrictions to use by non-academics: No.
YL and SAJ conceived the project. YL constructed CrowdNodNet. YL and SAJ drafted the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Brewin NJ. Development of the legume root nodule. Annu Rev Cell Biol. 1991;7:191–226.
- Stracke S, Kistner C, Yoshida S, Mulder L, Sato S, Kaneko T, et al. A plant receptor-like kinase required for both bacterial and fungal symbiosis. Nature. 2002;417(6892):959–62.
- Smit P, Raedts J, Portyanko V, Debelle F, Gough C, Bisseling T, et al. NSP1 of the GRAS protein family is essential for rhizobial Nod factor-induced transcription. Science. 2005;308(5729):1789–91.
- Gherbi H, Markmann K, Svistoonoff S, Estevan J, Autran D, Giczey G, et al. SymRK defines a common genetic basis for plant root endosymbioses with arbuscular mycorrhiza fungi, rhizobia, and frankia bacteria. Proc Natl Acad Sci U S A. 2008;105(12):4928–32.
- Madsen LH, Tirichine L, Jurkiewicz A, Sullivan JT, Heckmann AB, Bek AS, et al. The molecular network governing nodule organogenesis and infection in the model legume Lotus japonicus. Nat Commun. 2010;1:10.
- Chen T, Zhu H, Ke D, Cai K, Wang C, Gou H, et al. A MAP kinase kinase interacts with SymRK and regulates nodule organogenesis in Lotus japonicus. Plant Cell. 2012;24(2):823–38.
- Singh S, Katzer K, Lambert J, Cerri M, Parniske M. CYCLOPS, a DNA-binding transcriptional activator, orchestrates symbiotic root nodule development. Cell Host Microbe. 2014;15(2):139–52.
- Young ND, Debelle F, Oldroyd GED, Geurts R, Cannon SB, Udvardi MK, et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature. 2011;480(7378):520–4.
- Schmutz J, Cannon SB, Schlueter J, Ma JX, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463(7278):178–83.
- Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, Nakao M, et al. Genome structure of the legume, Lotus japonicus. DNA Res. 2008;15(4):227–39.
- Li YP, Pearl SA, Jackson SA. Gene networks in plant biology: approaches in reconstruction and analysis. Trends Plant Sci. 2015;20(10):664–75.
- Zhu M, Dahmen JL, Stacey G, Cheng J. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data. BMC Bioinf. 2013;14:278.
- Soyano T, Hayashi M. Transcriptional networks leading to symbiotic nodule organogenesis. Curr Opin Plant Biol. 2014;20:146–54.
- Djordjevic D, Yang A, Zadoorian A, Rungrugeecharoen K, Ho JWK. How difficult is inference of mammalian causal gene regulatory networks? PLoS One. 2014;9(11):e111661.
- Zhu H, Riely BK, Burns NJ, Ane JM. Tracing nonlegume orthologs of legume genes required for nodulation and arbuscular mycorrhizal symbioses. Genetics. 2006;172(4):2491–9.
- Yuryev A, Mulyukov Z, Kotelnikova E, Maslov S, Egorov S, Nikitin A, et al. Automatic pathway building in biological association networks. BMC Bioinf. 2006;7:171.
- Bandy J, Milward D, Mcquay S. Mining protein-protein interactions from published literature using linguamatics I2E. Protein Netw Pathway Analysis. 2009;563:3–13.
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, et al. STRING 8-a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009;37:D412–6.
- Song YL, Chen SS. Text mining biomedical literature for constructing gene regulatory networks. Interdiscip Sci. 2009;1(3):179–86.
- Ananiadou S, Pyysalo S, Tsujii J, Kell DB. Event extraction for systems biology by text mining the literature. Trends Biotechnol. 2010;28(7):381–90.
- Kemper B, Matsuzaki T, Matsuoka Y, Tsuruoka Y, Kitano H, Ananiadou S, et al. PathText: a text mining integrator for biological pathway visualizations. Bioinformatics. 2010;26(12):I374–81.
- Krallinger M, Leitner F, Valencia A. Analysis of biological processes and diseases using text mining approaches. Bioinf Methods Clin Res. 2010;593:341–82.
- Usie A, Karathia H, Teixido I, Valls J, Faus X, Alves R, et al. Biblio-MetReS: a bibliometric network reconstruction application and server. BMC Bioinf. 2011;12:387.
- Tibiche C, Wang E. GeneNetMiner: accurately mining gene regulatory networks from literature. arXiv:1409.1975. 2014. http://arxiv.org/abs/1409.1975.
- Winnenburg R, Wachter T, Plake C, Doms A, Schroeder M. Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies. Brief Bioinform. 2008;9(6):466–78.
- Li YP, Jackson SA. Gene network reconstruction by integration of prior biological knowledge. G3-Genes Genom Genet. 2015;5(6):1075–9.
- Estelles-Arolas E, Gonzalez-Ladron-De-Guevara F. Towards an integrated crowdsourcing definition. J Inf Sci. 2012;38(2):189–200.
- Giles J. Internet encyclopaedias go head to head. Nature. 2005;438(7070):900–1.
- Hoffmann R. A wiki for the life sciences where authorship matters. Nat Genet. 2008;40(9):1047–51.
- Huss JW, Orozco C, Goodale J, Wu CL, Batalov S, Vickers TJ, et al. A gene wiki for community annotation of gene function. PLoS Biol. 2008;6(7):1398–402.
- Ma LN, Li A, Zou D, Xu XJ, Xia L, Yu J, et al. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res. 2015;43(D1):D187–92.
- Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo C. WikiPathways: pathway editing for the people. PLoS Biol. 2008;6(7):1403–7.
- Antolin-Llovera M, Ried MK, Parniske M. Cleavage of the SYMBIOSIS RECEPTOR-LIKE KINASE ectodomain promotes complex formation with Nod factor receptor 5. Curr Biol. 2014;24(4):422–7.