An Atlas of annotations of Hydra vulgaris transcriptome
© The Author(s) 2016
Published: 22 September 2016
RNA sequencing takes advantage of Next Generation Sequencing (NGS) technologies to measure RNA transcript counts with excellent accuracy. Interpreting this huge amount of data as biological information is still a key issue, which is why web resources that support such analysis are highly desirable.
Starting from our previous work, Transcriptator, we present the Atlas of Hydra vulgaris, an extensible web tool in which the complete transcriptome of this organism is annotated. To provide users with a resource that includes the whole functionally annotated transcriptome of the Hydra vulgaris water polyp, we implemented the Atlas web tool, which contains 31,988 accessible and downloadable transcripts of this non-reference model organism.
Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotations for transcripts differentially expressed in Hydra vulgaris exposed to distinct experimental treatments.
Web resource URL
Hydra vulgaris is a small freshwater organism belonging to the genus Hydra, of the phylum Cnidaria and class Hydrozoa. The genus Hydra is well known for its regeneration capability, first observed by Abraham Trembley in 1744. For more than two hundred years it has attracted the interest of the scientific community because of its unique regeneration ability and because it appears not to age or die. In particular, researchers have studied Hydra as a model organism in diverse areas of biological research, ranging from embryogenesis, nervous system development and aging mechanisms to the effects of toxicity in ecosystems. Recently, Hydra has also become very popular in stem cell research owing to the nature of its ectodermal and endodermal epithelial stem cells and its interstitial stem cells. Although the cellular organization of Hydra is well established, the molecular mechanisms behind the above-mentioned aspects are still under investigation. In 2010, a draft genome of Hydra magnipapillata was reported. More recently, transcriptomic analysis of Hydra has been carried out to unveil the genetic cascades underlying the processes related to its regeneration ability, such as immunity, cell cycle regulation, cell death, transcription and chromatin regulation. However, in the case of Hydra, interpreting transcriptomic data in the absence of a well-annotated genome or transcriptome is a difficult task, and it is especially problematic without biologist-friendly tools. By searching the literature, we found that only two web resources are available: Compagen and Cnidbase. On the one hand, Compagen stores raw and processed sequences from sponges, cnidarians, tunicates and lower vertebrates in order to trace the evolutionary relationships among them.
It is a comparative genomics platform, but it does not offer functional annotation of the sequences associated with the genus Hydra. On the other hand, Cnidbase is an evolutionary genomics database, which highlights the evolutionary relationships among various species of the phylum Cnidaria. Neither resource provides functional information on Hydra transcripts. To address this limitation, we previously developed the HvDbase database, which integrates 15,522 transcripts along with their functional information. We have upgraded this resource into a new web application, Atlas, which stores Hydra vulgaris specific transcripts and annotates them with relevant functional information (GO terms, pathways, protein domains and other data) using the Transcriptator software. Atlas is an easy-to-use application for obtaining functional information for each transcript. Each entry is also hyperlinked to external databases for cross-checking and further downstream analysis. At present, around 70 % of the Hydra vulgaris transcriptome is annotated and managed by the Atlas application.
Transcriptomic data retrieval
The Hydra vulgaris RNA-Seq transcriptomic data were published in a previous research work. The authors of that work produced an RNA-Seq transcriptome from Illumina and 454 reads obtained from the Hydra vulgaris strain "Basel". The reads were assembled both with a genome-assisted approach (using the Hydra magnipapillata genome) and de novo. Finally, a dataset containing the longest ORFs from both the genome-assisted and the de novo assemblies was obtained. It contains 48,909 sequences, of which the 45,269 transcripts longer than 200 base pairs have been deposited in the European Nucleotide Archive (ENA) with accession numbers HAAC01000001-HAAC01045269. We retrieved the raw transcript data for annotation purposes and carried out our downstream analysis.
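The length filter and the accession range described above can be illustrated with a minimal Python sketch. It is not the code used to build the deposited dataset: the function names `parse_fasta`, `filter_by_length` and `ena_accession` are hypothetical, and the sketch assumes plain FASTA input and that the accessions are simply numbered consecutively with a six-digit zero-padded counter, as the reported range suggests.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = 200) -> Dict[str, str]:
    """Keep only sequences strictly longer than min_length bases."""
    return {name: seq for name, seq in records if len(seq) > min_length}


def ena_accession(index: int, prefix: str = "HAAC01", width: int = 6) -> str:
    """Build an accession such as HAAC01000001 (assumed numbering scheme)."""
    return f"{prefix}{index:0{width}d}"


if __name__ == "__main__":
    demo = [">t1 example", "ACGT" * 60, ">t2", "ACGT" * 50]
    print(sorted(filter_by_length(parse_fasta(demo))))
    print(ena_accession(1), ena_accession(45269))
```

In this sketch a sequence of exactly 200 bases is discarded, which matches the "longer than 200 base pairs" wording.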
The Atlas web application is designed to accommodate a vast amount of information for each stored transcript, starting with Gene Ontology (GO) terms related to biological process, molecular function and cellular component. It also takes into account associated protein domain information from various protein domain databases such as COG, InterPro, PFAM and SMART. In Atlas, we also include enriched pathway information from KEGG, Panther and BioCarta for each Hydra vulgaris transcript. This information is relevant for dissecting high-level biological functions and biomolecular interaction networks in a cellular context. Atlas also provides information about interaction partners for the protein products of the respective transcripts. To gather this information, the protein interaction databases BIND and MINT are exhaustively searched and indexed in Atlas. Apart from functional aspects, it also reports other relevant information from Swiss-Prot, UniProt-Knowledgebase and OMIM.
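One way to picture how these annotation sources could be organized is sketched below in Python. The five group names are those used by Atlas (Domain, Ontology, Pathways, Interaction and Miscellaneous), but the assignment of each source to a group is our assumption, inferred from the description above, and only the sources named in this section are listed. The names `ANNOTATION_GROUPS`, `group_of` and `empty_annotation_record` are hypothetical and do not reflect the actual Atlas schema.

```python
from typing import Dict, List

# Assumed grouping of the sources named in the text; not the Atlas schema.
ANNOTATION_GROUPS: Dict[str, List[str]] = {
    "Ontology": ["GO biological process", "GO molecular function",
                 "GO cellular component"],
    "Domain": ["COG", "InterPro", "PFAM", "SMART"],
    "Pathways": ["KEGG", "Panther", "BioCarta"],
    "Interaction": ["BIND", "MINT"],
    "Miscellaneous": ["Swiss-Prot", "UniProt-Knowledgebase", "OMIM"],
}


def group_of(source: str) -> str:
    """Return the group a source belongs to (case-insensitive lookup)."""
    wanted = source.casefold()
    for group, sources in ANNOTATION_GROUPS.items():
        if any(wanted == name.casefold() for name in sources):
            return group
    raise KeyError(source)


def empty_annotation_record(transcript_id: str) -> dict:
    """Create a record with one empty list of terms per annotation source."""
    record: dict = {"transcript": transcript_id}
    for group, sources in ANNOTATION_GROUPS.items():
        record[group] = {name: [] for name in sources}
    return record


if __name__ == "__main__":
    print(group_of("kegg"))
    print(empty_annotation_record("HAAC01000001")["Domain"])
```

A nested record of this kind keeps the terms of each source separate, so that a single group can be displayed or exported without touching the others.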
The Atlas application is based on the Transcriptator workflow [11, 12]. This pipeline employs web services from DAVID [13, 14] and QuickGO. The DAVID web-service client is written in Python using the lightweight SOAP client module suds-0.4. The client for QuickGO uses the Python package ‘Bio-Services’, which provides a wrapper framework based on the WSDL/SOAP and REST protocols to the basic pipeline. The main purpose of the pipeline is to annotate the given transcript(s) with functionally and biologically relevant information. To achieve this, it processes the data in four main steps: a) finding the best-hit protein for a given transcript sequence in locally installed, BLAST-formatted Swiss-Prot and UniProt databases; b) obtaining functionally relevant information for the best-hit protein from the DAVID database; c) assigning GO slim terms to these protein hits from the QuickGO database; d) integrating all the relevant information, in tabular and graphical format, for the best-hit protein of the given transcript. The BLAST search is carried out on a local cluster, while the second and third steps employ the above-mentioned DAVID and QuickGO web services simultaneously. The last step integrates the results, carries out statistical analysis and generates easy-to-read tables and graphical charts.
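Steps a) and d) of the workflow can be sketched in Python as follows. This is an illustration under stated assumptions, not the Transcriptator implementation: it assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), selects the best hit by highest bit score with ties broken by lower E-value, and replaces the DAVID and QuickGO web-service calls of steps b) and c) with plain dictionaries. The names `Hit`, `best_hits` and `integrate`, and all identifiers in the demo, are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # transcript identifier
    subject: str    # matched protein identifier
    evalue: float
    bitscore: float


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Hit]:
    """Step a): keep one hit per transcript from tabular BLAST output.

    Columns follow the standard outfmt 6 order, in which the E-value is
    column 11 and the bit score is column 12.
    """
    best: Dict[str, Hit] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        hit = Hit(cols[0], cols[1], float(cols[10]), float(cols[11]))
        current = best.get(hit.query)
        # Higher bit score wins; on a tie, the lower E-value wins.
        if current is None or (hit.bitscore, -hit.evalue) > (
                current.bitscore, -current.evalue):
            best[hit.query] = hit
    return best


def integrate(best: Dict[str, Hit],
              functional: Dict[str, List[str]],
              go_slim: Dict[str, List[str]]) -> List[dict]:
    """Step d): merge best hits with annotations into table rows.

    `functional` and `go_slim` stand in for the responses that steps b)
    and c) would obtain from the annotation web services.
    """
    rows = []
    for query, hit in sorted(best.items()):
        rows.append({
            "transcript": query,
            "protein": hit.subject,
            "evalue": hit.evalue,
            "bitscore": hit.bitscore,
            "annotations": functional.get(hit.subject, []),
            "go_slim": go_slim.get(hit.subject, []),
        })
    return rows


if __name__ == "__main__":
    demo = [
        "T1\tP00001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180.0",
        "T1\tP00002\t90.0\t210\t21\t0\t1\t210\t1\t210\t1e-60\t210.0",
    ]
    for row in integrate(best_hits(demo), {}, {"P00002": ["GO:0005515"]}):
        print(row)
```

Keeping the lookup results as inputs to `integrate` separates the network-dependent steps from the merging logic, which makes the final table easy to rebuild when the annotations are refreshed.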
Resource development and description
Results and discussion
General framework of Atlas
Atlas consists of seven sections. Its core is the Transcriptome section, which contains two separate web pages, Transcripts List and Databases List. The Transcripts List page hosts the whole list of transcripts of the Hydra vulgaris transcriptome, together with the associated functional annotations, obtained through custom-made Python scripts that access open-source tools and public databases (Fig. 1). The Databases List page queries the five database sets that we have hosted and suitably merged: Domain, Ontology, Pathways, Interaction and Miscellaneous. All the other sections host in-depth content pages of the web application.
Atlas collects data for 19 different functional terms derived from scientific repositories and integrates them in tables that can be ordered by column and filtered by feature, so that they are easy to read. The information for each transcript is organized, on the Transcripts List web page, into Ena Id, Uniprot Id, Name, Score and E-Value and, on the Databases List web page, into five database groups in which additional specific descriptions are reported. Moreover, all parameters and databases are explained in more depth at the bottom of the page.
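The column ordering and filtering described above can be illustrated with a short Python sketch. The five fields mirror the columns of the Transcripts List page, while the class `TranscriptRow`, the helper functions and the choice of filters (a name substring and an E-value threshold) are hypothetical and are not taken from the Atlas source code.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Iterable, List


@dataclass(frozen=True)
class TranscriptRow:
    ena_id: str
    uniprot_id: str
    name: str
    score: float
    evalue: float


def order_by(rows: Iterable[TranscriptRow], column: str,
             descending: bool = False) -> List[TranscriptRow]:
    """Return the rows sorted by the given column name."""
    return sorted(rows, key=attrgetter(column), reverse=descending)


def filter_rows(rows: Iterable[TranscriptRow], name_contains: str = "",
                max_evalue: float = float("inf")) -> List[TranscriptRow]:
    """Keep rows whose name contains the text and whose E-value is low enough."""
    needle = name_contains.lower()
    return [row for row in rows
            if needle in row.name.lower() and row.evalue <= max_evalue]


if __name__ == "__main__":
    table = [
        TranscriptRow("HAAC01000001", "P11111", "Actin", 310.0, 1e-80),
        TranscriptRow("HAAC01000002", "P22222", "Heat shock protein 70",
                      520.5, 1e-150),
    ]
    for row in order_by(filter_rows(table, max_evalue=1e-50), "score",
                        descending=True):
        print(row)
```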
Statistical analysis of Hydra vulgaris transcripts
The Atlas content at a glance
A case study
New high-throughput technologies allow us to sequence new organisms quickly and easily, but they pose the problem of extracting the relevant information from the huge amount of data that the experiments return. Databases devoted to non-reference model organisms are therefore needed. We have developed an approach to handle the de novo assembled reads from Hydra vulgaris and to define a structure for the functional annotation of organisms that lack a reference and for which very little information is available. Atlas is an intuitive and easy-to-use web resource for researchers interested in this non-reference model organism, and it can be extended to cases where the transcriptome is available but the genome is not yet well annotated. Atlas has been designed to integrate 19 repositories of functional annotations and several functionalities, all of which can be accessed without credentials. Moreover, being a modular platform, it is easily scalable and customizable for future demands and developments. This work may constitute a useful starting point for developing similar web resources. We are currently processing new functional annotation data in order to upgrade Atlas and make it more informative and attractive.
Publication of this article was funded by the INTEROMICS flagship Italian project, PON02-00612-3461281 and PON02-00619-3470457. Mario R. Guarracino's work has been conducted at the National Research University Higher School of Economics and supported by RSF grant 14-41-00039.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 11, 2016: Selected articles from the 11th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2014). The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-11.
Availability of data and materials
All supporting data are included within the manuscript and the web resource.
MRG, DE and KPT designed the study and wrote the manuscript. DE and KPT performed the statistical analysis of novel transcripts and developed the computational framework. DE collected the data, implemented the web resource and developed the database; KPT developed the Python scripts to perform the functional annotation. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Martin VJ, Littlefield CL, Archer WE, Bode HR. Embryogenesis in hydra. MBL. 1997; 192:345–63.
- Burnett AL, Diehl NA. The nervous system of hydra. i. types, distribution and origin of nerve elements. Wiley Online Library. 1964; 157:217–26.
- Tomczyk S, Fischer K, Austad S, Galliot B. Hydra, a powerful model for aging studies. Taylor & Francis. 2015; 59:11–6.
- Castillo GC, Vila IC, Neild E. Ecotoxicity assessment of metals and wastewater using multitrophic assays. Wiley Online Library. 2000; 15:370–5.
- Bosch TC. Stem cells: from hydra to man: Springer; 2008.
- Chapman JA, Kirkness EF, Simakov O, Hampson SE, Mitros T, Weinmaier T, Rattei T, Balasubramanian PG, Borman J, Busam D, et al. The dynamic genome of hydra. Nat Publishing Group. 2010; 464:592–6.
- Wenger Y, Galliot B. Rnaseq versus genome-predicted transcriptomes: A large population of novel transcripts identified in an illumina-454 hydra transcriptome. BMC Genomics. 2013. doi:10.1186/1471-2164-14-204.
- Hemmrich G, Bosch TC. Compagen, a comparative genomics platform for early branching metazoan animals, reveals early origins of genes regulating stem-cell differentiation. Wiley Online Library. 2008; 30:1010–8.
- Ryan JF, Finnerty JR. Cnidbase: the cnidarian evolutionary genomics database. Oxf Univ Press. 2003; 31:159–63.
- Evangelista D, Tripathi KP, Scuotto V, Guarracino MR. Hvdbase: A web resource on hydra vulgaris transcriptome. Lecture Notes in Computer Science. V 9044. Springer International Publishing: 2015. p. 355–62, doi:10.1007/978-3-319-16480-9_35.
- Tripathi KP, Evangelista D, Zuccaro A, Guarracino MR. Transcriptator: An automated computational pipeline to annotate assembled reads and identify non coding rna. Public Libr Sci. 2015; 10:0140268.
- Tripathi KP, Evangelista D, Cassandra R, Guarracino MR. Transcriptator: a computational pipeline to annotate transcripts and assembled reads from rna-seq data. In: Springer, editor. Lecture Notes in Bioinformatics, XI International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics: 19-21 October 2011; Cambridge (UK): 2014.
- Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, et al. David bioinformatics resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007; 35(suppl 2):169–75.
- Dennis Jr G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA, et al. David: database for annotation, visualization, and integrated discovery. Genome Biology. 2003; 4:3. doi:10.1186/gb-2003-4-9-r60.
- Binns D, Dimmer E, Huntley R, Barrell D, O’Donovan C, Apweiler R. Quickgo: a web-based tool for gene ontology searching. Oxf Univ Press. 2009; 25:3045–6.
- Lightweight SOAP Client. https://pypi.python.org/pypi/suds.
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, et al. The swiss-prot protein knowledgebase and its supplement trembl in 2003. Oxf Univ Press. 2003; 31:365–70.
- Consortium U, et al. The universal protein resource (uniprot). Oxf Univ Press. 2008; 36:190–195.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Elsevier. 1990; 215:403–410.
- Scarpato M, Esposito R, Evangelista D, et al. Analysis of expression on human chromosome 21, ale-hsa21: a pilot integrated web resource. Database. 2014. doi:10.1093/database/bau009.
- The Apache HTTP Server Project. http://httpd.apache.org/.
- The World’s Most Popular Open Source Database. https://www.mysql.com/.
- phpMyAdmin to Handle the Administration of MySQL over the Web. http://www.phpmyadmin.net/home_page/index.php.
- A Popular General-purpose Scripting Language that Is Especially Suited to Web Development. http://php.net/.
- The Lightweight, Interpreted, Object-oriented Scripting Language for Web Pages. http://www.ecma-international.org/.
- HTML: the Markup Language for Describing Web Documents. http://www.w3.org/TR/html5/.
- Cascading Style Sheets: a Mechanism for Adding Style to Web Documents. http://www.w3.org/Style/CSS/.
- The World Wide Web Consortium. http://www.w3.org.
- Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using david bioinformatics resources. Nat Protoc. 2009. doi:10.1038/nprot.2008.211.
- Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009. doi:10.1093/nar/gkn923.