ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data

Background In recent years, applications based on massively parallel RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., those without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species.
Results We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with next-generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results.
Conclusions ATGC transcriptomics gives non-expert computer users and small research groups access to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of the GNU public license at http://atgcinta.sourceforge.net.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1494-2) contains supplementary material, which is available to authorized users.


Description
ATGC is a web application that allows users to work with NGS transcriptomic data without a reference genome. It uses an ontology-driven database schema for data storage and management, and provides interfaces to create the schema structure, load ontologies and data, and visualize, search and integrate data. With this application it is possible to visualize, explore, analyze and share de novo transcriptomic data generated by NGS platforms, using a Chado database to store the data. It is an open-source and freely available application that supports storing information in several Chado modules and uses different ontologies to classify data; the data can then be explored through the ontology structure with graphics, searches and description tables. ATGC is open source, so all we ask is that you cite our paper in any publications that use this application:

Citation
ATGC is an open-source application. For more information, new releases, the source code, complete virtual machines (see below) or the manual, please visit the homepage (http://atgcinta.sourceforge.net). ATGC can be installed in two ways: as a complete installation (only tested on Linux machines), or as a virtual machine, by downloading a VM image with the complete application (all requirements are installed there). For detailed installation instructions, please see the following sections of this manual.

Install application
The application can be downloaded from http://atgcinta.sourceforge.net, where both the source code and virtual machine (VM) images are available.

System Requirements
The software has been tested on the following operating systems:
• Linux distributions derived from Debian (extensively tested on Debian 7.0/5.0 and Ubuntu 12.04/14.04/15.04/16.04)
• Linux distributions derived from Red Hat (tested on Fedora 20)
• For Windows or Mac (or, if you prefer, any Linux distribution), please see the details below.
The application was tested, and ran correctly, on a machine with:
• 1 CPU
• 1 GB RAM (using 500 MB on average)

Software requirements
The ATGC application requires the following to run successfully. If one or more of these packages is missing, some parts of ATGC may fail to run correctly. The versions used to test the application are listed in parentheses; these or later versions should ensure proper execution. These utilities must be accessible via the system path:

PostgreSQL configuration
The user that runs the application must be created and enabled in PostgreSQL. To do this, execute the following commands, replacing <username> with the correct name.

Uncompress application file
To uncompress the application, run the following command:
    tar xzvf ATGC-1.0_source.tar.gz
This file contains a directory called web2py that includes all the files needed to run the application.
The files specific to the ATGC application are located in <path to>/web2py/applications/ATGC

Running application (Using Web2Py executable)
The password is required when the user wants to access the administrative interface.
For the virtual machine installation, replace <path to> with /home/atgc; for complete installations, replace it with the correct path.

Using Apache web server
To use the Apache HTTP server, you must install two packages and configure the Web2py application in the Apache configuration files with the following commands:
Then open the /etc/apache2/sites-enabled/web2py.conf file and replace:
• <hostname> with the correct hostname defined in the /etc/hosts file for the selected IP address (atgc-VirtualBox in the virtual machine)
• <username> with the correct username (atgc in the virtual machine)
• <path to web2py home> with the complete path to the web2py main directory (/home/atgc/web2py in the virtual machine)
Put the following two lines in the /etc/apache2/apache2.conf file:

Application access using the web browser
To access the application, enter this URL in the web browser: http://localhost:8000 (see the other options above)

Database creation and load ontologies (Setup menu)
To use the application, the first steps are to create a Chado database and load ontologies into it, using the functions under the Setup option of the navigation bar. When you start the application for the first time, or when no database exists, the application looks like this:

Database creation
To create a database, you only need to choose a database name; the application creates the database from a Chado schema template and automatically loads the basic ontologies:

Database selection
You can have several databases in the same instance of the application, but you must select the database you want to work with.

Load Ontologies
After creating the database, you must load the ontologies into it. The main ontologies needed to continue with data loading are the Sequence Ontology (SO) and the Gene Ontology (GO). You can load ontologies using the "Load Ontology" option in the Setup menu. Once these two ontologies are loaded, you can continue loading data into the database (e.g., fasta files, GO annotations).
If you use software whose output you want to load into the database through GO, you must load the corresponding <software>2GO file using the "Load dbxref2ontology file" option. One example is InterProScan: you can download the interpro2go file from http://geneontology.org/external2go/ and, after loading the GO ontology, load this file to generate the relationships between InterPro and GO terms in the database.
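To make the structure of a <software>2GO mapping file concrete, the following is a minimal Python sketch of how such a file could be read. It is not the ATGC loader. It assumes the usual external2go line layout (external accession and name, then ">", then a GO term name and GO identifier, with comment lines starting with "!"). The function name parse_external2go and the sample accessions are hypothetical and chosen only for illustration.

```python
import re

# One mapping line is assumed to look like:
#   <db>:<accession> <name> > GO:<term name> ; GO:<7 digits>
LINE_RE = re.compile(
    r"^(?P<db>[^:\s]+):(?P<acc>\S+)\s*(?P<name>.*?)\s*>\s*"
    r"GO:(?P<term>.*?)\s*;\s*(?P<go_id>GO:\d{7})\s*$"
)


def parse_external2go(lines):
    """Return a dict mapping external accessions to lists of GO identifiers."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and "!" comment/header lines.
        if not line or line.startswith("!"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed layout
        key = f"{match['db']}:{match['acc']}"
        mapping.setdefault(key, []).append(match["go_id"])
    return mapping


# Made-up lines for illustration only (not real interpro2go content).
sample = [
    "!hypothetical header comment",
    "InterPro:IPR999001 Example domain > GO:example function ; GO:0000001",
    "InterPro:IPR999001 Example domain > GO:example process ; GO:0000002",
    "InterPro:IPR999002 Other domain > GO:example function ; GO:0000001",
]
print(parse_external2go(sample))
```

Each external accession can map to several GO terms, which is why the sketch stores a list per accession.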

Dump and Restore
The "Dump Chado DB" option backs up a complete database to an SQL-compatible file. This file can then be used to restore the database to a previous state (before an error or unwanted changes) using the "Restore Chado DB" option.

Create Organism
The first step after loading ontologies is to create an organism for the database. In this step you must also load an image to identify your organism; this image is used in the application header (left image) and as the image in the browser tab. Only one organism can be created per database; if you need to use the application for several organisms, you must create several databases (one per organism). This does not require repeating the complete process of database creation and ontology loading: you can dump the original database, create a new database, and then load the complete dump into the new database with "Restore Chado DB" (in effect creating a copy of the original database).
After creating an organism, you can start loading data: creating experiments, libraries and lines, and loading features from fasta files and list files (for features without sequence). The menu looks like this:

Load features from fasta
After creating the organism, you can load the transcript or gene sequences in fasta format, choosing the feature type from the Sequence Ontology (contig, for example). All other data can then be loaded using these sequences as the repository or reference.
Once features are loaded from fasta files, all associated information can be added, such as functional annotation, blast results, expression levels, markers, alleles, genotypes and relationships with other features.
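As an illustration of what a fasta file contributes to the database (one named feature per record, plus its sequence), here is a minimal Python sketch of a fasta reader. It is not the ATGC loading code. The function name read_fasta and the rule of taking the first word of the header as the feature name are assumptions made for this example.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from fasta-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Assumption: the first whitespace-delimited token is the name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


records = list(read_fasta([
    ">contig_1 length=8",
    "ATGC",
    "ATGC",
    ">contig_2",
    "GGCC",
]))
print(records)
```

The identifiers produced this way are the names that later files (annotations, blast results, expression tables) would need to reference.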

Features → CV associations (load cvterms for features)
Associations between features and controlled vocabularies can be created from the results of several software tools, such as Blast2GO (annot file), InterProScan (raw file), RFAM (gff3 file) or tab files in general. To load InterProScan results you must first load the "interpro2go" file from http://geneontology.org/external2go/; to load RFAM results you can load the "rfam2go" file from the same location or the "rfam2so" file, developed by the ATGC creators, located in the web2py/applications/ATGC/private/ontology directory. The information entered in the "Description" field is used to identify the source of the annotations in the feature detail pages.

Blast run results (XML Files)
As with CV associations, it is possible to load the results of Blast alignments against a database, using XML as the output type.
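The following Python sketch shows, under stated assumptions, the kind of information a Blast XML file carries for each query. It uses the element names of the standard BLAST XML output (Iteration, Iteration_query-def, Hit, Hit_def, Hsp_evalue) and the standard library parser. The function name hits_per_query and the embedded sample document are hypothetical, and this is not how ATGC itself parses the file.

```python
import xml.etree.ElementTree as ET


def hits_per_query(xml_text):
    """Return {query name: [(hit description, best e-value), ...]}."""
    root = ET.fromstring(xml_text)
    results = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = []
        for hit in iteration.iter("Hit"):
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
            if evalues:
                # Keep the lowest e-value among the HSPs of this hit.
                hits.append((hit.findtext("Hit_def"), min(evalues)))
        results[query] = hits
    return results


# Made-up, heavily trimmed document for illustration only.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>contig_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>hypothetical|1</Hit_id>
          <Hit_def>hypothetical protein A</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-20</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>1e-5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

print(hits_per_query(SAMPLE))
```

The query name in each Iteration is what links a blast result back to a feature already loaded in the database.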

Genotypes (lines), markers and alleles
You can load a set of markers and genotypes defined by the alleles of these markers. To do this, you must first load the genotypes and then the markers and alleles.
It is possible to describe the characteristics of the genotypes using controlled vocabulary terms in the Lines → CV associations tab, first choosing the CV (for example INTA_CV or any previously loaded ontology) and then adding a characteristic and value for the line.
For example, setting the characteristic "Fertility" to "Yes".
To load markers, use the Load Markers section, providing the marker information in one of several file types, such as VCF, CSV or the specific formats of some software tools. If you want to create a relationship between markers and a genotype, you must select the option "Allele creation and line association" and select the genotype name.
Similarly, if you want to associate markers with another line, you can use the Markers → Line associations window, with a CSV file that describes the allele of each marker in the new line.
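To illustrate how a marker file relates markers, alleles and lines, the sketch below decodes the genotype (GT) field of a single VCF data line into allele strings, one list per sample column. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, samples). The function name alleles_from_vcf_line and the example line are hypothetical, and ATGC may process VCF files differently.

```python
def alleles_from_vcf_line(line):
    """Return (marker id, [alleles per sample]) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    marker_id, ref, alt = fields[2], fields[3], fields[4].split(",")
    # Allele index 0 is REF; indexes 1..n are the ALT alleles in order.
    alleles = [ref] + alt
    gt_index = fields[8].split(":").index("GT")
    calls = []
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # "/" separates unphased alleles and "|" phased ones.
        codes = genotype.replace("|", "/").split("/")
        # "." marks a missing call.
        calls.append([None if c == "." else alleles[int(c)] for c in codes])
    return marker_id, calls


# Made-up data line with three sample columns (three lines).
line = "contig_1\t120\tsnp_1\tA\tG,T\t50\tPASS\t.\tGT:DP\t0/1:12\t2|2:8\t./.:0"
print(alleles_from_vcf_line(line))
```

In this reading, each sample column would correspond to one line (genotype) and each decoded list to the alleles of that line for the marker.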

Load expression information
To load expression information into the database, first create a structure of experiments and libraries. Each treatment in the assay is a separate experiment with its own characteristics; for example, if you have two treatments, control and treated, you must create two experiments. Libraries correspond to biological or technical replicates. Finally, use the "Feature → Library associations" window to load expression values or other measured variables for any feature (contigs, for example).
It is very important to add information that fully describes the experiments, because this information is used to create dynamic expression graphics in the feature detail pages.
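The experiment and library structure described above can be pictured with a small Python sketch. It groups per-library expression values by the experiment each library belongs to and averages the replicates, which is one plausible summary an expression graphic could show. All names and numbers are hypothetical, and this does not reproduce how ATGC stores or plots the values.

```python
from collections import defaultdict
from statistics import mean


def mean_expression(libraries, values):
    """Average expression per (feature, experiment).

    libraries: {library name: experiment name}
    values:    {(feature name, library name): expression value}
    """
    grouped = defaultdict(list)
    for (feature, library), value in values.items():
        grouped[(feature, libraries[library])].append(value)
    return {key: mean(vals) for key, vals in grouped.items()}


# Two treatments, so two experiments, each with two replicate libraries.
libraries = {
    "ctrl_rep1": "control",
    "ctrl_rep2": "control",
    "trt_rep1": "treated",
    "trt_rep2": "treated",
}
values = {
    ("contig_1", "ctrl_rep1"): 10.0,
    ("contig_1", "ctrl_rep2"): 14.0,
    ("contig_1", "trt_rep1"): 30.0,
    ("contig_1", "trt_rep2"): 34.0,
}
print(mean_expression(libraries, values))
```

Keeping the treatment at the experiment level and the replicates at the library level is what makes this kind of grouping possible.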

Feature relationships
You can load relationships between features. For example, if you predict clusters of contigs that possibly come from the same gene, you can load this information using the option shown in Figure 19; you must define the type of the relationship using ontology terms and have previously loaded the contigs and genes as features.
Another way to load feature relationships is with gff3 files, choosing the type of parental relationship between features (for example, the relation between exon and transcript). In this case, it is only necessary that the references of the gff3 file are already loaded as features in the database.
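In a gff3 file, parental relationships are expressed through the Parent attribute in the ninth column. The Python sketch below extracts child and parent pairs from gff3 lines. It assumes well-formed, tab-separated rows with simple key=value attributes. The function name gff3_relationships and the sample rows are hypothetical, and the sketch is not the ATGC loader.

```python
def gff3_relationships(lines):
    """Yield (child id, parent id, child type) for rows with a Parent."""
    for line in lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                yield attrs.get("ID", ""), parent, cols[2]


rows = [
    "##gff-version 3",
    "contig_1\tsketch\tgene\t1\t900\t.\t+\t.\tID=gene_1",
    "contig_1\tsketch\tmRNA\t1\t900\t.\t+\t.\tID=mrna_1;Parent=gene_1",
    "contig_1\tsketch\texon\t1\t300\t.\t+\t.\tID=exon_1;Parent=mrna_1",
]
print(list(gff3_relationships(rows)))
```

The first column of each row (contig_1 here) is the reference that must already exist as a feature in the database.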

Search features by name
With the application you can explore all the information related to the features in different ways. One of these is to search features by name, or by an expression related to the name (using the % character as a wildcard).
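The % wildcard behaves like the one in an SQL LIKE pattern. The following sketch demonstrates this with an in-memory SQLite table as a stand-in. ATGC itself uses a Chado schema on PostgreSQL, so the simplified feature table, the function name search_features and the sample names are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for a feature table (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO feature (name) VALUES (?)",
    [("contig_1",), ("contig_10",), ("gene_1",)],
)


def search_features(connection, pattern):
    """Return feature names matching a LIKE pattern, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM feature WHERE name LIKE ? ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in rows]


print(search_features(conn, "contig%"))  # names starting with "contig"
print(search_features(conn, "%1"))       # names ending with "1"
```

Note that in SQL LIKE patterns the underscore is also a wildcard (it matches any single character), and that case sensitivity of LIKE differs between SQLite and PostgreSQL.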

Search features by ontology term name or accession
Alternatively, you can search features by functional annotation, using an ontology term name or accession, to obtain the list of features annotated with that ontology term.
The results of these searches can be seen in two formats: the list of annotated features (direct and indirect), called the term list view, or a summary showing only the number of features in each condition.
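The distinction between direct and indirect annotation can be sketched as a walk over the ontology graph. The Python example below assumes that "direct" means a feature annotated with the searched term itself and "indirect" means a feature annotated with any descendant of that term; the manual does not define these words, so this reading is an assumption. The function name, term identifiers and feature names are all hypothetical.

```python
def annotated_features(term, children, direct):
    """Return (direct features, indirect features) for an ontology term.

    children: {term: [child terms]}
    direct:   {term: [features annotated with exactly that term]}
    """
    direct_hits = set(direct.get(term, ()))
    indirect_hits = set()
    stack, seen = list(children.get(term, ())), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a term can be reached through several parents
        seen.add(node)
        indirect_hits.update(direct.get(node, ()))
        stack.extend(children.get(node, ()))
    # A feature annotated both ways is reported as direct only.
    return direct_hits, indirect_hits - direct_hits


children = {"T:root": ["T:a", "T:b"], "T:a": ["T:c"], "T:b": ["T:c"]}
direct = {
    "T:root": ["contig_1"],
    "T:a": ["contig_2"],
    "T:c": ["contig_1", "contig_3"],
}
direct_set, indirect_set = annotated_features("T:root", children, direct)
print(sorted(direct_set), sorted(indirect_set))
```

A summary view would then only need the sizes of the two sets rather than their members.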

Search by Blast matches
The last way is to search through the headers or descriptions of the blast results for each transcript, a more general and less specific way to relate features to functional annotation or similar sequences.

Ontology exploration
You can explore the data across the ontology using the graph structure of terms and connections, which shows a pie chart with the distribution of annotated features for each ontology term and a dropdown menu to move along the graph.

Download
Using the Download section, it is possible to export sequences and annotations from the database to text files for other analyses, such as functional enrichment. You can select the sequence type and download the file directly in fasta format, or download the complete set of functional annotations in tabular format.

Software
With the application it is possible to run Blast alignments using sequences stored in the database as subject and external sequences as query.

Blast
You can run blast alignments of query sequences using the sequences stored in the database as subject. The first step is to create a Blast database by choosing the sequences of interest from the database. For example, in Figure 33 the user is creating a Blast database called contigs that contains the sequences of all contigs stored in the database.
With the database created, you can run the alignment using the option Software → BLAST → Run BLAST, where you can choose the type of Blast alignment from the options BlastN, TblastN or TblastX. The query sequence or sequences can be entered directly in the text box or provided in a text file.

Modify and delete
You can modify and remove data stored in the database using the "Modify and Delete" section, which offers two ways to select the entry to work on: by entry name or from a complete list of features.
To share information, you can use the Web2py internal web server or install and configure the Apache web server. In either case, you can choose which parts of the menu bar are visible to other people as follows:
Go to the following address in the web browser and enter the password chosen when you ran Web2py (the Web2py administrative interface is only available with the Web2py web server): localhost:8000/admin
Figure 40: Web2py administration interface: write a password
On the first page, click "Manage" and choose the "Edit" option. On the next page, you can see the table used to set up the menu bar. Here you can choose which parts of the menu are visible by setting the field called "is_visible" to "True" or "False" for the corresponding table entry. For example, to hide the "Setup" menu you must complete the fields as in the next figure:
For the other main parts of the menu bar, the next table gives the id values. You can hide any option of the menu, either a complete part (for example, the whole Setup menu) or a single part (for example, only):

Menu option
Id (