TranSeqAnnotator: large-scale analysis of transcriptomic data
BMC Bioinformatics volume 13, Article number: S24 (2012)
The transcriptome of an organism can be studied through the analysis of expressed sequence tag (EST) data sets, which offers a rapid and cost-effective approach supported by several new and updated bioinformatics tools for assembly and annotation. Together with genome and proteome analyses, such transcriptomic analyses contribute to a comprehensive understanding of an organism. With the advent of large-scale sequencing projects and the generation of sequence data at the protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs.
TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data that uses the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters and assembles the sequences, generates consensus sequences, conceptually translates these into possible protein products and assigns putative function based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw ESTs and quality values in FASTA format, along with protein and short read sequences, and analyses them with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
TranSeqAnnotator has been developed on a Linux cluster to perform an exhaustive and reliable analysis and to provide detailed annotation. It outputs gene ontologies and protein functional identifications in terms of mapping to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes that are candidates for therapeutic experimental validation and could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available to the scientific community at http://estexplorer.biolinfo.org/TranSeqAnnotator/.
Expressed sequence tags (ESTs), derived from complementary DNA (cDNA) libraries, provide a low-cost transcriptomic alternative to whole genome sequencing. They are short, unedited, randomly selected single-pass sequence reads of approximately 200-800 base pairs (bp), each representing a small region of the nucleotide sequence of a transcribed protein-coding or non-coding messenger RNA (mRNA). They play a vital role in gene identification and in the verification of gene predictions, as they represent the expressed regions of a genome. The analysis of EST data can facilitate gene discovery, help in gene structure identification, complement genome annotation, establish the viability of alternative transcripts, direct single nucleotide polymorphism (SNP) characterization and facilitate proteomic exploration [1–3]. ESTs were used as the primary source for human gene discovery in the early 1990s. Besides ESTs, "next-generation" sequencing (NGS) generates millions of sequencing reads of 35-250 bp, which further help in the study of transcriptome data, mainly for neglected organisms, and in understanding the different isoforms expressed by an organism at different stages of development. Studies using an experimental proteomic approach have identified proteins in excretory-secretory products (ESP) with the aid of transcriptome assembly. With the advent of the short-read strategy of NGS, many bioinformatics challenges arise in data storage and management and in the development of informatics tools for analysis, with a focus on sequence quality scoring, alignment, assembly and data processing [6, 7]. A comprehensive analysis pipeline is required to store, organize and annotate ESTs, with computational tools for pre-processing, clustering, assembly into contiguous segments known as contigs, and annotation to yield biological information.
The web resources available for large-scale EST datasets were reviewed at each step, including clustering, assembly, consensus generation and the tools for DNA, protein and ES annotation. The number of analysis steps and tools complicates computational strategies to organize and analyse transcriptomic datasets, a problem compounded by the limited ability of some tools to handle high-throughput EST data. An evaluation revealed that all available platforms terminated prior to downstream functional annotation, including gene ontologies (GOs), motif/pattern analysis and pathway mapping. Hence, a comprehensive large-scale transcriptomic analysis pipeline needed to be developed to keep up with the rapidity with which enormous amounts of sequence data are currently being generated. This highlights an urgent need for advanced, high-throughput computational analyses of EST and genomic sequence datasets using automated platforms. EST data have been applied to the study of functional biomolecules [9, 10], but the prediction of ES proteins from ESTs has been uncommon. Excretory/secretory (ES) products are the molecules excreted or secreted by a cell or an organism that can circulate throughout the body of an organism (e.g., in the extracellular space) or are localized to or released from the cell surface, making them readily accessible to drugs and/or the immune system. ES products cover 8–20% of the proteome of an organism and include molecules of varied functionality, including chemokines, digestive enzymes, cytokines, hormones, toxins, antibodies, morphogens, extracellular proteinases and antimicrobial peptides. They are known to be involved in vital biological processes, including cell adhesion, cell migration, cell-cell communication, differentiation, proliferation, morphogenesis and immune responses. Biochemical and immunological studies of parasitic helminths have focussed on ES proteins. Worms secrete biologically active mediators that can transform or customize their niche within the host [13–15], in order to regulate or elude immune attack or to stimulate a particular host response.
Some platforms terminate at the assembly level, providing contigs and singletons (referred to as rESTs), while other platforms exclusively run nucleotide-based programs with limited annotation at the protein level [17–20]. Based on the benchmarking results, a robust transcriptome analysis pipeline (TranSeqAnnotator) was constructed that offers contig generation from ESTs and short reads, updated pathway analysis, identification of non-classically secreted proteins and extensive annotation, with an option for users to select specific analysis phases (detailed below). Proteins secreted by classical and non-classical pathways are identified by a combination of computational approaches to predict ESPs. The pipeline accepts ESTs, quality values, protein sequences and short reads as input and provides as output the assembled rESTs and their annotations, including gene ontologies, secretory proteins and mappings to protein domains, motifs, metabolic pathways and interaction databases. TranSeqAnnotator (TSA) is available as a web service and can be downloaded for local installation.
The TranSeqAnnotator workflow has three phases (Figure 1). Phase I covers pre-processing, assembly, conceptual translation and BLAST searches against NR for (a) ESTs or (b) short read FASTA sequences. Phase II identifies putative ES proteins, both classically and non-classically secreted, and eliminates transmembrane proteins. Phase III performs the combined annotation of the protein sequences and ES proteins using a carefully selected suite of bioinformatic tools, chosen on the basis of a large-scale transcriptome analysis. TranSeqAnnotator currently implements the genetic codes for 15 organisms, covering the most studied organisms, including human, rat, pig, dog, chicken, rice, wheat, thale cress (Arabidopsis thaliana), zebrafish, yeast and a free-living roundworm (Caenorhabditis elegans).
Phase I accepts ESTs and short reads as input for pre-processing and assembly, together with quality values in the case of ESTs (Figure 1).
The sequence cleaning step uses seqclean when ESTs alone are supplied and seqtrim when ESTs are supplied with quality values, optionally followed by repeat masking with RepeatMasker. Phase I (b) accepts short reads, which are pre-processed using seqclean. The masked sequences are then passed on for clustering and assembly with iAssembler (http://bioinfo.bti.cornell.edu/tool/iAssembler/), which incorporates the MIRA and CAP3 assemblers for ESTs and short reads. For conceptual translation into proteins, the program ESTScan applies the genetic code from the nearest organism to the contig and singleton sequences generated by CAP3 or iAssembler.
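As a rough illustration of how the Phase I tool chain depends on the submitted input, the selection logic described above can be sketched in Python. The function name plan_phase1 and its parameters are hypothetical and are not part of TranSeqAnnotator, which is implemented in Perl. The sketch only returns program names in the order the text describes and does not invoke any of them. It assumes that repeat masking, when requested, can be applied to either input type.

```python
from typing import List


def plan_phase1(short_reads: bool = False,
                has_quality: bool = False,
                mask_repeats: bool = False) -> List[str]:
    """Return the ordered Phase I program names for one submission.

    Illustrative only: the names are labels, no program is executed.
    """
    steps: List[str] = []
    # Cleaning: seqtrim when ESTs come with quality values,
    # seqclean otherwise (ESTs alone, or short reads).
    if not short_reads and has_quality:
        steps.append("seqtrim")
    else:
        steps.append("seqclean")
    # Repeat masking is optional (assumed here to apply to either input type).
    if mask_repeats:
        steps.append("RepeatMasker")
    # Clustering/assembly, conceptual translation, then BLAST against NR.
    steps += ["iAssembler", "ESTScan", "BLAST (NR)"]
    return steps


if __name__ == "__main__":
    # ESTs with quality values and repeat masking switched on.
    print(plan_phase1(has_quality=True, mask_repeats=True))
```

Keeping the plan as a plain list of names makes the optional steps easy to inspect before anything is run.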
In Phase II, putative ES proteins are identified from the protein sequences generated in Phase I using SignalP, SecretomeP and TMHMM (Figure 1). First, SignalP checks for a signal sequence using its neural network and hidden Markov model probability scores (SignalP-NN and SignalP-HMM), while SecretomeP looks for non-classically secreted proteins; both use default parameters that can be modified by experienced users. Subsequently, all proteins with signal sequences are passed on to TMHMM, a hidden Markov model-based transmembrane helix prediction program, to "filter out" transmembrane proteins. ES proteins, the subset lacking transmembrane helices, are annotated further.
Phase III, the annotation level for protein sequences or ES proteins, comprises a suite of computational tools: InterProScan for domain analysis and Gene Ontology, and KOBAS (KEGG Orthology-Based Annotation System) for pathway mapping [30, 31]. In addition, protein BLAST is employed to search databases derived from Wormpep, to locate nematode homologues and list homologous proteins in C. elegans archived in WormBase, as well as interaction databases such as IntAct, BioGrid and DIP, which provide molecular interaction data and experimentally verified protein-protein interactions.
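The Phase II filtering logic described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the Prediction record, its field names and the select_es_proteins function are hypothetical, and the per-protein values are assumed to have already been parsed from the outputs of the prediction programs. The default threshold of 0.9 is the SecretomeP NN-score cut-off reported in the application section; treating a score equal to the threshold as secreted is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Prediction:
    protein_id: str
    has_signal_peptide: bool    # classical secretion call (SignalP-style)
    secretomep_nn_score: float  # non-classical secretion score (SecretomeP-style)
    tm_helices: int             # predicted transmembrane helices (TMHMM-style)


def select_es_proteins(preds: Iterable[Prediction],
                       nn_threshold: float = 0.9) -> List[str]:
    """Keep proteins predicted as secreted that lack transmembrane helices."""
    kept: List[str] = []
    for p in preds:
        # Secreted if classical (signal peptide) or non-classical (NN score).
        secreted = p.has_signal_peptide or p.secretomep_nn_score >= nn_threshold
        # Transmembrane proteins are filtered out of the putative ES set.
        if secreted and p.tm_helices == 0:
            kept.append(p.protein_id)
    return kept


if __name__ == "__main__":
    # Made-up records for illustration; not data from the study.
    examples = [
        Prediction("pep_a", True, 0.10, 0),   # classical, no helix: kept
        Prediction("pep_b", False, 0.95, 0),  # non-classical, no helix: kept
        Prediction("pep_c", True, 0.20, 2),   # secreted call but membrane: dropped
        Prediction("pep_d", False, 0.30, 0),  # not predicted as secreted: dropped
    ]
    print(select_es_proteins(examples))
```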
TSA accepts a dataset submitted by the user, and optional programs can be selected as required (Figure 2). The progress of the analysis can be monitored on the status page, which is updated after each selected process is completed; the output of each program is available along with a summarized output. Some of these tools are provided in the ESTExplorer and EST2Secretome pipelines, but TranSeqAnnotator adds the analysis of large-scale EST datasets and short read sequences with updated bioinformatics tools, as benchmarked with the large-scale analysis of the Teladorsagia circumcincta dataset (unpublished work). In addition, SecretomeP identified important proteins that the previous pipelines, which relied on SignalP, failed to identify. The identification of both classically and non-classically secreted proteins with SecretomeP is the highlight of this robust analysis pipeline, as in our earlier analysis of Fasciola hepatica.
TranSeqAnnotator is developed using Perl v5.10.0, which links the different bioinformatics programs, with MySQL as the backend for data management and analysis. The front end is developed using PHP, and processes are run based on CPU availability. Each input sequence submitted by the user is tagged with a request ID to trace the process. The pipeline runs on a 16-node Linux cluster (2.4 GHz Intel(R) Xeon(R) CPU, 16 processors, 32 GB RAM) running the Ubuntu Server operating system. The output files are provided as final results for viewing and downloading and remain available for a week.
Results and discussion
Application of TranSeqAnnotator
Ascaris lumbricoides, a soil-transmitted helminth (geohelminth), is the largest of the common intestinal nematode parasites of humans and causes the disease ascariasis. It infects an estimated 1.2 billion people worldwide, but infection is usually asymptomatic. A total of 1822 A. lumbricoides EST sequences from dbEST were analysed using TranSeqAnnotator. The dataset is from adult male whole-body Ascaris lumbricoides cDNA clones. Phase I was carried out with pre-processing (SeqClean and RepeatMasker) followed by clustering and assembly using CAP3, which yielded 236 contigs and 658 singletons. These rESTs were mapped to the non-redundant (NR) dataset using BLAST for nucleotide-level annotation. Using a translational matrix, ESTScan conceptually translated these high-quality rESTs, which were then transferred to Phase II of TSA for the prediction of ES proteins by sequentially running SecretomeP (with a threshold value for the NN-score of 0.9) and TMHMM. The cluster dataset, translated peptide sequences and ES proteins were annotated with biochemical pathways using KOBAS and with domain/family motifs and Gene Ontology terms using InterProScan. The query sequences were compared using BLASTP against Wormpep and against the IntAct database (version 1.7.0) to extract all interaction partners. The 894 rESTs were conceptually translated to yield 510 peptide sequences. GO terms were identified for these putative protein sequences using InterProScan, with 108 peptide sequences assigned a biological process (BP), 156 associated with a molecular function (MF) and 83 assigned to a cellular component (CC) (Additional File 1). The analysis revealed that translation (GO:0006412) and oxidation-reduction process (GO:0055114) were the most highly represented GO categories for biological process. The most frequent GO terms for molecular function were structural constituent of ribosome (GO:0003735), oxidoreductase activity (GO:0016491) and ATP binding (GO:0005524), whereas for cellular component the most highly represented GO terms were ribosome (GO:0005840) and extracellular space (GO:0005615).
A total of 239 peptide sequences were mapped to 113 KEGG pathways using KOBAS. The main KEGG pathways mapped included ribosomal protein assembly pathway (n = 34) and cytoskeleton proteins (n = 19). Other well represented pathways include tight junction (n = 14), regulation of actin cytoskeleton (n = 12), focal adhesion (n = 12), valine, leucine and isoleucine degradation (n = 8) and propanoate metabolism (n = 7). Peptides were mapped to several pathways, including glycolysis/gluconeogenesis, MAPK signaling pathway and ubiquitin mediated proteolysis (Additional File 2).
Domain mapping by InterProScan provides details of the family, fold and functional domains present in the putative peptides. The most represented domain was the collagen triple helix repeat, found in 14 protein entries, followed by the C-type lectin fold and the transthyretin-like family, with nine protein entries each. Other highly represented domains are the actin-like and C-type lectin domains (Additional File 3).
A total of 32 secreted proteins were predicted by SecretomeP. Of these, 6 are classically secreted peptides with N-terminal signal sequences, while 26 are non-classically secreted, supporting the use of SecretomeP rather than SignalP alone, which can only predict classically secreted proteins. Of these 32, six proteins with transmembrane helices predicted by TMHMM were eliminated, resulting in 26 excreted/secreted proteins inferred from the present dataset of 894 rESTs. We could identify cecropins (including cecropin-P1, cecropin-P2 and cecropin-P3), cathepsin L from Ascaris suum and a cathepsin L-like protease from Strongylus vulgaris, chymotrypsin/elastase isoinhibitor 1 from Ascaris suum, and C-type lectin protein 160 and C-type lectin domain-containing protein 160 from Ascaris suum. Gelsolin from Ascaris suum and GelSoliN-Like family member (gsnl-1) from Caenorhabditis elegans were also identified (Additional File 4).
Cecropins represent a large family of antibacterial and toxic peptides that are known to execute host defence functions, mainly against micro-organisms [42, 43], and are found in insects. Ascaris cecropins (P1-P4) were identified as antimicrobial peptides that were positively inducible by bacterial injection. Chemically synthesized Ascaris cecropins were bactericidal against a wide range of microbes, i.e. Gram-positive (Staphylococcus aureus, Bacillus subtilis and Micrococcus luteus) and Gram-negative (Pseudomonas aeruginosa, Salmonella typhimurium, Serratia marcescens and Escherichia coli) bacteria, and were weakly but detectably active against yeasts (Saccharomyces cerevisiae and Candida albicans). C-type lectins (CTLs) represent a large family of proteins that bind carbohydrate moieties in a Ca2+-dependent manner and act as pathogen recognition molecules or antibacterial proteins in immune responses, protecting the worm itself against microbial infection [46–49]. They also play a vital role in immune homeostasis through the recognition of endogenous 'self' ligands, and they themselves have bactericidal activity. Studies have shown that A. suum C-type lectin-1 (As-CTL-1) shows high similarity to Toxocara canis C-type lectins (Tc-CTLs) and that such lectins are exposed to attack by host immune responses. Hence, to avoid protective immune responses in infected animals during tissue migration, A. suum larvae might interfere with host inflammation processes by means of As-CTL-1. The gelsolin family is a group of actin-binding proteins known to be involved in cell structure, motility, apoptosis, amyloidosis and cancer. Gelsolin-like protein-1 (GSNL-1) from C. elegans is a new member of the gelsolin family of actin regulatory proteins, which provides new insight into the functional diversity and evolution of gelsolin-related proteins [53, 54].
We were able to functionally assign GO terms to the 26 putative ES proteins, with proteolysis (GO:0006508) the most common GO category for biological process, cysteine-type peptidase activity (GO:0008234) for molecular function and extracellular region (GO:0005576) for cellular component. The KEGG pathways to which these sequences were mapped using KOBAS were protein processing in endoplasmic reticulum, phagosome, lysosome, antigen processing and presentation, and rheumatoid arthritis. The TranSeqAnnotator methodology was benchmarked using the large-scale dataset of Teladorsagia circumcincta (unpublished work) and applied to the annotation of A. lumbricoides.
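The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text for the A. lumbricoides dataset; the variable names are ours, and the two percentages it prints are derived here for orientation and are not results stated by the authors.

```python
# Counts as reported in the text for the A. lumbricoides dataset.
contigs, singletons = 236, 658
rests = contigs + singletons              # assembled rESTs
peptides = 510                            # conceptually translated peptides

predicted_secreted = 32                   # predicted as secreted
classical, non_classical = 6, 26          # split of the 32 predictions
with_tm_helices = 6                       # removed after TMHMM
es_proteins = predicted_secreted - with_tm_helices

# Internal consistency of the reported figures.
assert rests == 894
assert classical + non_classical == predicted_secreted
assert es_proteins == 26

# Derived proportions (computed here, not quoted from the article).
print(f"rESTs: {rests}")
print(f"translated: {peptides / rests:.1%} of rESTs")
print(f"ES proteins: {es_proteins} ({es_proteins / peptides:.1%} of peptides)")
```

Note that the six classically secreted peptides and the six proteins removed for transmembrane helices are two separate counts that happen to be equal; the text does not state that they refer to the same proteins.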
TranSeqAnnotator currently supports annotation at the nucleotide, short read, protein and ES protein levels. Our aim is to extend the pipeline by updating the repeat-masking step with repeatless libraries, so that newly sequenced organisms can be annotated, and by adding annotation of other types of datasets, such as RNA-seq and microarray data.
Rudd S: Expressed sequence tags: alternative or complement to whole genome sequences?. Trends Plant Sci. 2003, 8 (7): 321-329. 10.1016/S1360-1385(03)00131-6.
Dong Q, Kroiss L, Oakley FD, Wang BB, Brendel V: Comparative EST analyses in plant systems. Methods Enzymol. 2005, 395: 400-418.
Jongeneel CV: Searching the expressed sequence tag (EST) databases: panning for genes. Brief Bioinform. 2000, 1 (1): 76-92. 10.1093/bib/1.1.76.
Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF: Complementary DNA sequencing: expressed sequence tags and human genome project. Science. 1991, 252 (5013): 1651-1656. 10.1126/science.2047873.
Moreno Y, Gros PP, Tam M, Segura M, Valanparambil R, Geary TG, Stevenson MM: Proteomic analysis of excretory-secretory products of Heligmosomoides polygyrus assessed with next-generation sequencing transcriptomic information. PLoS Negl Trop Dis. 2011, 5 (10): e1370-10.1371/journal.pntd.0001370.
Wold B, Myers RM: Sequence census methods for functional genomics. Nat Methods. 2008, 5 (1): 19-21. 10.1038/nmeth1157.
Yang MQ, Athey BD, Arabnia HR, Sung AH, Liu Q, Yang JY, Mao J, Deng Y: High-throughput next-generation sequencing technologies foster new cutting-edge computing techniques in bioinformatics. BMC Genomics. 2009, 10 (Suppl 1): I1-10.1186/1471-2164-10-S1-I1.
Ranganathan S, Menon R, Gasser RB: Advanced in silico analysis of expressed sequence tag (EST) data for parasitic nematodes of major socio-economic importance--fundamental insights toward biotechnological outcomes. Biotechnol Adv. 2009, 27 (4): 439-448. 10.1016/j.biotechadv.2009.03.005.
Nagaraj SH, Gasser RB, Ranganathan S: A hitchhiker's guide to expressed sequence tag (EST) analysis. Brief Bioinform. 2007, 8 (1): 6-21.
Adams MD, Kerlavage AR, Fields C, Venter JC: 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nat Genet. 1993, 4 (3): 256-267. 10.1038/ng0793-256.
Greenbaum D, Luscombe NM, Jansen R, Qian J, Gerstein M: Interrelating different types of genomic data, from proteome to secretome: 'oming in on function. Genome Res. 2001, 11 (9): 1463-1468. 10.1101/gr.207401.
Maizels RM, Yazdanbakhsh M: Immune regulation by helminth parasites: cellular and molecular mechanisms. Nat Rev Immunol. 2003, 3 (9): 733-744. 10.1038/nri1183.
Lightowlers MW, Rickard MD: Excretory-secretory products of helminth parasites: effects on host immune responses. Parasitology. 1988, 96 (Suppl): S123-166.
Hawdon JM, Jones BF, Hoffman DR, Hotez PJ: Cloning and characterization of Ancylostoma-secreted protein. A novel protein associated with the transition to parasitism by infective hookworm larvae. J Biol Chem. 1996, 271 (12): 6672-6678. 10.1074/jbc.271.12.6672.
Maizels RM, Gomez-Escobar N, Gregory WF, Murray J, Zang X: Immune evasion genes from filarial nematodes. Int J Parasitol. 2001, 31 (9): 889-898. 10.1016/S0020-7519(01)00213-2.
Masoudi-Nejad A, Tonomura K, Kawashima S, Moriya Y, Suzuki M, Itoh M, Kanehisa M, Endo T, Goto S: EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments. Nucleic Acids Res. 2006, 34 (Web Server): W459-462.
D'Agostino N, Aversano M, Chiusano ML: ParPEST: a pipeline for EST data analysis based on parallel computing. BMC Bioinformatics. 2005, 6 (Suppl 4): S9-10.1186/1471-2105-6-S4-S9.
Latorre M, Silva H, Saba J, Guziolowski C, Vizoso P, Martinez V, Maldonado J, Morales A, Caroca R, Cambiazo V: JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow. BMC Bioinformatics. 2006, 7: 513-10.1186/1471-2105-7-513.
Paquola AC, Nishyiama MY, Reis EM, da Silva AM, Verjovski-Almeida S: ESTWeb: bioinformatics services for EST sequencing projects. Bioinformatics. 2003, 19 (12): 1587-1588. 10.1093/bioinformatics/btg196.
Hotz-Wagenblatt A, Hankeln T, Ernst P, Glatting KH, Schmidt ER, Suhai S: ESTAnnotator: A tool for high throughput EST annotation. Nucleic Acids Res. 2003, 31 (13): 3716-3719. 10.1093/nar/gkg566.
Menon R, Gasser RB, Miterva M, Ranganathan S: An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications. BMC Genomics. 2012,
Chen YA, Lin CC, Wang CD, Wu HB, Hwang PI: An optimized procedure greatly improves EST vector contamination removal. BMC Genomics. 2007, 8: 416-10.1186/1471-2164-8-416.
Falgueras J, Lara AJ, Fernandez-Pozo N, Canton FR, Perez-Trabado G, Claros MG: SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinformatics. 2010, 11: 38-10.1186/1471-2105-11-38.
Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004, 14 (6): 1147-1159. 10.1101/gr.1917404.
Iseli C, Jongeneel CV, Bucher P: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol. 1999, 138-148.
Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S: Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel. 2004, 17 (4): 349-356. 10.1093/protein/gzh037.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37 (Database): D211-215.
Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, Kong L, Gao G, Li CY, Wei L: KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res. 2011, 39 (Web Server): W316-322. 10.1093/nar/gkr483.
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34 (Database): D354-357.
Bieri T, Blasiar D, Ozersky P, Antoshechkin I, Bastiani C, Canaran P, Chan J, Chen N, Chen WJ, Davis P: WormBase: new content and better access. Nucleic Acids Res. 2007, 35 (Database): D506-510.
Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian AT, Kerrien S, Khadake J: The IntAct molecular interaction database in 2010. Nucleic Acids Res. 2010, 38 (Database): D525-531.
Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bahler J, Wood V: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36 (Database): D637-640.
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32 (Database): D449-451.
Nagaraj SH, Deshpande N, Gasser RB, Ranganathan S: ESTExplorer: an expressed sequence tag (EST) assembly and annotation platform. Nucleic Acids Res. 2007, 35 (Web Server): W143-147. 10.1093/nar/gkm378.
Nagaraj SH, Gasser RB, Ranganathan S: Needles in the EST haystack: large-scale identification and analysis of excretory-secretory (ES) proteins in parasitic nematodes using expressed sequence tags (ESTs). PLoS Negl Trop Dis. 2008, 2 (9): e301-10.1371/journal.pntd.0000301.
Robinson MW, Menon R, Donnelly SM, Dalton JP, Ranganathan S: An integrated transcriptomics and proteomics analysis of the secretome of the helminth pathogen Fasciola hepatica: proteins associated with invasion and infection of the mammalian host. Mol Cell Proteomics. 2009, 8 (8): 1891-1907. 10.1074/mcp.M900045-MCP200.
Dold C, Holland CV: Ascaris and ascariasis. Microbes Infect. 2011, 13 (7): 632-637. 10.1016/j.micinf.2010.09.012.
Holland CV: Predisposition to ascariasis: patterns, mechanisms and implications. Parasitology. 2009, 136 (12): 1537-1547. 10.1017/S0031182009005952.
Boguski MS, Lowe TM, Tolstoshev CM: dbEST--database for "expressed sequence tags". Nat Genet. 1993, 4 (4): 332-333. 10.1038/ng0893-332.
Tamang DG, Saier MH: The cecropin superfamily of toxic peptides. J Mol Microbiol Biotechnol. 2006, 11 (1-2): 94-103. 10.1159/000092821.
Bulet P, Stocklin R: Insect antimicrobial peptides: structures, properties and gene regulation. Protein Pept Lett. 2005, 12 (1): 3-11. 10.2174/0929866053406011.
Steiner H, Hultmark D, Engstrom A, Bennich H, Boman HG: Sequence and specificity of two antibacterial proteins involved in insect immunity. Nature 292: 246-248. 1981. J Immunol. 2009, 182 (11): 6635-6637.
Pillai A, Ueno S, Zhang H, Lee JM, Kato Y: Cecropin P1 and novel nematode cecropins: a bacteria-inducible antimicrobial peptide family in the nematode Ascaris suum. Biochem J. 2005, 390 (Pt 1): 207-214.
O'Rourke D, Baban D, Demidova M, Mott R, Hodgkin J: Genomic clusters, putative pathogen recognition molecules, and antimicrobial genes are induced by infection of C. elegans with M. nematophilum. Genome Res. 2006, 16 (8): 1005-1016. 10.1101/gr.50823006.
Schulenburg H, Hoeppner MP, Weiner J, Bornberg-Bauer E: Specificity of the innate immune system and diversity of C-type lectin domain (CTLD) proteins in the nematode Caenorhabditis elegans. Immunobiology. 2008, 213 (3-4): 237-250. 10.1016/j.imbio.2007.12.004.
Drickamer K: Two distinct classes of carbohydrate-recognition domains in animal lectins. J Biol Chem. 1988, 263 (20): 9557-9560.
Drickamer K: Ca(2+)-dependent sugar recognition by animal lectins. Biochem Soc Trans. 1996, 24 (1): 146-150.
Garcia-Vallejo JJ, van Kooyk Y: Endogenous ligands for C-type lectin receptors: the true regulators of immune homeostasis. Immunol Rev. 2009, 230 (1): 22-37. 10.1111/j.1600-065X.2009.00786.x.
Cash HL, Whitham CV, Behrendt CL, Hooper LV: Symbiotic bacteria direct expression of an intestinal bactericidal lectin. Science. 2006, 313 (5790): 1126-1130. 10.1126/science.1127119.
Yoshida A, Nagayasu E, Horii Y, Maruyama H: A novel C-type lectin identified by EST analysis in tissue migratory larvae of Ascaris suum. Parasitol Res. 2012
Liu Z, Klaavuniemi T, Ono S: Distinct roles of four gelsolin-like domains of Caenorhabditis elegans gelsolin-like protein-1 in actin filament severing, barbed end capping, and phosphoinositide binding. Biochemistry. 2010, 49 (20): 4349-4360. 10.1021/bi100215b.
Klaavuniemi T, Yamashiro S, Ono S: Caenorhabditis elegans gelsolin-like protein 1 is a novel actin filament-severing protein with four gelsolin-like repeats. J Biol Chem. 2008, 283 (38): 26071-26080. 10.1074/jbc.M803618200.
We are grateful to Macquarie University for the award of postgraduate research scholarships. Funding to pay the Open Access publication charges for this article was provided by Macquarie University.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 17, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S17.
The authors declare that they have no competing interests.
RM carried out the analysis, computational studies and drafted the manuscript. RM, GG, SR and RBG participated in the design of the study and interpretation of data. SR and RBG conceived the project and finalized the manuscript. All authors have read and approved the final manuscript.
Electronic supplementary material
Additional file 1: GO annotation for putative peptides. Gene Ontology annotations from Interproscan reported. (XLSX 29 KB)
Additional file 2: KEGG Pathway analysis of proteins (E-value threshold of 1E-05). Database matches reported. (XLSX 11 KB)
Additional file 3: Domain description for the protein sequences. Interproscan domains reported. (XLSX 27 KB)
Additional file 4: Top BLAST hits for secreted proteins. Non-redundant database matches reported. (XLSX 13 KB)
Menon, R., Garg, G., Gasser, R.B. et al. TranSeqAnnotator: large-scale analysis of transcriptomic data. BMC Bioinformatics 13, S24 (2012). https://doi.org/10.1186/1471-2105-13-S17-S24