uPEPperoni: An online tool for upstream open reading frame location and analysis of transcript conservation
© Skarshewski et al.; licensee BioMed Central Ltd. 2014
Received: 7 August 2013
Accepted: 11 January 2014
Published: 1 February 2014
Several small open reading frames located within the 5′ untranslated regions of mRNAs have recently been shown to be translated. In humans, about 50% of mRNAs contain at least one upstream open reading frame, representing a large resource of coding potential. We propose that some upstream open reading frames encode peptides that are functional and contribute to proteome complexity in humans and other organisms. We use the term uPEPs to describe peptides encoded by upstream open reading frames.
We have developed an online tool, termed uPEPperoni, to facilitate the identification of putative bioactive peptides. uPEPperoni detects conserved upstream open reading frames in eukaryotic transcripts by comparing query nucleotide sequences against mRNA sequences within the NCBI RefSeq database. The algorithm first locates the main coding sequence and then searches for open reading frames 5′ to the main start codon which are subsequently analysed for conservation. uPEPperoni also determines the substitution frequency for both the upstream open reading frames and the main coding sequence. In addition, the uPEPperoni tool produces sequence identity heatmaps which allow rapid visual inspection of conserved regions in paired mRNAs.
uPEPperoni features user-nominated settings, including nucleotide match/mismatch scores, gap penalties, Ka/Ks ratios and output mode. The heatmap output shows levels of identity between any two sequences and provides easy recognition of conserved regions. Furthermore, this web tool allows comparison of the evolutionary pressures acting on the upstream open reading frame with those acting on other regions of the mRNA. The heatmap web applet can also be used to visualise the degree of conservation in any pair of sequences. uPEPperoni is freely available on an interactive web server at http://upep-scmb.biosci.uq.edu.au.
Keywords: 5′UTR, uORFs, mRNAs, Sequence conservation, Short peptides, Poly-cistronic, Homology heatmaps
The discovery of mutations in upstream Open Reading Frames (uORFs) associated with disease has brought renewed interest in uORFs and the peptides they encode. Bioinformatic analyses of cDNA and EST databases have estimated that up to 50% of all eukaryotic mRNAs contain upstream AUGs (uAUGs) or uORFs within the 5′ untranslated region (5′UTR) [2–8]. Recent ribosome profiling studies have indicated that many of these uAUGs are recognised by scanning ribosomes, suggesting that their associated uORFs are translated [9–11]. To date, 29 peptides encoded by uORFs have been identified in proteomic studies [12–14], although there is currently no information on their functions. We have previously proposed that part of the eukaryotic proteome is composed of peptides resulting from the translation of uORFs.
The canonical role for uAUGs/uORFs is the regulation of protein expression by modulating translation of the main open reading frame (mORF), which is usually the longest coding sequence (CDS) present on an mRNA. In most cases uAUGs/uORFs lower translation of the mORF by reducing the number of ribosomes reaching and initiating at the main AUG start codon [1, 15–18]. While there are many reports of uORFs reducing translation of the CDS [1, 16, 18], only a few studies have investigated the potential of uORFs to generate bioactive peptides [2, 12, 19, 20]. We use the term uPEPs to describe their origin as uORF-derived peptides.
Searches for cross-species conservation of uORFs can reveal those that encode potentially functionally important peptides [2, 12, 19, 20]. High levels of sequence identity between uORF homologues (when compared to the mRNA as a whole) are an indication that the encoded uPEP has been maintained during evolution. Furthermore, protein coding regions generally have more synonymous than non-synonymous substitutions, an observation that can be used to predict potential protein coding regions. The algorithms presented here screen uORFs for these characteristics in order to identify those encoding potential uPEPs. The uPEPperoni program also includes an algorithm that produces sequence identity heatmaps which allow rapid visual inspection of conserved regions in paired mRNAs.
Results and discussion
To identify conserved uPEPs, a query sequence is aligned against reference uORFs using the tblastx subprogram of NCBI’s blastall standalone executable. The tblastx subprogram is used in preference to nucleotide-based blast programs because of its greater sensitivity and in order to favour the selection of uPEPs conserved at the amino acid level, rather than uORFs conserved at the nucleotide level. Individual transcripts from the uORF database that are found to contain a putative uPEP homologue are paired with the query sequence, and the pair is passed to the heatmap generation utility. As an alternative to receiving input sequences from the conserved uPEP search utility, the heatmap generation utility can accept user-entered query/reference nucleotide sequences directly.
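The search described above presupposes that uORFs have already been located in each transcript. The following is a minimal Python sketch of that step, not the uPEPperoni implementation: it assumes the position of the main start codon is already known, considers only AUG start codons, and reports every ORF whose start codon lies wholly within the 5′UTR. The function name `find_uorfs` and its coordinate convention are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna, cds_start):
    """Return (start, end) pairs for ORFs whose ATG lies 5' of cds_start.

    Coordinates are 0-based and end-exclusive, and the end includes the
    stop codon. cds_start is the index of the A of the main ATG and is
    assumed to be known in advance (illustrative assumption).
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    # Only consider start codons that end at or before the main ATG.
    for start in range(0, cds_start - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream in frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


example = "GGATGAAATAGCCACCATGGCTTAA"  # main ATG starts at index 16
print(find_uorfs(example, 16))
```

In this sketch a uORF may extend past the main start codon, so ORFs that overlap the mORF are also reported; whether such cases should be kept or filtered is a design choice left open here.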
The mRNA sequences for each conserved query/reference uORF pair are aligned pairwise using the LAGAN toolkit, with match/mismatch scores and gap penalties specified by the user. The default parameters are a gap opening penalty of 50, no gap extension penalty, +5 for a nucleotide match and -4 for a mismatch. Given a query sequence (Q) of length q, and a reference sequence (R), the alignment produces three sequences of equal length (m). These are the aligned query (Q’) and aligned reference (R’) sequences, comprising the query and reference sequences with alignment gaps inserted, and a match sequence (M), derived by assigning 1 to the ith element if the ith elements of Q’ and R’ are a nucleotide match, and 0 otherwise.
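The derivation of the match sequence M from Q’ and R’ can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tool's own code: gaps are assumed to be written as `-`, and the `windowed_identity` helper is a hypothetical smoothing step added only to show how M could feed an identity heatmap.

```python
def match_sequence(aligned_query, aligned_ref, gap="-"):
    """Build M: 1 where Q' and R' hold the same nucleotide, else 0."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length m")
    return [
        1 if a == b and a != gap else 0
        for a, b in zip(aligned_query.upper(), aligned_ref.upper())
    ]


def windowed_identity(match, window):
    """Mean of M over each sliding window (hypothetical helper)."""
    if window < 1 or window > len(match):
        raise ValueError("window must be between 1 and len(match)")
    total = sum(match[:window])
    identities = [total / window]
    for i in range(window, len(match)):
        # Slide the window one position to the right.
        total += match[i] - match[i - window]
        identities.append(total / window)
    return identities


m = match_sequence("ATG-CA", "ATGGCT")
print(m)
print(windowed_identity(m, 3))
```

A position where either sequence has a gap is scored 0, which matches the definition of M as marking nucleotide matches only.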
In addition, uPEPperoni estimates the ratio of synonymous to non-synonymous substitution rates of the mORF and uORF using the method of Yang and Nielsen, implemented in a library compiled from modified source code of the yn00 program in the PAML package. As synonymous substitutions are favoured in protein coding sequences, the ratio provides additional confidence that a given uORF encodes a bioactive peptide. Furthermore, the synonymous to non-synonymous substitution ratio of the mORF provides an internal control to which the uORF ratio can be compared, allowing for an evaluation of selective pressures on both the uORF and mORF.
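To make the distinction between synonymous and non-synonymous changes concrete, the sketch below classifies codon differences between two aligned, gap-free coding sequences. It is deliberately much simpler than the Yang and Nielsen method and is not a substitute for yn00: it only counts codon pairs that differ at a single position and does not normalise by the number of synonymous and non-synonymous sites or correct for multiple hits. The function name `count_differences` is hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def count_differences(cds_a, cds_b):
    """Return (synonymous, non_synonymous) counts of single-base codon changes.

    Both inputs are assumed to be aligned, in frame and free of gaps.
    """
    syn = nonsyn = 0
    for i in range(0, min(len(cds_a), len(cds_b)) - 2, 3):
        ca, cb = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(ca, cb)) != 1:
            continue  # codons differing at 2 or 3 positions are skipped here
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


print(count_differences("ATGCTTAAAGGG", "ATGCTCAGAGGT"))
```

Applying the same count to the uORF and to the mORF of a transcript pair gives a rough feel for the comparison described above, although raw counts are not substitution rates.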
List of species with one or more conserved uPEPs using the uORFs identified in Crowe et al
Species containing one or more conserved uPEPs
Number of conserved uPEPs (a)
Human, mouse, rat, cow, chicken, frog, monkey, horse, chimpanzee, zebra fish, salmon
Human, mouse, rat, orangutan, chicken, frog, zebra fish, salmon
Human, mouse, rat, cow, monkey, chicken, rabbit, chimpanzee
Human, mouse, rat, pig, chicken, cat, horse
Human, mouse, rat, cow, orangutan, monkey
Human, mouse, rat, cow, orangutan, pig
Human, mouse, rat, cow, orangutan, frog
Human, mouse, rat, cow, chicken, frog
Human, mouse, rat, cow, orangutan
Human, mouse, rat, orangutan, chicken
Human, mouse, rat, zebra fish, frog
Human, mouse, rat, orangutan, pig
Human, mouse, rat, pig, monkey
Human, mouse, cow, pig, orangutan
Human, mouse, rat, orangutan
Human, mouse, rat, cow, monkey
Human, mouse, rat, cow, pig
Human, mouse, rat, cow, chicken
Human, mouse, rat, cow, frog
Human, mouse, rat, cow
Human, mouse, cow, orangutan
Human, mouse, cow, pig
Human, mouse, rat, pig
Human, mouse, rat, monkey
Human, mouse, cow, monkey
Human, mouse, orangutan, chimpanzee
Human, mouse, orangutan, hamster
Human, mouse, rat, horse
Human, mouse, rat, chicken
Human, mouse, rat
Human, mouse, cow
Human, mouse, orangutan
Human, mouse, pig
Human, mouse, monkey
We have developed a web tool that facilitates the identification of conserved uORFs. This tool alleviates the need to use several single-facet programs for the detection of uPEPs. uPEPperoni can be used to populate the databases employed in the identification of novel small peptides by mass spectrometry and to enhance the discovery of a novel source of regulatory molecules. Given the renewed interest in the role of uORFs in human disease and the possibility that peptides encoded by uORFs can have functionality beyond regulation of translation [2, 13, 26], uPEPperoni offers improved utility in their identification and will aid in their characterisation.
Availability and requirements
Project name: uPEPperoni: An online tool for upstream open reading frame location and analysis of transcript conservation.
Project home page: http://upep-scmb.biosci.uq.edu.au.
Operating system(s): Platform independent.
Other requirements: None.
License: Not applicable.
Any restrictions to use by non-academics: None.
Abbreviations
- uAUG: Upstream start codon
- uORF: Upstream open reading frame
- mORF: Main open reading frame
- 5′ UTR: Five prime untranslated region
- 3′ UTR: Three prime untranslated region
- CDS: Coding DNA Sequence (synonymous with mORF)
This work was supported by the National Health and Medical Research Council [grant number 631551]. Thomas Huber is an Australian Research Council Future Fellow.
1. Calvo SE, Pagliarini DJ, Mootha VK: Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci USA. 2009, 106 (18): 7507-7512. 10.1073/pnas.0810916106.
2. Crowe ML, Wang XQ, Rothnagel JA: Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics. 2006, 7: 16. 10.1186/1471-2164-7-16.
3. Iacono M, Mignone F, Pesole G: uAUG and uORFs in human and rodent 5′untranslated mRNAs. Gene. 2005, 349: 97-105.
4. Pesole G, Gissi C, Grillo G, Licciulli F, Liuni S, Saccone C: Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs. Gene. 2000, 261 (1): 85-91. 10.1016/S0378-1119(00)00471-6.
5. Rogozin IB, Kochetov AV, Kondrashov FA, Koonin EV, Milanesi L: Presence of ATG triplets in 5′ untranslated regions of eukaryotic cDNAs correlates with a ‘weak’ context of the start codon. Bioinformatics. 2001, 17 (10): 890-900. 10.1093/bioinformatics/17.10.890.
6. Suzuki Y, Ishihara D, Sasaki M, Nakagawa H, Hata H, Tsunoda T, Watanabe M, Komatsu T, Ota T, Isogai T, et al: Statistical analysis of the 5′ untranslated region of human mRNA using “Oligo-Capped” cDNA libraries. Genomics. 2000, 64 (3): 286-297. 10.1006/geno.2000.6076.
7. Yamashita R, Suzuki Y, Nakai K, Sugano S: Small open reading frames in 5′ untranslated regions of mRNAs. C R Biol. 2003, 326 (10–11): 987-991.
8. Chen CH, Liao BY, Chen FC: Exploring the selective constraint on the sizes of insertions and deletions in 5′ untranslated regions in mammals. BMC Evol Biol. 2011, 11: 192. 10.1186/1471-2148-11-192.
9. Ingolia NT, Lareau LF, Weissman JS: Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011, 147 (4): 789-802. 10.1016/j.cell.2011.10.002.
10. Lee S, Liu B, Lee S, Huang SX, Shen B, Qian SB: Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci USA. 2012, 109 (37): E2424-E2432. 10.1073/pnas.1207846109.
11. Fritsch C, Herrmann A, Nothnagel M, Szafranski K, Huse K, Schumann F, Schreiber S, Platzer M, Krawczak M, Hampe J, et al: Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting. Genome Res. 2012, 22 (11): 2208-2218. 10.1101/gr.139568.112.
12. Oyama M, Itagaki C, Hata H, Suzuki Y, Izumi T, Natsume T, Isobe T, Sugano S: Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res. 2004, 14 (10B): 2048-2052. 10.1101/gr.2384604.
13. Oyama M, Kozuka-Hata H, Suzuki Y, Semba K, Yamamoto T, Sugano S: Diversity of translation start sites may define increased complexity of the human short ORFeome. Mol Cell Proteomics. 2007, 6 (6): 1000-1006. 10.1074/mcp.M600297-MCP200.
14. Slavoff SA, Mitchell AJ, Schwaid AG, Cabili MN, Ma J, Levin JZ, Karger AD, Budnik BA, Rinn JL, Saghatelian A: Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat Chem Biol. 2013, 9 (1): 59-64.
15. Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV: Evolutionary conservation suggests a regulatory function of AUG triplets in 5′-UTRs of eukaryotic genes. Nucleic Acids Res. 2005, 33 (17): 5512-5520. 10.1093/nar/gki847.
16. Kozak M: Pushing the limits of the scanning mechanism for initiation of translation. Gene. 2002, 299 (1–2): 1-34.
17. Morris DR, Geballe AP: Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol. 2000, 20 (23): 8635-8642. 10.1128/MCB.20.23.8635-8642.2000.
18. Wang XQ, Rothnagel JA: 5′-untranslated regions with multiple upstream AUG codons can support low-level translation via leaky scanning and reinitiation. Nucleic Acids Res. 2004, 32 (4): 1382-1391. 10.1093/nar/gkh305.
19. Hayden CA, Bosco G: Comparative genomic analysis of novel conserved peptide upstream open reading frames in Drosophila melanogaster and other dipteran species. BMC Genomics. 2008, 9: 61. 10.1186/1471-2164-9-61.
20. Hayden CA, Jorgensen RA: Identification of novel conserved peptide uORF homology groups in Arabidopsis and rice reveals ancient eukaryotic origin of select groups and preferential association with transcription factor-encoding genes. BMC Biol. 2007, 5: 32. 10.1186/1741-7007-5-32.
21. Nekrutenko A, Makova KD, Li WH: The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. 2002, 12 (1): 198-202. 10.1101/gr.200901.
22. Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Program NCS, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 2003, 13 (4): 721-731. 10.1101/gr.926603.
23. Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17 (1): 32-43. 10.1093/oxfordjournals.molbev.a026236.
24. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.
25. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39 (Database issue): D38-D51.
26. Jorgensen RA, Dorantes-Acosta AE: Conserved peptide upstream open reading frames are associated with regulatory genes in angiosperms. Front Plant Sci. 2012, 3: 191.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.