ProFAT: a web-based tool for the functional annotation of protein sequences
© Bradshaw et al; licensee BioMed Central Ltd. 2006
Received: 10 May 2006
Accepted: 23 October 2006
Published: 23 October 2006
The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened if the query sequence and the identified database sequences share a similar biological background. However, few tools exist so far that include functional information in sequence database searches.
We present ProFAT, a web-based tool for the functional annotation of protein sequences based on remote sequence similarity. ProFAT combines sensitive sequence database search methods and a fold recognition algorithm with a simple text-mining approach. ProFAT selects identified hits by their biological background through keyword-mining of annotations, features and, most importantly, the literature associated with a sequence entry. A user-provided keyword list enables the user to search specifically for weak but biologically relevant homologues of an input query. The ProFAT server was evaluated using the complete set of proteins from three different domain families, including their weak relatives, and correctly identified between 90% and 100% of all domain family members studied in this context. ProFAT has furthermore been applied to a variety of proteins from different cellular contexts, and we show how ProFAT can help in the functional prediction of proteins based on remotely conserved proteins.
By employing sensitive database search programs and by exploiting the functional information associated with database sequences, ProFAT can detect remote but biologically relevant relationships between proteins and can assist researchers in predicting protein function from remote homologies.
Functional prediction of experimentally uncharacterized proteins is an important research area in bioinformatics. On the one hand, the functional prediction of a protein can advance biological science by generating testable hypotheses for experimental research. On the other hand, it improves the annotation of sequenced genomes by assigning functional information to predicted genes. Functional prediction relies mostly on the similarity between sequences, and standard sequence similarity search tools have been applied successfully in protein functional annotation, provided that the similarity between related proteins is significant enough for sequence-based detection. If, however, the similarity between related protein sequences is low, profile-based database search methods like PSI-BLAST or HMMer, as well as fold recognition tools, have proven successful in detecting remote homologies and can therefore assist in predicting the function of uncharacterized proteins [1, 2].
Experimentally characterized proteins have extensive functional information associated with their sequence records. This functional information includes published literature about a protein or gene, functional classifications such as those provided by the Gene Ontology (GO-) annotations, conserved domains that potentially link a protein to a molecular function, and sometimes even a short summary of the protein's function. Given detectable sequence similarity between a functionally characterized and an uncharacterized protein, this information can be used to predict the putative function of the unknown protein. Given the complexity of the output of sequence- as well as structure-based search techniques, however, exploiting this functional knowledge is often tedious and involves extensive manual mining of the biological context of identified database sequences.
With the advancement of experimental techniques in molecular biology, for example biochemical screens (such as co-immunoprecipitation experiments) and functional screens, previously uncharacterized proteins can often be placed in the context of a cellular process, while their molecular function may not be apparent from standard sequence similarity search tools. The combination of sensitive similarity search techniques with functional annotation of identified hits has already been applied successfully to the functional prediction of proteins, where the biological context of the protein of interest could be assigned based on existing experimental information about remote homologues [3, 4]. Similarity search tools, however, are generally restricted to the sequence or structural information of proteins and tend to neglect the functional information associated with the sequences under analysis, which could aid bioinformatics analysis. By text-mining the functional annotation associated with a sequence record, this information can be combined with traditional database search algorithms to filter identified hits by their biological relevance. However, few efforts so far incorporate functional information into similarity search methods. The program SAWTED (Structure Assignment With Text Description) uses text descriptions from the SWISS-PROT database to circumvent the problem of post-filtering PSI-BLAST results. OntoBLAST, as another example, takes advantage of the Gene Ontology based annotation of protein sequences to divide BLAST outputs according to Gene Ontology (GO-) terminology.
With ProFAT, we introduce a tool that combines a remote sequence similarity search tool (PSI-BLAST) with fold recognition (Threader 3.5) and a text-mining algorithm to select the hits identified by both programs based on their biological function. In addition to Gene Ontology and GenBank feature annotations provided by the NCBI, ProFAT mines the literature associated with a sequence database entry to post-filter identified hits by their biological context, and is therefore the first tool that takes advantage of the wealth of published information associated with sequence database entries. By providing a user-specific keyword list, the user can furthermore selectively extract hits identified by PSI-BLAST and Threader 3.5 for a specific biological context, which makes ProFAT customizable to any biological setting. In addition to post-filtering PSI-BLAST and Threader results for a user-specified biological context, ProFAT also provides a fully annotated output containing all identified hits, which eases the often time-consuming manual mining of literature and sequence record annotations.
Domain searches are carried out using RPS-BLAST against the CDD database (NCBI). BLAST and PSI-BLAST searches against the non-redundant protein database (NCBI) are performed using the stand-alone tools provided by the NCBI (version 2.4.17). Hidden Markov Model based domain searches (HMMer, version 2.3.2) are performed against the PFAM conserved domain database. Fold recognition is done using Threader 3.5. Secondary structure prediction prior to threading is performed using the program PSIPRED, coiled-coil regions are detected using the program COILS2, and low-complexity regions are filtered using the program SEG; all three programs are executed on the entire sequence, and their output is subsequently processed according to conserved domain and region boundaries. Text mining uses a Perl implementation of Porter's stemming algorithm. For the Gene Ontology (GO-) annotation of hits, a GO tree is constructed by aligning the GO terms of identified hits to the ontology tree provided by the GO Consortium.
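The keyword-matching step can be pictured as comparing stemmed keywords with stemmed annotation or abstract text. The following Python sketch is illustrative only and is not ProFAT's Perl code: it swaps in a deliberately crude suffix stripper for Porter's algorithm, and the names `crude_stem`, `tokens` and `keyword_hits` are hypothetical.

```python
import re

def crude_stem(word: str) -> str:
    """Very rough suffix stripping, loosely modelled on the first step of
    Porter's algorithm. It is NOT a Porter stemmer: the measure conditions
    and all later rule steps are omitted."""
    w = word.lower()
    # Plurals: "processes" -> "process", "microtubules" -> "microtubule".
    if w.endswith("sses") or w.endswith("ies"):
        w = w[:-2]
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # Verbal endings: "binding" -> "bind", keeping a stem of >= 3 letters.
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    return w

def tokens(text: str) -> list[str]:
    """Split free text (annotation, feature or abstract) into lowercase words.
    Hyphens split words, so "actin-binding" becomes "actin", "binding"."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_hits(keywords: list[str], text: str) -> set[str]:
    """Return the keywords whose stemmed words occur contiguously in `text`."""
    stems = [crude_stem(t) for t in tokens(text)]
    found = set()
    for kw in keywords:
        kw_stems = [crude_stem(t) for t in tokens(kw)]
        n = len(kw_stems)
        if n and any(stems[i : i + n] == kw_stems for i in range(len(stems) - n + 1)):
            found.add(kw)
    return found
```

For example, `keyword_hits(["actin binding", "microtubule", "dynein"], "An actin-binding protein that binds microtubules.")` matches the first two keywords. Treating a multi-word keyword as a contiguous run of stems is a design assumption; ProFAT may handle phrases differently.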
Bulk searches of the Annotation Engine for the HNF-1α, PABP and PLAT domain families were run without stemming. Only those family members that identified at least 10 hits containing a keyword from the respective keyword list were scored as positive. HMMerThread bulk searches were run with a threading hit depth of 20 and threading extensions of 0, 3, 5, 8 and 10 for HNF-1α, 5 and 8 for PABP, and 0 for PLAT family members.
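The scoring rule for the Annotation Engine bulk searches can be written down directly. This is a minimal sketch, assuming that "at least 10 hits with a keyword" means at least 10 distinct hits whose mined text matched any keyword; the `Hit` record and the function names are hypothetical.

```python
from dataclasses import dataclass

MIN_KEYWORD_HITS = 10  # threshold used for the bulk searches

@dataclass(frozen=True)
class Hit:
    accession: str
    # Keywords found in this hit's annotations, features or literature.
    matched_keywords: frozenset[str]

def is_positive(hits: list[Hit]) -> bool:
    """A family member is positive if >= 10 of its (distinct) hits matched a keyword."""
    return sum(1 for h in hits if h.matched_keywords) >= MIN_KEYWORD_HITS

def recovery_rate(results: dict[str, list[Hit]]) -> float:
    """Percentage of family members scored positive.

    `results` maps each family member's identifier to the hits it retrieved.
    """
    if not results:
        return 0.0
    positives = sum(1 for hits in results.values() if is_positive(hits))
    return 100.0 * positives / len(results)
```

A family-level figure such as the 90% to 100% reported for the three test families would correspond to `recovery_rate` over all members of one family.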
Multiple sequence alignments were generated with ClustalX and manually refined. Structural comparisons were done using the DALI server. To test ProFAT, we verified all examples of weak domain hits given in this manuscript by independent PSI-BLAST searches.
Results and discussion
The ProFAT server
The input of a ProFAT analysis is a protein sequence and a keyword list that describes the cellular process or putative function relevant for the protein under analysis.
The workflow of ProFAT can be divided into three parts (Figure 1): 1) a domain search or domain prediction, in which identified conserved domains are used to split the input query for further processing with 2) the Annotation Engine and 3) the Threading Engine.
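Splitting a query at domain boundaries amounts to cutting out one sub-sequence per domain. The sketch below assumes 1-based, inclusive boundaries and reads the "threading extension" mentioned in the Methods as padding each domain by a fixed number of residues on both sides; that interpretation, the `Domain` type and `split_query` are assumptions, not ProFAT's documented behavior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # first residue, 1-based and inclusive
    end: int    # last residue, 1-based and inclusive

def split_query(sequence: str, domains: list[Domain], extension: int = 0) -> dict[str, str]:
    """Return one sub-sequence per domain, keyed by name and final boundaries.

    Each domain is padded by `extension` residues on both sides (an assumed
    reading of "threading extension"), clamped to the ends of the sequence.
    """
    pieces = {}
    for d in domains:
        start = max(1, d.start - extension)
        end = min(len(sequence), d.end + extension)
        # Convert 1-based inclusive boundaries to a Python slice.
        pieces[f"{d.name}_{start}-{end}"] = sequence[start - 1 : end]
    return pieces
```

A CH domain at residues 1–105, as reported for KPL2 in the Results, would be passed as `Domain("CH", 1, 105)`.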
Domain search and prediction
At this stage, selected domains can be submitted in parallel to the Annotation Engine, the Threading Engine and an HMMerThread run for keyword-independent domain prediction.
Using Dip13α/APPL1 as a query, ProFAT identified sequence similarity between the first 280 amino acids and BAR domain-containing proteins. The presence of a BAR domain in the N-terminus of Dip13α/APPL1 has been reported previously [3, 18].
Gene Ontology tree mapping
One limitation of ProFAT is that results may be misleading if the keyword list does not correspond to the actual biological background of the query protein. To address this, ProFAT maps the GO annotations of identified hits onto the GO tree and shows the number of hits in each branch next to the corresponding biological processes, molecular functions and cellular components. If the keyword list yields no significant hit, the user can repeat the ProFAT search with keywords chosen from the biological function most relevant to the query, as indicated by the associated GO terms.
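Counting hits per GO branch means crediting each hit to its annotated terms and to all of their ancestors, while counting a hit only once per node even if several of its terms share that ancestor. A minimal sketch follows, using a toy, simplified ontology fragment (the real GO graph is much larger and its relationships differ); `ancestors`, `hits_per_branch` and `TOY_PARENTS` are hypothetical names.

```python
from collections import defaultdict

# Toy, simplified fragment: child term -> parent terms. Not the real GO graph.
TOY_PARENTS = {
    "actin binding": ["cytoskeletal protein binding"],
    "microtubule binding": ["cytoskeletal protein binding"],
    "cytoskeletal protein binding": ["protein binding"],
    "protein binding": ["binding"],
    "binding": ["molecular_function"],
}

def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return `term` together with every term reachable via parent links."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def hits_per_branch(hit_terms: dict[str, list[str]],
                    parents: dict[str, list[str]]) -> dict[str, int]:
    """Count, for each GO node, the distinct hits annotated at or below it."""
    counts: dict[str, int] = defaultdict(int)
    for terms in hit_terms.values():
        # Union first, so a hit with two terms under one parent counts once.
        covered: set[str] = set()
        for term in terms:
            covered |= ancestors(term, parents)
        for node in covered:
            counts[node] += 1
    return dict(counts)
```

With three hits annotated as actin binding, microtubule binding, and both, the shared parent "cytoskeletal protein binding" receives a count of three rather than four.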
Design of keyword lists
Text-mining for the selection of biologically relevant hits in ProFAT is performed using keywords from a user-provided list. The results of a ProFAT search therefore depend directly on the keywords the user provides. While the stemming algorithm used here handles differing word suffixes, users should still follow a few rules to obtain optimal results: 1) the keyword list should describe the process of interest as fully as possible. A CH domain, for example, has been annotated for actin-binding proteins, but is also found in microtubule-associated or cytoskeleton-interacting proteins. Assuming that a query protein has been implicated in actin binding, interesting results could be missed if only the keywords 'actin binding' were present in the keyword list. This is mainly because, first, the actin-binding domain could show remote similarity to a domain that was initially annotated as a microtubule-interacting domain and, second, annotations, whether manual or automatic, can be inaccurate; 2) the user should avoid common words that are found in almost any GenBank record, like 'RNA' or 'protein', as well as names of organisms. Other common words found in protein names, for instance 'alpha', 'beta' or 'delta', should also be avoided; 3) if the user is uncertain about the exact wording of keywords that describe a certain process, we recommend using the standard terms found in functional annotation databases such as Gene Ontology or the PANTHER database; 4) if the user already has an idea of the identity of a weakly conserved domain in the query protein, the name of the domain should be included in the keyword list, as the Annotation and Threading Engines will then also specifically show those hits that contain similarity to this conserved domain.
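Rule 2 lends itself to a simple automatic check before a search is submitted. The sketch below is hypothetical (ProFAT is not described as performing such a check), and its word sets are illustrative and incomplete. It flags a keyword only when every word in it is a common word or an organism name, so that a specific phrase such as 'RNA binding' is kept while a bare 'RNA' is flagged.

```python
# Illustrative, incomplete word sets based on rule 2 of the text.
COMMON_WORDS = {"rna", "protein", "alpha", "beta", "delta"}
ORGANISM_NAMES = {"human", "mouse", "yeast", "drosophila", "chicken"}  # hypothetical examples
AVOID = COMMON_WORDS | ORGANISM_NAMES

def check_keywords(keywords: list[str]) -> tuple[list[str], list[str]]:
    """Split a keyword list into (kept, flagged), ignoring case.

    A keyword is flagged only if all of its words are in AVOID, so a
    specific phrase such as "RNA binding" survives while "RNA" does not.
    Empty keywords are flagged as well.
    """
    kept, flagged = [], []
    for kw in keywords:
        words = {w.lower() for w in kw.split()}
        (flagged if words <= AVOID else kept).append(kw)
    return kept, flagged
```

Flagging rather than silently dropping keeps the final decision with the user, in line with rule 1's advice to describe the process of interest fully.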
Validation of the ProFAT server
Identification of novel and weak domain hits using ProFAT
Identification of a CH domain in Hook proteins and the microtubule-associated protein KPL2
KPL2 is an essential component of the central pair complex in ciliated cells. The rat orthologue was characterized as a gene that is specifically expressed in ciliated cells. The orthologue in Sus scrofa was recently linked to an autosomal recessive disease in pigs that leads to immotile short-tail sperm. The Chlamydomonas reinhardtii orthologue of KPL2, Cpc1, was identified as a component of the central pair complex, a large protein complex that regulates the activity of axonemal dynein. The central pair complex consists of two central microtubules that associate with a large number of additional factors, some of which link the two central microtubules. Central pair complex (CPC-) associated proteins also project from this structure and thus help to assemble a cylindrical cage of filaments surrounding the microtubules. At open positions in this cage, some CPC-associated proteins interact with external radial spokes and thereby transmit signals that regulate dynein activity for the coordinated movement of flagella. Mutations in Cpc1 disrupt the assembly of the central pair complex and alter flagellar beat frequency in Chlamydomonas. Biochemical analysis showed that a large portion of the central pair complex is missing when Cpc1 is deleted.
Rat KPL2 was predicted to have an N-terminal CH domain, through which it could interact with the cytoskeleton or the central microtubule pair. This domain, however, is undetectable by RPS-BLAST and receives only an insignificant E-value in SMART analysis. We were interested in whether ProFAT would detect a CH domain in human KPL2. The domain search of ProFAT detected a domain of unknown function, DUF1042, in the N-terminal part of the protein, which was selected for further processing [see Additional file 4A]. HMMer, on the other hand, detected a CH domain between amino acids 1–105, which was sent to Threader [see Additional file 4A]. Among other CH domain-containing sequences, the Annotation Engine of ProFAT identified the protein Mal3 from S. pombe and the microtubule-associated protein EB1 from Arabidopsis [see Additional file 4B]. Consistent with this, HMMerThread detected a CH domain with 83% confidence [see Additional file 4C]. The alignment of three KPL2 orthologues with representatives of the CH domain family reveals good conservation between KPL2 and CH domain family members (Figure 7A). These results suggest that DUF1042 is a member of the CH domain family.
Identification of a SAM domain in the C-terminus of EPS8 family members
Eps8 proteins are downstream targets of the Epidermal Growth Factor (EGF) pathway. Members of this protein family are implicated in EGF-mediated signal transduction, though their exact role is so far unknown. Eps8 has been shown to coordinate EGF-receptor signaling via regulation of small GTPases. A C-terminal effector region in Eps8, for instance, regulates activation of Rac, which leads to remodeling of the actin cytoskeleton. Eps8 family proteins are predicted to have a SAM domain in their C-terminus. Domain searches using RPS-BLAST and/or SMART fail to identify this domain, even at permissive E-values. We were interested in whether ProFAT could detect the SAM domain in these proteins. With EPS8L3 as the query, the domain search of ProFAT identified an EPS8/PTB domain in the N-terminus and an SH3 domain in the C-terminal part, but failed to recognize the SAM domain. HMMer, on the other hand, detected a SAM_1 domain with an E-value of 2 in the C-terminus of the protein, which was selected for further processing [see Additional file 5A]. ProFAT's Annotation Engine detected SAM domain-containing proteins, for instance a sequence from chicken and the kinase suppressor of Ras from Drosophila simulans [see Additional file 5B]. The HMMerThread pipeline predicted a SAM_1 domain in the C-terminus of EPS8L3 with a confidence of over 90% [see Additional file 5C]. The multiple sequence alignment of Eps8 and Eps8-like proteins 2 and 3 with representatives of the SAM domain family, as well as with the structural representative of the SAM_PNT domain (a subfamily of the SAM domain), shows a conserved pattern of hydrophobic, aromatic and charged amino acids (Figure 7B). These results suggest that the C-termini of Eps8 and Eps8-like proteins contain a SAM domain, as proposed previously.
Identification of an RRM domain in PARN proteins and an uncharacterized protein family
The poly(A)-specific ribonuclease PARN is a 3' exonuclease involved in the degradation of cellular mRNAs. Members of the PARN family contain a split CAF1 domain, which has ribonuclease catalytic activity. In the center of the CAF1 domain, RPS-BLAST predicts a PARN_R3H domain, which is predicted to bind single- or double-stranded RNA. RPS-BLAST also predicts a weakly conserved RRM domain C-terminal to the CAF1 domain with an E-value of 1.8. We were interested in whether ProFAT could detect the weakly conserved RRM domain in human PARN. The domain search of ProFAT correctly predicts the CAF1 and PARN_R3H domains, and the HMMer module of HMMerThread predicts an RRM_1 domain adjacent to the CAF1 domain [see Additional file 6A]. We selected the RRM_1 module for further processing with HMMerThread, and the C-terminal part of PARN for analysis with the Annotation and Threading Engines of ProFAT. The Annotation Engine identified, for example, the Bruno-like RNA binding protein 5 from chicken [see Additional file 6B]. HMMerThread identified RRM motifs from several crystallized proteins with a confidence of nearly 90% [see Additional file 6C]. The structure of the region containing the RRM domain of a PARN family member has been determined (Nagata T., et al., 2004, unpublished; [PDB:1WHV]). Using the DALI server, we searched for structures similar to 1WHV. The closest hit is the structure of the central RRM of human La protein ([PDB:1S79]), which is detected with a significant score of 7.5. We next performed a multiple sequence alignment of PARN family members with representatives of the RRM domain (Figure 7C) and observed a high level of conservation between these two domains. ProFAT was therefore able to detect the weakly conserved RRM domain in PARN family members.
The uncharacterized human protein LOC84060 has not been associated with any biological function. Domain searches using standard parameters did not reveal any conserved domains in this protein. However, when the E-value threshold of the RPS-BLAST search is increased, an RRM domain is found with an E-value of 4.6. Assuming that this protein is involved in RNA metabolism or regulation, we submitted the protein sequence of LOC84060 to the ProFAT server. The domain search pipeline of ProFAT did not find any conserved domain, while HMMer identified an RRM_1 domain in this protein [see Additional file 7A]. We selected the RRM_1 domain for processing with HMMerThread and submitted the protein sequence of LOC84060 to ProFAT's Annotation and Threading Engines. For more accurate results, we invoked the option of splitting the input sequence into segments of 150 amino acids. Among other hits, ProFAT's Annotation Engine identified the RRM domain of the human poly(A)-binding protein PABPC [see Additional file 7B]. HMMerThread found the RRM_1 domain of the splicing factor U2AF to be significantly similar [see Additional file 7C]. Next, we aligned LOC84060 with representatives of the RRM domain (Figure 7C). The multiple sequence alignment reveals that LOC84060 shares all but two of the residues conserved in this domain family. Based on these data, we suggest that LOC84060 is an RRM domain-containing, RNA-binding protein.
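Splitting the input into 150-residue segments can be sketched as a simple windowing function. The text does not say whether ProFAT's segments overlap, so the sketch makes the overlap an explicit parameter defaulting to zero; `split_into_segments` is a hypothetical name, not part of ProFAT.

```python
def split_into_segments(sequence: str, size: int = 150,
                        overlap: int = 0) -> list[tuple[int, str]]:
    """Cut `sequence` into consecutive segments of at most `size` residues.

    Returns (1-based start, segment) pairs. Neighboring segments share
    `overlap` residues; the last segment may be shorter than `size`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    segments = []
    for start in range(0, len(sequence), step):
        segments.append((start + 1, sequence[start : start + size]))
        if start + size >= len(sequence):
            break  # this segment already reaches the C-terminus
    return segments
```

For a 320-residue query, the default settings give segments starting at residues 1, 151 and 301, the last one only 20 residues long; a non-zero overlap would avoid cutting a domain cleanly at a segment boundary, at the cost of more searches.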
Identification of an acetyltransferase domain in the unknown human protein LOC79969
No functional information is so far available for the uncharacterized human protein LOC79969. Domain searches using RPS-BLAST or SMART predict a domain of unknown function, DUF738. As there was no indication of the biological context this protein might be associated with, we performed only an HMMerThread search with the protein sequence of LOC79969. HMMer detected a weakly conserved acetyltransferase domain within the DUF738 region [see Additional file 8A]. We selected the predicted Acetyltransf_1 domain for further processing with the threading pipeline of HMMerThread, which identified the three-dimensional structures of several acetyltransferases with a confidence of nearly 90% [see Additional file 8B]. We next aligned members of the LOC79969 family with representatives of the Acetyltransf_1 domain family (Figure 7D). LOC79969 appears most closely related to the GNAT subfamily of acetyltransferases. Interestingly, the proposed catalytic tyrosine residue at the C-terminus of the acetyltransferase domain (reviewed in ) is replaced by a leucine in human and fly LOC79969 and by a methionine in C. elegans. A conserved tyrosine is, however, located four residues C-terminal to the proposed catalytic site. Although our data suggest that LOC79969 adopts a GNAT-like fold, whether its Acetyltransf_1 domain is catalytically active remains to be tested experimentally.
Applications of ProFAT
ProFAT is useful in several applications: 1) the ProFAT server should be used when standard similarity search programs fail to predict the function of a so far uncharacterized protein that can be associated with a certain cellular process or molecular function. In this case, ProFAT serves as an aid for post-filtering complex threading and PSI-BLAST outputs; 2) the user might be interested in whether a conserved domain shows remote sequence similarity or is structurally related to proteins from a specific cellular process or molecular function, and can therefore use ProFAT to search specifically for weakly related sequences or structures found in the biological context of interest; 3) the domain prediction pipeline is applicable to regions of proteins with no obvious conserved domain. In this case, combining RPS-BLAST and a subsequent BLAST search of weak domain hits with a text-mining step can strengthen evidence from subtle sequence similarity with additional biologically relevant evidence; 4) finally, HMMerThread is a powerful pipeline for the accurate prediction of weakly conserved domains, as it looks for remote sequence similarity to conserved domain hits and combines this with a subsequent threading step. HMMerThread in addition has the advantage of not relying on the user-provided keyword list and can be applied to proteins that cannot be associated with any biological function. This module can therefore be used to predict weakly conserved domains with high accuracy.
ProFAT is a powerful tool for uncovering remote but biologically relevant relationships between sequences. While powerful tools to discover subtle sequence similarity are already available, for example profile-based database search methods and fold recognition techniques, only a few methods so far combine these search tools with a literature-mining step. In particular, the text-mining of associated literature abstracts makes ProFAT unique in post-filtering database sequences based on biological features found in the associated primary literature. While tools such as OntoBLAST and SAWTED use secondary annotation of sequences to post-filter database search results, ProFAT goes back to the primary published information on sequence entries, which helps to circumvent the sometimes error-prone functional information found in sequence annotations. The strength of ProFAT furthermore lies in the combination of sequence- and structure-based search tools that are able to detect weak sequence relationships reliably. Finally, ProFAT is highly flexible and allows users to tailor a database search to their own biological interests.
Availability and requirements
Program name: ProFAT
Project home page: http://cluster-1.mpi-cbg.de/profat/profat.html
Operating Systems: platform independent
Programming language: Perl
Other requirements: Web browser, valid e-mail address
License: GNU General Public License
Any restrictions to use by non-academics: Commercial users are not able to use HMMerThread or the Threading Engine due to license restrictions for Threader 3.5
Charles Bradshaw is supported by BMBF grant 0313082F. Vineeth Surendranath was supported by BMBF grant PTJ-BIO/0313130 (held by Andrej Shevchenko). We thank Andreas Henschel for MOSIX-related advice, and Sabine Bernauer and Michael Volkmer for critical reading of the manuscript. This work was supported by the Max Planck Society.
- Ivanov D, Schleiffer A, Eisenhaber F, Mechtler K, Haering CH, Nasmyth K: Eco1 is a novel acetyltransferase that can acetylate proteins involved in cohesion. Curr Biol 2002, 12(4):323–328. 10.1016/S0960-9822(02)00681-4
- Rea S, Eisenhaber F, O'Carroll D, Strahl BD, Sun ZW, Schmid M, Opravil S, Mechtler K, Ponting CP, Allis CD, Jenuwein T: Regulation of chromatin structure by site-specific histone H3 methyltransferases. Nature 2000, 406(6796):593–599. 10.1038/35020506
- Miaczynska M, Christoforidis S, Giner A, Shevchenko A, Uttenweiler-Joseph S, Habermann B, Wilm M, Parton RG, Zerial M: APPL proteins link Rab5 to nuclear signal transduction via an endosomal compartment. Cell 2004, 116(3):445–456. 10.1016/S0092-8674(04)00117-5
- Uhlmann F, Wernic D, Poupart MA, Koonin EV, Nasmyth K: Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell 2000, 103(3):375–386. 10.1016/S0092-8674(00)00130-6
- MacCallum RM, Kelley LA, Sternberg MJ: SAWTED: structure assignment with text description--enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics 2000, 16(2):125–129. 10.1093/bioinformatics/16.2.125
- Zehetner G: OntoBlast function: From sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Res 2003, 31(13):3799–3803. 10.1093/nar/gkg555
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Jones DT, Taylor WR, Thornton JM: A new approach to protein fold recognition. Nature 1992, 358(6381):86–89. 10.1038/358086a0
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004, 32(Web Server issue):W327–31.
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14(9):755–763. 10.1093/bioinformatics/14.9.755
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16(4):404–405. 10.1093/bioinformatics/16.4.404
- Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science 1991, 252(5009):1162–1164. 10.1126/science.252.5009.1162
- Wootton JC, Federhen S: Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 1996, 266:554–571.
- Porter M: An algorithm for suffix stripping. Program 1980, 14(3):130–137.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31(13):3497–3500. 10.1093/nar/gkg500
- Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233(1):123–138. 10.1006/jmbi.1993.1489
- Habermann B: The BAR-domain family of proteins: a case of bending and binding? EMBO Rep 2004, 5(3):250–255. 10.1038/sj.embor.7400105
- Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD: The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res 2005, 33(Database issue):D284–8. 10.1093/nar/gki078
- Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J: The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res 2004, 32(Database issue):D235–9. 10.1093/nar/gkh117
- Jones DT, Tress M, Bryson K, Hadley C: Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins 1999, Suppl 3: 104–111. 10.1002/(SICI)1097-0134(1999)37:3+<104::AID-PROT14>3.0.CO;2-P
- Barnard DC, Cao Q, Richter JD: Differential phosphorylation controls Maskin association with eukaryotic translation initiation factor 4E and localization on the mitotic apparatus. Mol Cell Biol 2005, 25(17):7605–7615. 10.1128/MCB.25.17.7605-7615.2005
- Paynton BV: RNA-binding proteins in mouse oocytes and embryos: expression of genes encoding Y box, DEAD box RNA helicase, and polyA binding proteins. Dev Genet 1998, 23(4):285–298. 10.1002/(SICI)1520-6408(1998)23:4<285::AID-DVG4>3.0.CO;2-W
- Wakiyama M, Imataka H, Sonenberg N: Interaction of eIF4G with poly(A)-binding protein stimulates translation and is critical for Xenopus oocyte maturation. Curr Biol 2000, 10(18):1147–1150. 10.1016/S0960-9822(00)00701-6
- Wormington M, Searfoss AM, Hurney CA: Overexpression of poly(A) binding protein prevents maturation-specific deadenylation and translational inactivation in Xenopus oocytes. Embo J 1996, 15(4):900–909.
- Walenta JH, Didier AJ, Liu X, Kramer H: The Golgi-associated hook3 protein is a member of a novel family of microtubule-binding proteins. J Cell Biol 2001, 152(5):923–934. 10.1083/jcb.152.5.923
- Ostrowski LE, Andrews K, Potdar P, Matsuura H, Jetten A, Nettesheim P: Cloning and characterization of KPL2, a novel gene induced during ciliogenesis of tracheal epithelial cells. Am J Respir Cell Mol Biol 1999, 20(4):675–683.
- Sironen A, Thomsen B, Andersson M, Ahola V, Vilkki J: An intronic insertion in KPL2 results in aberrant splicing and causes the immotile short-tail sperm defect in the pig. Proc Natl Acad Sci U S A 2006, 103(13):5006–5011. 10.1073/pnas.0506318103
- Zhang H, Mitchell DR: Cpc1, a Chlamydomonas central pair protein with an adenylate kinase domain. J Cell Sci 2004, 117(Pt 18):4179–4188. 10.1242/jcs.01297
- Adams GM, Huang B, Piperno G, Luck DJ: Central-pair microtubular complex of Chlamydomonas flagella: polypeptide composition as revealed by analysis of mutants. J Cell Biol 1981, 91(1):69–76. 10.1083/jcb.91.1.69
- Di Fiore PP, Scita G: Eps8 in the midst of GTPases. Int J Biochem Cell Biol 2002, 34(10):1178–1183. 10.1016/S1357-2725(02)00064-X
- Korner CG, Wahle E: Poly(A) tail shortening by a mammalian poly(A)-specific 3'-exoribonuclease. J Biol Chem 1997, 272(16):10448–10456. 10.1074/jbc.272.16.10448
- Alfano C, Sanfelice D, Babon J, Kelly G, Jacks A, Curry S, Conte MR: Structural analysis of cooperative RNA binding by the La motif and central RRM domain of human La protein. Nat Struct Mol Biol 2004, 11(4):323–329. 10.1038/nsmb747
- Dyda F, Klein DC, Hickman AB: GCN5-related N-acetyltransferases: a structural overview. Annu Rev Biophys Biomol Struct 2000, 29: 81–103. 10.1146/annurev.biophys.29.1.81
- Bradshaw CR, Surendranath V, Habermann B: ProFAT online manual. 2006. [http://cluster-1.mpi-cbg.de/profat/BradshawSupplement]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.