- Research article
- Open Access
Computational analysis of the interaction between transcription factors and the predicted secreted proteome of the yeast Kluyveromyces lactis
© Brustolini et al; licensee BioMed Central Ltd. 2009
- Received: 18 November 2008
- Accepted: 25 June 2009
- Published: 25 June 2009
Protein secretion is a cell translocation process of major biological and technological significance. The secretion and downstream processing of proteins by recombinant cells is of great commercial interest. The yeast Kluyveromyces lactis is considered a promising host for heterologous protein production. Because yeasts naturally do not secrete as many proteins as filamentous fungi, they can produce secreted recombinant proteins with few contaminants in the medium. An ideal system for secreting a desired protein could be found among the native proteins secreted under certain physiological conditions. By applying algorithms to the completed K. lactis genome sequence, such a system could be selected. To this end, we predicted protein subcellular locations and correlated the resulting extracellular secretome with the transcription factors that modulate the cellular response to a particular environmental stimulus.
To explore the potential Kluyveromyces lactis extracellular secretome, four computational prediction algorithms were applied to 5076 predicted K. lactis proteins from the genome database. SignalP v3 identified 418 proteins with N-terminal signal peptides. Of these 418 proteins, the Phobius algorithm predicted that 176 have no transmembrane domains, and the big-PI Predictor identified 150 as having no glycosylphosphatidylinositol (GPI) modification sites. After proteins with predicted targeting to other subcellular compartments were excluded, WoLF PSORT predicted that the K. lactis secretome consists of 109 putative proteins. The transcription regulators of the putative extracellular proteins were investigated by searching for DNA binding sites in their putative promoters. The conditions that favor expression were obtained by searching Gene Ontology terms and using graph theory.
A public database of K. lactis secreted proteins and their transcription factors is presented. It consists of 109 ORFs and 23 transcription factors. A graph created from this database shows 134 nodes and 884 edges, suggesting a vast number of relationships to be validated experimentally. Most of the transcription factors are related to responses to stress such as drug, acid and heat resistance, as well as nitrogen limitation, and may be useful for inducing maximal expression of potential extracellular proteins.
- Extracellular Protein
- Signal Peptide Cleavage Site
- WoLF PSORT
- Secrete Recombinant Protein
The General Secretory Pathway (GSP) is a protein export process of major biological and technological significance. Cell communication, as well as intercellular signaling and growth during development in multicellular organisms, depends on the secretion pathway. The export of a commercial protein into the extracellular medium by a recombinant cell can facilitate its downstream processing. The yeast Kluyveromyces lactis is considered a promising host for heterologous protein production. Because yeasts naturally do not secrete as many proteins as filamentous fungi, they can produce secreted recombinant proteins with few contaminants in the medium. An ideal system for secreting a desired protein could be developed from analysis of the native proteins. The completed K. lactis genome sequence provides the tools to construct such a system. As the genomes of several hemiascomycetous yeasts are now sequenced [3–5] and cross-comparison does not reveal significant differences, the prospect of discovering a potentially significant secreted protein using bioinformatics techniques is high [6–8]. In K. lactis, as in other eukaryotes, secreted proteins are typically recognized by the presence of an N-terminal signal sequence that directs them to the GSP. Signal sequences usually have a well-characterized structure built around a central hydrophobic core (h-region), which consists on average of 6–15 amino acid (aa) residues flanked by hydrophilic N- and C-terminal regions. The h-region is important for correct targeting and membrane insertion of the peptide. At the polar C-terminal region, helix breaking often occurs because of proline and glycine residues, and small uncharged residues at the -3 and -1 positions determine the signal peptide cleavage site [9, 10]. The polar N-terminal region is variable in length and frequently positively charged.
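The signal sequence features described above can be illustrated with a small Python sketch. It is only a toy check, not the method used by SignalP or by this study: the residue sets `HYDROPHOBIC` and `SMALL_UNCHARGED`, the function names, and the example sequence are all assumptions chosen for illustration.

```python
# Toy check of the signal-peptide features described in the text.
# The residue sets below are illustrative assumptions, not published cutoffs.
HYDROPHOBIC = set("AVLIMFW")    # assumed hydrophobic alphabet for the h-region
SMALL_UNCHARGED = set("AGSCT")  # assumed "small uncharged" residues


def longest_hydrophobic_run(seq):
    """Return (start, length) of the longest run of hydrophobic residues."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def satisfies_minus3_minus1(seq, cleavage_pos):
    """Check the -3/-1 positions; cleavage_pos is the signal peptide length."""
    if cleavage_pos < 3 or cleavage_pos > len(seq):
        return False
    return (seq[cleavage_pos - 1] in SMALL_UNCHARGED
            and seq[cleavage_pos - 3] in SMALL_UNCHARGED)


# Hypothetical sequence: charged N-region, hydrophobic core, polar C-region.
toy = "MKKLLLALLAVLAPSASAQDEK"
start, length = longest_hydrophobic_run(toy)
print(start, length, 6 <= length <= 15, satisfies_minus3_minus1(toy, 18))
```

In this toy sequence the hydrophobic core falls within the 6–15 residue range quoted above, and the residues at -3 and -1 relative to the assumed cleavage site are both alanine.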
Although some proteins lacking N-terminal signal sequences reach the extracellular medium, the majority of soluble secreted proteins in K. lactis are likely to be transported via the GSP. A wide variety of computational methods have been used to predict the subcellular localization of proteins. The methods differ in the input data they demand and in the techniques applied to make decisions or predictions about location. Once the input data type is fixed, predictions are made in essentially one of two ways: by manually constructing explicit rules for localization prediction from current knowledge of sorting signals, or by applying data-driven machine-learning techniques (e.g., Neural Networks (NN) or Hidden Markov Models (HMMs)). The latter automatically extract decision rules from sets of proteins with known location, without making any prior, detailed assumptions about the features of interest.
In addition to using direct algorithm analysis to predict extracellular proteins, the extracellular secretome can be analyzed through its possible transcription factors (TFs). TFs are part of the signal transduction pathway that modulates the cell metabolism in response to environmental stimuli. The TFs that contain DNA binding motifs are the component of the signaling pathway that is closest to the level of the DNA. To a large degree, the combinatorial presence and absence of transcription factor binding sites (TFBSs) is responsible for gene regulation complexity [14–17]. The identification of TFBSs has been used to infer regulatory networks for several different yeasts.
Using an algorithmic approach, we set out to identify extracellular protein candidates in the yeast K. lactis and to determine TFBSs in the promoters of their genes. The relationship to transcriptional regulators was analyzed using the dataset of Bussereau et al. and putative promoter regions 1 kb upstream of the genes that encode the predicted extracellular proteins.
Prediction of K. lactis extracellular proteins
To evaluate the criteria for predicting the presence or absence of N-terminal signal peptides in the K. lactis dataset, the Hotelling T-square multivariate test (Figure 2E) was employed on the basis of the NN mean S and mean D scores and the HMM score. The vector parameters for each control set were compared to those of the predicted set and confirmed by the T-square test. The estimated 109 ORFs were closer to the YEP dataset (p = 0.9401) than to the KLRS dataset (p < 0.01).
Analysis of annotations
Relationship between the predicted extracellular proteins and transcriptional factors repertoire
Cluster of transcription factors with GeneOntology terms related to the predicted ORFs
Aerobic/Anaerobic and Sterol metabolism
Repression of hypoxic genes, several DAN/TIR genes during aerobic growth, and ergosterol biosynthetic genes
Subunit of the heme-activated, glucose-repressed Hap2p/3p/4p/5p CCAAT-binding complex, a transcriptional activator and global regulator of respiratory gene expression; provides the principal activation function of the complex
The expression of G2/M phase genes; negatively regulates transcriptional elongation; positive role in chromatin silencing at HML and HMR.
Required for nucleosome positioning at this motif; targets Isw1p to DNA
Activates expression of early G1-specific genes, localizes to daughter cell nuclei after cytokinesis and delays G1 progression in daughters.
Maintenance of cell integrity; phosphorylated and activated by the MAP-kinase Slt2p
Transcription factor that activates transcription of genes expressed at the M/G1 phase boundary and in G1 phase
Drugs and metal resistance
Activator of multidrug resistance genes, forms a heterodimer with Pdr1p; interacts with a PDRE (pleiotropic drug resistance element)
Required for oxidative stress tolerance; activated by H2O2; mediates resistance to cadmium
Activates genes involved in multidrug resistance; paralog of Yrm1p, acting on an overlapping set of target genes
General stress response
Regulates the unfolded protein response, via UPRE binding, and membrane biogenesis; ER stress-induced splicing pathway utilizing Ire1p, Trl1p and Ada5p facilitates efficient Hac1p synthesis
JmjC domain-containing histone demethylase; transcription factor involved in the expression of genes during nutrient limitation; also involved in the negative regulation of DPP1 and PHR1
Transcriptional activator related to Msn4p; activated in stress conditions, which results in translocation from the cytoplasm to the nucleus; binds DNA at stress response elements of responsive genes, inducing gene expression
Basic helix-loop-helix-leucine zipper (bHLH/Zip) transcription factor that forms a complex with another bHLH/Zip protein, Rtg1p, to activate the retrograde (RTG) and TOR pathways
Involved in cell-type-specific transcription and pheromone response; plays a central role in the formation of both repressor and activator complexes.
Amino acid/Nitrogen starvation response
Amino acid biosynthetic genes in response to amino acid starvation; expression is tightly regulated at both the transcriptional and translational levels
Responsible for the regulation of the sulfur amino acid pathway, requires different combinations of the auxiliary factors Cbf1p, Met28p, Met31p and Met32p
Carbon source response
Glucose-responsive transcription factor that regulates expression of several glucose transporter (HXT) genes in response to glucose; transcriptional activator and repressor
Required for transcription of the glucose-repressed gene ADH2, of peroxisomal protein genes, and of genes required for ethanol, glycerol, and fatty acid utilization
Involved in induction of CLN3 transcription in response to glucose; genetic and physical interactions indicate a possible role in mitochondrial transcription or genome maintenance
pH stress response
Recruits the Cyc8p-Tup1p complex to promoters; mediates glucose repression and negatively regulates a variety of processes including filamentous growth and alkaline pH response
Binds cooperatively with Pho2p to the PHO5 promoter; function is regulated by phosphorylation at multiple sites and by phosphate availability
JmjC domain-containing histone demethylase which can specifically demethylate H3K36 tri- and dimethyl modification states; transcriptional repressor of PHR1; Rph1p phosphorylation during DNA damage is under control of the MEC1-RAD53 pathway
Because of its distinctive physiological properties, K. lactis has become an important model as a non-Saccharomyces yeast. In addition, K. lactis has great potential for biotechnological applications including expression of heterologous proteins. These possibilities motivated us to study the global extracellular proteome and correlate it to TFs using a bioinformatics approach. The final results show 109 proteins that are potentially secreted by K. lactis. In addition to the TMHMM and TargetP algorithms used by Lee et al. and Swaim et al., Phobius and WoLF PSORT were applied to find transmembrane domains and subcellular addressing that would direct targeted proteins to organelles such as the endoplasmic reticulum, Golgi, and proteasomes. The WoLF PSORT algorithm appeared to be more accurate. When the dataset of secreted proteins detected experimentally by Swaim et al. was analyzed with the prediction methods of Lee et al., those methods detected more proteins (37) than WoLF PSORT (33); however, the prediction error rate was 69.3% for WoLF PSORT and about 79.2% for TargetP. The appearance of proteins in the medium changes in different physiological conditions, so the predictive methods chosen here decrease error rates and improve the chances of obtaining an actual extracellular protein in a given physiological condition. The error reduction may come from Phobius, which combines transmembrane topology and signal peptide prediction, and from WoLF PSORT, which predicts the subcellular localization of proteins from their amino acid sequences using a k-nearest neighbor (k-NN) classifier. As described by Swaim et al., in the signal peptide detection step, the prediction algorithm SignalP v3.0 was used to give two NN prediction scores, mean S and mean D, and one HMM score.
These NN scores were used for statistical analysis in the first step to identify extracellular proteins by the conserved secretory pathway features of a signal peptide and a signal peptidase cleavage site. Accuracy in identifying extracellular proteins may be decreased because proteins that act in the periplasmic space or the cell wall also pass through the GSP. Motifs or conserved addresses for the periplasmic space or cell wall have not yet been found. Thus, the strategy adopted to classify the results in this study focused on annotation terms and on PFam, a database of conserved domains and families. The Génolevures third release is the main publicly available annotation dataset for K. lactis sequences. Therefore, the PFam database was additionally used to update the Génolevures annotation. Both showed five K. lactis annotated secreted proteins: acid phosphatase, repressible acid phosphatase precursor, guanosine diphosphatase, exo-1,3-beta-glucanase and invertase. Some of these proteins have not been described as acting in the extracellular space. According to Domínguez et al., S. cerevisiae proteins are not found free in the extracellular medium but are retained in the periplasmic space or associated with the cell wall. K. lactis, however, does not seem to have the same characteristic; in fact, it has been reported to secrete high molecular weight proteins. Thus, in this study, proteins from the periplasmic space or associated with the cell wall have been considered as part of the potential extracellular proteins dataset.
Bioinformatics identifications are probabilistic in nature, so the advantage of our analysis lies in the low cost and high speed with which these identifications can be obtained [27, 28]; hence, this analysis exploited an ab initio model of physiological inference. The model was created using the computational extracellular proteome dataset, the transcriptional regulators repertoire mined by Bussereau et al., and the Yeastract methodology created by Teixeira et al. http://www.yeastract.com. Since gene expression programs depend on recognition of specific promoter sequences by transcriptional regulatory proteins, we decided to analyze the relationship between the consensus sequences or DNA binding motifs and transcriptional regulators. One of the first things to change in a cell after an environmental stimulus is its content of transcriptional regulators. When a set of orthologues of S. cerevisiae transcriptional regulators and their related DNA-binding motifs was identified, a high level of polymorphism, or DNA binding factors capable of binding to both specific and nonspecific sequences, was observed [23, 29]. Because of the complex relationship between TFs and the predicted secretome, the data obtained were analyzed using graph theory. The empirical model may suggest many conditions that would not be reached by intuitive inference. The GO terms described for each TF dataset showed possible major interactions related to stress and the cell cycle. The results of this study are in accordance with the literature, because expression of extracellular proteins increases in stress situations or in the exponential phase, when the cell requires proteins that interact in the cell wall or in the periplasmic space. However, a good secretion system needs a few different proteins that show high expression and secretion.
An ab initio model allows searching for both these proteins and the environmental conditions that might improve their expression and secretion.
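The graph view of the TF and secretome relationship can be sketched in Python as follows. This is a minimal illustration, assuming an undirected graph in which an edge links a TF to an ORF whose promoter carries one of its binding sites; the function names and the `TF_*`/`ORF_*` identifiers are hypothetical, and the node set of the published graph may include other entities. A breadth-first spanning forest stands in here for the spanning tree operations mentioned in the Methods.

```python
from collections import deque


def build_graph(tf_targets):
    """Build an undirected adjacency map from {tf: set of target ORFs}."""
    adj = {}
    for tf, orfs in tf_targets.items():
        adj.setdefault(tf, set())
        for orf in orfs:
            adj[tf].add(orf)
            adj.setdefault(orf, set()).add(tf)
    return adj


def count_nodes_edges(adj):
    """Each undirected edge appears twice in the adjacency map."""
    nodes = len(adj)
    edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    return nodes, edges


def spanning_forest(adj):
    """Return the tree edges of a breadth-first spanning forest."""
    seen, tree = set(), []
    for root in sorted(adj):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    tree.append((u, v))
                    queue.append(v)
    return tree


# Hypothetical toy data, not taken from the published database.
toy = {
    "TF_A": {"ORF_1", "ORF_2", "ORF_3"},
    "TF_B": {"ORF_2", "ORF_3"},
    "TF_C": set(),
}
graph = build_graph(toy)
print(count_nodes_edges(graph), len(spanning_forest(graph)))
```

Node degree in such a graph gives a quick ranking: a TF connected to many predicted secreted ORFs points to a condition worth testing first.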
Based on the selected algorithms SignalP v3, Phobius, big-PI Predictor and WoLF PSORT, and adopting the highest WoLF PSORT k-NN scores and using multivariate T-square analysis for verification, we predicted an extracellular K. lactis secretome of 109 proteins. Well-known extracellular K. lactis proteins such as α-factor mating pheromone, invertase, and acid phosphatase precursor were among the 109 predicted proteins. In addition, by considering the Génolevures annotations and comparing to PFam, 48% of the known proteins had enzyme activity. By applying the S. cerevisiae Yeastract database, 65 transcription factor orthologues were found, 23 of which had binding sites in the promoters of the genes of the 109 predicted K. lactis secretome proteins. An ab initio model of physiological inference is presented. The model is a graph with 134 nodes and 884 edges that suggests a large number of relationships between the proteins and physiological conditions that can be experimentally validated. Most of the predicted TFs for extracellular proteins are related to stress responses, such as drug, acid and heat resistance, as well as nitrogen limitation, which may prove useful for inducing maximal expression of the potential extracellular proteins. A condition that favors secretion could be used to design a system to improve the secretion of a desired protein. Our model is stored in a public database http://www.yeastmolphys.ufv.br/klactis.
The main dataset analyzed in this study was in two files in FASTA format. Both files contained 5076 K. lactis nucleotide and aa sequences. These data are available in the K. lactis third public release from the Génolevures consortium http://cbi.labri.fr/Genolevures.
To test the criteria for extracellular proteins, a validation set consisting of 95 non-redundant yeast extracellular protein sequences (YEP) and 95 non-redundant K. lactis random sequences (KLRS) was assembled. The YEP dataset was obtained by searching the UniProt protein database http://www.uniprot.org. The KLRS was assembled using a random number generator and a sequence seeker algorithm. Another validation dataset was manually extracted from Swaim et al., consisting of 81 K. lactis extracellular proteins identified by mass spectrometry analysis (EPMS).
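Assembling a random control set such as KLRS can be sketched in Python as below. This is an assumption-laden illustration rather than the authors' sequence seeker: the function names, the fixed seed, and the toy FASTA text are hypothetical, and redundancy filtering is not shown.

```python
import random


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier is the first header token
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def sample_random_set(records, n, seed=0):
    """Draw n records without replacement; a fixed seed keeps it repeatable."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(records), n)
    return {identifier: records[identifier] for identifier in chosen}


# Hypothetical toy input; the real dataset holds 5076 sequences.
toy_fasta = ">a first\nMKT\nLLA\n>b\nMSS\n>c\nMAA\n>d\nMGG\n"
toy_records = parse_fasta(toy_fasta)
print(sorted(sample_random_set(toy_records, 2, seed=1)))
```

With the full proteome loaded, `sample_random_set(records, 95)` would return a control set of the size used for KLRS.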
The K. lactis TF dataset used in this study was from Bussereau et al. The retrieved data were composed of 102 TFs identified as orthologues of S. cerevisiae transactivators.
Algorithms and Strategy
The entire set of predicted K. lactis proteins was submitted to SignalP v3.0 http://www.cbs.dtu.dk/services/SignalP to identify N-terminal signal peptides. A positive SignalP hit had to meet all of the following criteria simultaneously: (a) a signal peptide predicted by SignalP NN, based on the mean S and mean D scores; (b) a signal peptide predicted by SignalP HMM, based on the probability value; and (c) a signal peptide cleavage site located 10–40 aa from the N-terminus.
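The three simultaneous criteria can be expressed as a single predicate, sketched here in Python. The text does not state the score cutoffs, so the three `*_CUTOFF` constants are placeholder assumptions, as are the `SignalPResult` record and its field names; only the 10–40 aa cleavage window comes from the criteria above.

```python
from dataclasses import dataclass


@dataclass
class SignalPResult:
    """Hypothetical record holding parsed SignalP output for one ORF."""
    orf_id: str
    nn_mean_s: float
    nn_mean_d: float
    hmm_prob: float
    cleavage_pos: int  # cleavage site position, in aa from the N-terminus


# Placeholder cutoffs: assumptions for illustration only.
NN_MEAN_S_CUTOFF = 0.5
NN_MEAN_D_CUTOFF = 0.5
HMM_PROB_CUTOFF = 0.5


def is_positive_hit(result):
    """Criteria (a), (b) and (c) must all hold at the same time."""
    nn_ok = (result.nn_mean_s >= NN_MEAN_S_CUTOFF
             and result.nn_mean_d >= NN_MEAN_D_CUTOFF)
    hmm_ok = result.hmm_prob >= HMM_PROB_CUTOFF
    site_ok = 10 <= result.cleavage_pos <= 40
    return nn_ok and hmm_ok and site_ok


# Hypothetical example values.
print(is_positive_hit(SignalPResult("ORF_A", 0.9, 0.8, 0.95, 19)))
```

Requiring agreement between the NN and HMM predictions trades sensitivity for fewer false positives, which suits a screen whose output must later be validated experimentally.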
The group of predicted ORFs that encoded sequences with N-terminal signal peptides was then analyzed for three additional characteristics: transmembrane domains, predicted by Phobius http://phobius.sbc.su.se; GPI modification sites, predicted by the big-PI Predictor http://mendel.imp.ac.at/gpi/gpi_server.html; and subcellular location, estimated using WoLF PSORT http://wolfpsort.org to identify signals addressing proteins to subcellular locations. The resulting dataset comprised all sequences of deduced proteins potentially acting in the extracellular space. This set was analyzed against the PFam database http://www.sanger.ac.uk/Software/Pfam in order to update the Génolevures annotations.
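The successive filtering can be pictured as a chain of predicates, each applied to the survivors of the previous one. The Python sketch below assumes that the output of each tool has already been parsed into one dictionary per ORF; the field names, the `"extr"` location label, and the toy records are assumptions made for illustration.

```python
# Each stage keeps a record only if its predicate returns True.
STAGES = [
    ("SignalP: signal peptide", lambda r: r["signal_peptide"]),
    ("Phobius: no transmembrane domain", lambda r: not r["transmembrane"]),
    ("big-PI: no GPI site", lambda r: not r["gpi_site"]),
    ("WoLF PSORT: extracellular", lambda r: r["location"] == "extr"),
]


def run_filters(records):
    """Apply the stages in order; return survivors and per-stage counts."""
    survivors = list(records)
    counts = []
    for name, keep in STAGES:
        survivors = [r for r in survivors if keep(r)]
        counts.append((name, len(survivors)))
    return survivors, counts


def record(orf_id, sp, tm, gpi, loc):
    """Helper that builds one hypothetical parsed-prediction record."""
    return {"id": orf_id, "signal_peptide": sp, "transmembrane": tm,
            "gpi_site": gpi, "location": loc}


# Hypothetical toy predictions.
toy = [
    record("ORF_1", True, False, False, "extr"),
    record("ORF_2", True, True, False, "plas"),
    record("ORF_3", True, False, True, "extr"),
    record("ORF_4", True, False, False, "vacu"),
    record("ORF_5", False, False, False, "cyto"),
]
final, stage_counts = run_filters(toy)
print([r["id"] for r in final], stage_counts)
```

For the real data, the per-stage counts reported in the abstract are 418, 176, 150 and 109, starting from 5076 proteins.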
To correlate the computational extracellular proteome and the TF repertoire, a supporting algorithm was created on the basis of ANSI/ISO C++ string operations on the K. lactis chromosome dataset. This retrieved the 1 kb upstream sequence of each predicted extracellular ORF as its putative promoter region. The recovered dataset is stored in a FASTA file with the relevant identification. The relationship between this computational extracellular proteome and the transcriptional regulators repertoire was established according to Yeastract. The Yeastract web tools http://www.yeastract.com and database were used to find associated TFBSs in S. cerevisiae. A second supporting C++ algorithm was created to remove S. cerevisiae TFs with no homologue in K. lactis. The Graphviz (Graph Visualization Software, http://www.graphviz.org) package was used to draw the graph, and the spanning tree operations were implemented with the Boost library 1.36 http://www.boost.org.
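The promoter retrieval and binding site search can be sketched in Python; the original tools were written in C++, so this is a stand-in under stated assumptions and not the authors' code. Coordinates are assumed to be 0-based and half-open, the function names are hypothetical, and the `CCWT` consensus in the example is made up. The motif scan is a plain IUPAC consensus match on both strands, which is simpler than the Yeastract search.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_upstream(chrom, start, end, strand, length=1000):
    """Return up to `length` bases upstream of an ORF at chrom[start:end].

    The result is written 5'->3' on the coding strand and is truncated
    at the chromosome ends.
    """
    if strand == "+":
        return chrom[max(0, start - length):start]
    return reverse_complement(chrom[end:end + length])


def find_sites(seq, consensus):
    """Return sorted (position, strand) hits, overlapping hits included."""
    body = "".join(IUPAC[code] for code in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlaps
    k = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Map reverse-strand matches back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical toy chromosome and ORF coordinates.
chrom = "ACGTTGCAAGCT"
print(extract_upstream(chrom, 6, 9, "+", length=4))
print(find_sites("AACCATGG", "CCWT"))
```

Handling the strand matters: for an ORF on the reverse strand the promoter lies to the right of the ORF on the reference sequence and must be reverse-complemented before motifs are scanned.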
Multivariate analysis of variance was applied to verify the accuracy and determine the error rate of the computational secretome. The SignalP NN scores (mean S and mean D) and the SignalP HMM probability were used as values in the statistical analysis to determine the variance-covariance matrices of the predicted and validation sets, and the Hotelling T-square multivariate test was applied to calculate the probability of equality of the mean vectors.
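A two-sample Hotelling T-square statistic can be computed as in the Python sketch below. It assumes the pooled-covariance form of the test, which the text does not specify, and it stops at the F statistic and its degrees of freedom because the standard library has no F distribution; all function names and the toy data are illustrative.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def covariance(rows):
    """Unbiased sample variance-covariance matrix."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(x, y):
    """Return (T2, F statistic, (df1, df2)) for two samples of row vectors."""
    n1, n2, p = len(x), len(y), len(x[0])
    s1, s2 = covariance(x), covariance(y)
    pooled = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
               for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    weighted = solve(pooled, diff)  # pooled^-1 * diff without inverting
    t2 = (n1 * n2 / (n1 + n2)) * sum(d * w for d, w in zip(diff, weighted))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


# Hypothetical two-variable toy samples.
group_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
group_b = [(1, 0), (3, 0), (1, 2), (3, 2)]
print(hotelling_t2(group_a, group_b))
```

In the study each row would hold the three SignalP values (mean S, mean D, HMM probability) for one protein, and the p-value would be the upper tail of the F distribution with the returned degrees of freedom.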
We thank the Brazilian Agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais) for financial support.
- Becerra M, Prado SD, Siso MIG, Cerdán ME: New secretory strategies for Kluyveromyces lactis β-galactosidase. Protein Engineering 2001, 14: 379–386. 10.1093/protein/14.5.379
- Bolotin-Fukuhara M, Toffano-Nioche C, Artiguenave F, Duchateau-Nguyen G, Lemaire M, Marmeisse R, Montrocher R, Robert C, Termier M, Wincker P, Wésolowski-Louvel M: Genomic Exploration of the Hemiascomycetous Yeasts: 11. Kluyveromyces lactis. FEBS Letters 2000, 487: 66–70. 10.1016/S0014-5793(00)02282-1
- Tzung KW, Williams RM, Scherer S, Federspiel N, Jones T, Hansen N, Bivolarevic V, Huizar L, Komp C, Surzycki R, Tamse R, Davis RW, Agabian N: Genomic evidence for a complete sexual cycle in Candida albicans. Proc Natl Acad Sci USA 2001, 98: 3249–3253. 10.1073/pnas.061628798
- Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M: Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 2003, 301: 71–76. 10.1126/science.1084337
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 2003, 423: 241–254. 10.1038/nature01644
- Kellis M, Birren BW, Lander ES: Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 2004, 428: 617–624. 10.1038/nature02424
- Ramezani-Rad M, Hollenberg CP, Lauber J, Wedler H, Griess E, Wagner C, Albermann K, Hani J, Piontek M, Dahlems U, Gellissen G: The Hansenula polymorpha (strain CBS4732) genome-sequencing and analysis. FEMS Yeast Res 2003, 4: 207–215. 10.1016/S1567-1356(03)00125-9
- Dietrich FS, Voegeli S, Brachat S, Lerch A, Gates K, Steiner S, Mohr C, Pöhlmann R, Luedi P, Choi S: The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 2004, 304: 304–307. 10.1126/science.1095781
- Emanuelsson O, Nielsen H, Brunak S, Heijne S: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000, 300: 1005–1016. 10.1006/jmbi.2000.3903
- Lee SA, Wormsley S, Kamoun S, Lee AFS, Joiner K, Wong B: An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. Yeast 2003, 20: 595–610. 10.1002/yea.988
- Bendtsen DJ, Nielsen H, Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004, 340: 783–795. 10.1016/j.jmb.2004.05.028
- Emanuelsson O, Brunak S, Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP, and related tools. Nature Protocols 2007, 2: 953–971. 10.1038/nprot.2007.131
- Chekmenev DS, Haid C, Kel AE: P-Match: transcription factor binding site search by combining patterns and weight matrices. Nucleic Acids Research 2005, 33(Web Server):W432-W437. 10.1093/nar/gki441
- Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel A, Kel OE, Ignatieva E, Ananko O, Podkolodnaya F, Kolpakov N: Databases on Transcriptional Regulation: TRANSFAC, TRRD, and COMPEL. Nucleic Acids Res 1998, 26: 364–370. 10.1093/nar/26.1.362
- Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Prüss M, Reuter I, Schacherer F: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000, 28: 316–9. 10.1093/nar/28.1.316
- Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhäuser R: The TRANSFAC system on gene expression regulation. Nucleic Acids Res 2001, 29: 281–3. 10.1093/nar/29.1.281
- Kel AE, Gössling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E: MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 2003, 31(13):3576–9. 10.1093/nar/gkg585
- Bussereau F, Casaregola S, Lafay JF, Bolotin-Fukuhara M: The Kluyveromyces lactis repertoire of transcriptional regulators. FEMS Yeast Res 2006, 6(3):325–35. 10.1111/j.1567-1364.2006.00028.x
- Käll L, Krogh A, Sonnhammer ELL: A Combined Transmembrane Topology and Signal Peptide Prediction Method. Journal of Molecular Biology 2004, 338(5):1027–1036. 10.1016/j.jmb.2004.03.016
- Eisenhaber B, Bork P, Eisenhaber F: Sequence properties of GPI-anchored proteins near the omega-site: constraints for the polypeptide binding site of the putative transamidase. Protein Engineering 1998, 12: 1155–1161.
- Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K: WoLF PSORT: protein localization predictor. Nucleic Acids Res 2007, 35: W585–7. 10.1093/nar/gkm259
- Swaim CL, Anton BP, Sharma SS, Taron CH, Benner JS: Physical and computational analysis of the yeast Kluyveromyces lactis secreted proteome. Proteomics 2008, 8: 2714–2723. 10.1002/pmic.200700764
- Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, Mira NP, Alenquer M, Freitas AT, Oliveira AL, Sá-Correia I: The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Nucleic Acids Res 2006, 34: D446–51. 10.1093/nar/gkj013
- Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA: Structure and evolution of transcriptional regulatory networks. Current Opinion in Structural Biology 2004, 14: 283–291. 10.1016/j.sbi.2004.05.004
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut JS, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Research 2008, 36: D281-D288. 10.1093/nar/gkm960
- Domínguez A, Fermiñán E, Sánchez M, González FJ, Pérez-Campo FM, García S, Herrero AB, San Vicente A, Cabello J, Prado M: Non-conventional yeasts as hosts for heterologous protein production. Int Microbiol 1998, 1(2):131–42.
- Chen Y, Yu P, Luo J, Jiang Y: Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT. Mammalian Genome 2003, 14(12):859–865. 10.1007/s00335-003-2296-6
- Klee EW, Carlson DF, Fahrenkrug SC, Ekker SC, Ellis LBM: Identifying secretomes in people, pufferfish and pigs. Nucleic Acids Research 2004, 32(4):1414–1421. 10.1093/nar/gkh286
- Fischer G, Rocha EPC, Brunet F, Vergassola M, Dujon B: Highly Variable Rates of Genome Rearrangements between Hemiascomycetous Yeast Lineages. PLoS Genetics 2006, 2(3):e32. 10.1371/journal.pgen.0020032
- Stroustrup B: C++ Programming Language. In AT&T Labs. Murray Hill, New Jersey. Addison-Wesley; 2004.
- Hotelling H: The generalization of Student's ratio. Ann Math Statist 1931, 2: 360–378. 10.1214/aoms/1177732979
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.