morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring
© Wagner et al.; licensee BioMed Central Ltd. 2014
Received: 19 May 2014
Accepted: 21 July 2014
Published: 6 August 2014
Searching for the orthologs of a given protein or DNA sequence is one of the most important and most commonly used bioinformatics methods in biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. They fail, however, when the level of conservation is low. The detection of remotely conserved proteins oftentimes involves sophisticated manual intervention that is difficult to automate.
Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture.
Trying to find the orthologs of a given protein or DNA sequence has co-evolved with sequencing itself. Fitch defined the terms orthology and paralogy as early as 1970, when only very few protein sequences were known [1]. With the advent of fully sequenced genomes, the computational study of orthologous protein relationships increased exponentially, in evolution and comparative genomics as well as for substantiating the evolutionary conservation of fundamental cellular processes. It is widely accepted and has been proven in many cases that orthologs typically have equivalent functions across organisms . Transferring the functional annotation of a protein to its orthologs in other species is therefore routine in genome annotation. Virtually all genome centres provide information on orthologous protein families ([3–6], and see also ).
Two proteins that are each other's best hit (also known as reciprocal best hit (RBH) or symmetrical best hit) in a pair-wise genome comparison are considered orthologous. Protein families are in practice more complicated, as genomes have evolved substantially, leading amongst other things to gene duplications and losses . Yet, reciprocal sequence similarity is thus far one of the main established methods for defining orthology computationally and is ubiquitously used on a small as well as a large scale. Other orthology search methods combine sequence-based searches with phylogenetic methods or graph-clustering algorithms to circumvent computationally intense phylogenetic calculations. These include Berkeley PHOG [8], FAT-CAT [9], TreeFam [10], PhylomeDB [11], EnsemblCompara [12], and OrthoMCL [13].
Due to high sequence divergence, many true orthologs are only discovered using more sophisticated techniques like profile-based database searches (PSI-BLAST [14], HMMER [15, 16]), profile-profile comparisons (HHblits [17], HHsenser [18]) or drastically relaxed E-value thresholds. All those approaches exploit the fact that members of orthologous protein families, even if they are strongly diverged, still share a common sequence pattern. Though powerful in finding more remotely conserved orthologs, profile-based methods are prone to profile drift (see for instance [19, 20], or ). Manual comparison of sequence alignments is oftentimes used to detect remotely conserved orthologs in the twilight zone. Virtually all above-mentioned approaches are hard to run in an unsupervised manner. Szklarczyk and colleagues [22] have introduced an iterative orthology prediction pipeline based on reciprocal best-hit assessment, Ortho-Profile, that performs sequence-to-sequence, profile-to-sequence and HMM-to-HMM comparisons in a step-wise process to uncover remotely conserved orthologs. Although Ortho-Profile is very powerful in detecting remotely conserved orthologs, no ready-to-use script or web interface is available to date. This makes Ortho-Profile difficult to use for non-experts, which is a real drawback of the software.
With morFeus, we introduce the first, web-based approach to assign remotely conserved orthologs in an unsupervised manner. To explore a substantial part of sequence space, morFeus uses BLAST with relaxed E-value thresholds. It exploits the conserved sequence pattern of orthologs by clustering the alignments of hits to the query. Bona fide orthologs serve to verify potential orthologs by the RBH-rule in iterative reciprocal BLAST searches. Finally, a score independent of the BLAST E-value, which is based on the network of orthology, is introduced to describe orthologous relationships. We have determined the accuracy and precision of morFeus by testing its performance against a subset of the HomoloGene database , as well as Inparanoid . We demonstrate the sensitivity of morFeus using a set of remotely conserved, mitochondrial protein families that were first uncovered using Ortho-Profile, as well as an example of a remotely conserved, orthologous family, whose members were shown to have identical functions in their respective organisms . morFeus is freely available as a web server at http://bio.biochem.mpg.de/morfeus/. We have submitted its source-code (Additional file 1) to Sourceforge.net (https://sourceforge.net/p/morfeus/) and its virtual machine can be requested from the authors.
morFeus web server implementation
The morFeus web server
A morFeus search starts with a BLAST search (blast+, version 2.2.27) against a protein sequence database using relaxed parameters (default E-value threshold: 100). We recommend an E-value cut-off of at least 100 for sequences without any apparent homolog in distant species, as it covers a reasonably large sequence space. For sequences with clear homologs in distant species, the E-value can be reduced (E-value ≤ 10). Currently, the user can choose to search against the entire RefSeq protein database of the NCBI or subsets thereof (Bacteria, Eukaryota, Opisthokonta, plants). The choice of E-value cut-off and database will influence the run-time of morFeus (high E-value thresholds and large search space increase the run-time).
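For readers who want to reproduce the relaxed initial search locally, the following is a minimal sketch of how such a blastp call could be assembled. The file and database names are hypothetical, and the exact parameter set used by the morFeus server is not stated in the text; only the E-value threshold is taken from it. The command is built but not executed.

```python
def build_blastp_command(query_fasta, database, evalue=100, out_path="hits.tsv"):
    """Return the argument list for a relaxed blastp search (not executed here)."""
    if evalue <= 0:
        raise ValueError("E-value threshold must be positive")
    return [
        "blastp",
        "-query", query_fasta,          # hypothetical FASTA file with the query
        "-db", database,                # hypothetical local BLAST database name
        "-evalue", format(evalue, "g"), # relaxed threshold, default 100 as in the text
        "-outfmt", "6",                 # tabular output, easy to parse
        "-out", out_path,
    ]


# Example: the Apc13 search described later uses an E-value of 1000.
cmd = build_blastp_command("apc13.fasta", "refseq_opisthokonta", evalue=1000)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are installed.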
Distance-based clustering of alignments
All pair-wise alignments of the query BLAST search are clustered based on their similarity to each other. Each alignment is transformed into a binary format representing the matches (1) and mismatches (0) of the query with the subject. To strengthen the contribution of rare amino acids, we use the weights of the OPTIMA substitution matrix  for the amino acid sequence of the query to calculate the similarity score of two alignments, treating identical and conserved positions as equal. This similarity score is further used for distance-based hierarchical clustering with a modified average linkage approach. The classical average linkage approach does not consider the positions conserved between a new alignment and the alignments in a cluster, as it only calculates the distance between the new alignment and the average of all alignments of an established cluster. We therefore calculate the distance score based on the conserved consensus of the alignments within one cluster and a new alignment (or another cluster).
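The binary encoding and the consensus-based comparison can be illustrated with a small sketch. Several parts are assumptions made for illustration only: the per-residue weights are invented rather than taken from the OPTIMA matrix, the similarity is computed as a weighted overlap (shared weight divided by combined weight), and the cluster consensus is taken to be the positions matched in every member. The alignment input follows the BLAST convention in which the midline carries a letter for identities, `+` for conservative substitutions and a space otherwise.

```python
def encode_alignment(query_len, q_start, aligned_query, midline):
    """Binary vector over query positions: 1 = identical or conserved, 0 = otherwise."""
    vec = [0] * query_len
    pos = q_start - 1                      # BLAST coordinates are 1-based
    for q_char, m_char in zip(aligned_query, midline):
        if q_char == "-":                  # gap in the query consumes no query position
            continue
        if m_char != " ":                  # identity (letter) or conserved ('+')
            vec[pos] = 1
        pos += 1
    return vec


def weighted_similarity(vec_a, vec_b, weights):
    """Weighted overlap of two binary vectors (assumed formula, not the published one)."""
    shared = sum(w for a, b, w in zip(vec_a, vec_b, weights) if a and b)
    either = sum(w for a, b, w in zip(vec_a, vec_b, weights) if a or b)
    return shared / either if either else 0.0


def cluster_consensus(vectors):
    """Positions matched in every alignment of a cluster (assumed definition)."""
    return [int(all(column)) for column in zip(*vectors)]


query = "MKWLC"
# Hypothetical weights: rare residues (W, C) count more than common ones.
residue_weight = {"M": 1.0, "K": 1.0, "W": 3.0, "L": 1.0, "C": 2.0}
weights = [residue_weight[r] for r in query]

a = encode_alignment(len(query), 1, "MKWLC", "M WLC")
b = encode_alignment(len(query), 1, "MKWLC", "MKW  ")
similarity = weighted_similarity(a, b, weights)
distance = 1.0 - similarity
consensus = cluster_consensus([a, b])
print(a, b, similarity, consensus)
```

Comparing a new alignment against `cluster_consensus(...)` rather than against the average of all members mirrors the idea of the modified linkage described above.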
The resulting hierarchical tree is analysed with respect to its structure for subsequent cluster splitting. In brief, each hierarchical tree is cut based on its distribution of distances. Based on the analysis of 254, randomly chosen protein families, we determined that an exponential function is the best-suited mathematical model to describe the majority of datasets (97% of tested families; see Additional file 2: Figure S1 and Additional file 3: Table S1). The climbing rate of the exponential function is used to identify cluster boundaries and to cut the tree into individual clusters. A small climbing rate describes highly similar alignments; the steeper the climbing rate, the more dissimilar the alignments will be. We therefore cut the tree at the position where the climbing rate accelerates from a flat to a steep curve. At this point, two more distantly related clusters are linked. A detailed description of the clustering approach, as well as the definition of clusters of the distance tree can be found in Additional file 2.
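The tree-cutting idea can be sketched as follows. This is a stand-in heuristic, not the procedure from Additional file 2: it assumes the sorted merge distances are positive, fits an exponential curve to them by least squares on the logarithm, and places the cut at the first merge where the fitted curve, rescaled to the unit square, rises more steeply than 45 degrees. The slope criterion and all function names are assumptions.

```python
import math


def fit_exponential(heights):
    """Least-squares fit of log(h_i) = log(a) + b * x_i, with x_i = i / (n - 1)."""
    n = len(heights)
    xs = [i / (n - 1) for i in range(n)]
    ys = [math.log(h) for h in heights]    # requires strictly positive heights
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


def cut_index(heights):
    """Index of the first merge where the rescaled fitted curve has slope > 1.

    Returns None when the curve never becomes that steep (no clear boundary).
    """
    a, b = fit_exponential(heights)
    top = a * math.exp(b)                  # fitted value at x = 1, used for rescaling
    n = len(heights)
    for i in range(n):
        x = i / (n - 1)
        slope = a * b * math.exp(b * x) / top
        if slope > 1.0:
            return i
    return None


# Synthetic merge distances that follow an exponential curve exactly.
heights = [0.01 * math.exp(4 * i / 9) for i in range(10)]
a_fit, b_fit = fit_exponential(heights)
cut = cut_index(heights)
print(a_fit, b_fit, cut)
```

Merges before the returned index join highly similar alignments; merges from that index onward link more distantly related clusters and would be cut.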
Iterative reciprocal BLAST
Each orthology candidate is submitted to a reciprocal BLAST search and evaluated for its fitness to become a bona fide ortholog. In order to maximize the benefit from the RBH hypothesis, several additional features have been implemented in morFeus’ reciprocal BLAST searches: 1) morFeus does several cycles of reciprocal BLASTs, taking the output of the previous rounds into account for selecting orthology candidates and deciding on orthology relationships; hence, morFeus considers not only the query but also all bona fide orthologs when deciding on the orthology of novel candidates; 2) if a protein is selected for reciprocal BLAST, morFeus includes all proteins present in its respective candidate cluster for reciprocal BLAST searches; 3) all sequences that are found as RBH by more than two verified orthologs are likewise selected for reciprocal BLASTs. To start iterative reciprocal BLASTs, all sequences with more than 80% identity to the query are selected, as are all sequences that are located within the query cluster. In the first round, only the query is taken to decide on the orthologous relationship of a candidate. For all candidates with an E-value < 10⁻⁵, we strictly apply the RBH-rule. However, for sequence relationships with a statistically less reliable E-value (> 10⁻⁵), it cannot be excluded that the second or even the third hit in a species is the true ortholog . An orthology candidate is only excluded from further analysis when it is rejected by more than 33% of bona fide orthologs as an RBH. Reciprocal BLAST searches stop once no new orthology candidates are found.
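The two decision rules in this paragraph can be sketched as small functions. The E-value boundary and the 33% rejection cut-off come from the text; treating ranks two and three as "acceptable" hits, and the function names themselves, are assumptions made for illustration.

```python
STRICT_EVALUE = 1e-5        # below this, the RBH-rule is applied strictly (from the text)
MAX_ACCEPTABLE_RANK = 3     # assumption: second or third hit may still be the ortholog
REJECTION_CUTOFF = 0.33     # excluded when more than 33% of orthologs reject (from the text)


def classify_hit(rank, evalue):
    """Classify a reciprocal hit by its rank within a species.

    Returns 'best', 'acceptable' or None (not a reciprocal hit).
    """
    if rank == 1:
        return "best"
    if evalue > STRICT_EVALUE and rank <= MAX_ACCEPTABLE_RANK:
        return "acceptable"
    return None


def is_excluded(votes):
    """votes: one boolean per bona fide ortholog, True if it accepts the candidate."""
    if not votes:
        return False                       # nothing has rejected the candidate yet
    rejected = sum(1 for accepted in votes if not accepted)
    return rejected / len(votes) > REJECTION_CUTOFF


print(classify_hit(1, 1e-30), classify_hit(2, 1e-30), classify_hit(3, 0.5))
print(is_excluded([True, True, True, False]), is_excluded([True, False]))
```

In an iterative run, `votes` would grow with every round as newly confirmed orthologs are added to the set that judges the remaining candidates.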
Orthology network construction and centrality scoring
Once relationships between orthologs based on reciprocal BLASTs have been established, morFeus constructs a network of orthology, which reflects the binary relationships between the included sequences. In the orthology network, we discriminate between best-best (bb), best-acceptable (ba), acceptable-acceptable (aa) relationships, as well as one-sided relationships of the type best (b) and acceptable (a). The latter reflect situations where only one of the two proteins finds the other by BLAST. The type of relationships (edges) between the proteins (nodes) enables us to score the individual candidates using centrality scoring. More precisely, we apply Eigenvector centrality [26] as implemented in NetworkX [27] to score each individual node in the orthology network. To assign initial scores, we use the type of connection between the nodes with descending values: bb = 1, ba = 0.5, aa = 0.25, b = 0.125, a = 0.0625. We use the centrality score as the network score for each node, as it represents a measure of similarity of a node to the group of collected orthologs that is independent of the BLAST E-value.
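To make the scoring step concrete, here is a minimal sketch of weighted eigenvector centrality computed by power iteration in plain Python. It is a stand-in for the NetworkX implementation named above, not a copy of it. The edge weights are the values given in the text; treating every relationship, including the one-sided ones, as an undirected weighted edge is an assumption, and the toy network is invented.

```python
EDGE_WEIGHTS = {"bb": 1.0, "ba": 0.5, "aa": 0.25, "b": 0.125, "a": 0.0625}


def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """edges: iterable of (node_u, node_v, relationship_type) tuples."""
    adjacency = {}
    for u, v, kind in edges:
        weight = EDGE_WEIGHTS[kind]
        adjacency.setdefault(u, {})[v] = weight   # assumption: edges are undirected
        adjacency.setdefault(v, {})[u] = weight
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Iterate with (A + I): adding the node's own score keeps the iteration
        # from oscillating on bipartite networks.
        new = {
            node: score[node] + sum(w * score[nb] for nb, w in adjacency[node].items())
            for node in adjacency
        }
        norm = sum(value * value for value in new.values()) ** 0.5
        new = {node: value / norm for node, value in new.items()}
        if sum(abs(new[node] - score[node]) for node in adjacency) < tol:
            return new
        score = new
    return score


# Toy network: three mutually best-best proteins plus one weakly attached candidate.
edges = [
    ("query", "A", "bb"),
    ("query", "B", "bb"),
    ("A", "B", "bb"),
    ("query", "C", "a"),
]
scores = eigenvector_centrality(edges)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```

The weakly attached candidate receives a much lower score than the tightly connected core, which is the behaviour the network score is meant to capture.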
Description of web output
MorFeus results for the protein family Apc13
As an example of a highly diverged protein family, we chose S. pombe Apc13 as a query, a subunit of the Anaphase Promoting Complex that is remotely conserved from yeast to man . There is no HomoloGene group assigned to fission yeast Apc13. The ANAPC13 HomoloGene group from eukaryotes only includes vertebrates. Likewise, Inparanoid failed to detect any orthologs in metazoans for this fission yeast protein. Of the phylogeny software mentioned above, none could complete this protein family from fungi to mammals.
morFeus found 700 hits for Sp Apc13 with our settings (E-value of 1000, database RefSeq-opisthokonta) and after 380 reciprocal BLAST searches, it identified 70 orthologs from fungi, nematodes, arthropods, vertebrates and mammals (Figure 2 and Additional file 2: Figure S2, Table S2 and Additional file 3: Table S3; see Figure 3 a for a multiple sequence alignment of a subset of Apc13 orthologs). morFeus readily discovered orthologs based on the similarity of their alignments (Additional file 2: Figure S3 a) and was able to discriminate between false positive and true positive hits solely based on a family-specific conservation pattern: although mouse Apc13 is only the 3rd BLAST-hit from Mus musculus, morFeus distinguished its sequence as the orthologous one (Additional file 2: Figure S3 b). morFeus is thus able to effectively distinguish true positive orthologs from a large number of hits in relaxed BLAST searches (Additional file 3: Table S3). 70 of the initial 700 hits are identified by morFeus as orthologs. 66 hits in the initial BLAST are true positive Apc13 orthologs. Only one of the orthologs is not found by morFeus: Strongylocentrotus purpuratus Apc13-like protein (XP_001182211) is rejected, because a second, nearly identical sequence exists in the RefSeq database (XP_001184631). The two sequences exclude themselves due to the RBH-rule. While morFeus did not find Apc13 orthologs from all species, the identified sequences from different phyla can retrieve most missing family members from their respective phylum with a standard BLAST search. Four of the identified 70 sequences are false positives (Additional file 2: Table S2 and Additional file 3: Table S3, see Additional file 2: Figure S3 c for pair-wise alignments of false positive identifications). This amounts to a Precision of 93% for the remotely conserved Apc13 protein family. Note that Recall, Precision and Accuracy of morFeus will differ for each protein family. 
Additional file 3: Table S11 lists Precision values for other, remotely conserved protein families found by morFeus. morFeus results currently exclude all hits that are found as a RBH by the query alone. With this setting, we most likely miss some true positives. None of the Saccharomycetae orthologs have been found, even though they are known (Swm1 for Saccharomyces cerevisiae). Yet, the number of false positives rises when the query alone is sufficient to include a potential orthologous sequence.
S. japonicus Apc13 identifies more vertebrate and mammalian Apc13 members than S. pombe and also produces no false positive hits (Precision = 100%), when submitted to morFeus (Additional file 2: Table S4). We have observed this in other protein families as well. This is not surprising, as each query will find a slightly different set of hits in a BLAST search. The more divergent two input queries from the same protein family, the more sequence space can consequently be covered. We therefore recommend using more than one member of a protein family as morFeus queries.
Performance in detecting orthologs of conserved protein families
Table: Performance of morFeus, HomoloGene and Inparanoid (pairwise comparisons: HomoloGene - morFeus, Inparanoid - morFeus, HomoloGene - Inparanoid, Inparanoid - HomoloGene).
morFeus reached a Recall of 86% and a Precision of 94% when compared against the HomoloGene database, resulting in an F1-score of 89%. Due to the high number of BLAST hits – and therefore true negatives, morFeus’ Accuracy amounted to 99%.
Next, we compared morFeus results of the HomoloGene test set against Inparanoid orthology searches. Results were very similar, with 85% Recall, 94% Precision, an F1-score of 88% and an Accuracy of 98%. Finally, we compared the results from HomoloGene and Inparanoid with each other. When we took HomoloGene as a basis, Inparanoid reached a Recall of 83% and a Precision of 91%, giving an F1-score of 85% and an Accuracy of 99% (300 BLAST hits were considered as true negatives). HomoloGene, when compared to Inparanoid only had a Recall of 66%. This is mostly due to the fact that in conflicting protein family situations, HomoloGene does not assign an ortholog, while Inparanoid does. The Precision was comparable to the other test situations with 90%, resulting in an F1-score of 73% and an Accuracy of 98%.
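The four measures reported here follow the standard definitions, sketched below. The counts in the example are hypothetical and are not taken from the benchmark; they only illustrate why a large number of true negatives pushes Accuracy close to 100% even when Recall and Precision are lower.

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, Precision, F1-score and Accuracy from confusion-matrix counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one predicted and one actual positive")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 90 orthologs found, 10 wrong calls, 10 missed, 890 rejected hits.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=890)
print(metrics)
```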
Based on our data we conclude that morFeus is an accurate and efficient method to detect conserved orthologs and is in its overall performance comparable to the HomoloGene resource, as well as the orthology search engine Inparanoid. We did not observe a high number of false positives. morFeus could indeed complete a further 16 (or 8% of) families that were annotated only in fungi and/or plants with orthologs from nematodes, arthropods and vertebrates. In total, morFeus found an additional 90 orthologs for the HomoloGene test set (see Additional file 3: Table S10).
Comparison of morFeus with Ortho-Profile: detecting remotely conserved, mitochondrial proteins in higher eukaryotes
Table: Identification of remotely conserved, experimentally verified mitochondrial proteins using morFeus. Columns: Gene name yeast; Gene name vertebrate/human; RefSeq ID vertebrate/human; Found with morFeus. Entries preserved: FAM36A (M. mulatta); only found with S. japonicus, finds S. cerevisiae Pet20 (NP_015166) as ortholog.
We next took all 598 proteins that contained assigned human orthologs from  to further test the performance of morFeus on a large scale (E-value was 100, database RefSeq-opisthokonta). We eliminated all proteins that already had bona fide orthologs in higher eukaryotes assigned by HomoloGene and searched with those 184 proteins that did not contain any orthologs from Opisthokonta (Additional file 3: Table S13). Eight searches were stopped, as more than 1500 hits were found, suggesting a multi-branching family with sufficient sequence similarity for phylogenetic methods. For 150 (86%) of the remaining 176 proteins, morFeus readily discovered the fission yeast (if available), as well as the vertebrate/mammalian ortholog. In 21 cases (12%), an identified ortholog from the morFeus search with the budding yeast protein was used to retrieve orthologs in higher eukaryotes in a subsequent morFeus run. The use of intermediate species is one of the recommended procedures to discover very distantly related orthologs in other species. Five of the 176 proteins were members of multi-branching families with at least one gene duplication in S. cerevisiae. In all those cases, the yeast paralog was the putative sequence ortholog assigned by Ortho-Profile. It is for this reason that no ortholog was detected using morFeus. Taken together, we conclude that morFeus is as efficient as Ortho-Profile in discovering remotely conserved orthologs, with the advantage of a ready-to-use web interface.
morFeus is a new, web-based method to assign remotely conserved orthologs. Based on sampling of a large part of the sequence space due to relaxed E-value settings, the comparison of pair-wise sequence alignments and iteratively establishing reciprocal similarity relationships, our software is able to efficiently identify orthologs with high sequence divergence. We introduce a measure of orthology independent of the E-value, which is based on the connectivity of sequences in a network of orthology. morFeus searches a large part of sequence space and can detect more divergent family members. This is demonstrated with the help of the remotely conserved, mitochondrial protein families introduced by , as well as the example we chose (Apc13 from S. pombe). morFeus is so far the first web-based, ready-to-use software that can reliably detect remotely conserved orthologs of a protein in an unsupervised manner.
Ortho-Profile is in our view the most similar search engine to morFeus. It is designed to detect remotely conserved orthologs by a step-wise procedure to identify them based on the similarity of either their sequences, their sequence profiles or their HMMs. Unlike morFeus, Ortho-Profile does not have a ready-to-use web-interface. It is therefore difficult to use for non-experts, which is one of the main target groups for morFeus. As Ortho-Profile partly relies on sequence profiles and HMMs, respectively, it is also not clear, how specific the pipeline is in multi-branching – and also multi-domain families.
Though we consider morFeus very powerful in finding remote orthologs, we acknowledge its limitations: First, morFeus relies fully on BLAST results. If an orthologous sequence is not present in the sampled sequence space or if BLAST fails to detect the sequence with the chosen settings, morFeus will not list it as an ortholog, as is the case in the Apc13 family. Though the ortholog of S. cerevisiae Apc13 is known, Sp Apc13 does not find it in its initial BLAST search; thus, morFeus equally fails to report this sequence as an Apc13 ortholog. This limitation may be overcome in many cases by using PSI-BLAST instead of BLAST for the initial sequence search, a feature we are planning to implement in future releases of morFeus. We furthermore observed that the success of a morFeus search depends partly on the chosen query sequence. We generally recommend using more than one of the bona fide orthologs as a query for a morFeus search to detect more and also more divergent members of an orthologous family. Second, the Eigenvector centrality scores that are calculated for nodes are not discriminative at low values. This is not unexpected as true positives have in some cases a best-best (or best-acceptable) relationship to only two or a few members of an orthologous family. It is for this reason that we do not exclude putative orthologs based on a low network score. morFeus’ network score is however discriminative at large values and can be used as an independent measure to ascertain an orthologous relationship. Third, morFeus might not be able to distinguish between orthologs and paralogs in all cases. This is a result of our procedure to include or exclude orthology candidates based on their relationship to bona fide orthologs. We only exclude candidates that are rejected by more than 33% of bona fide orthologs as an RBH. Making this exclusion criterion stricter, that is, excluding candidates that are rejected by a smaller fraction of bona fide orthologs, would cost many true positive hits.
For the intended use cases of morFeus, where virtually no ortholog is found in more divergent species, finding two potential co-orthologs is better than finding none. Further analysis of the identified sequences, for instance by phylogenetic analysis, can bring final clarity to the sequence relationships. One possibility to overcome this in our software would be to perform orthology assignment based on the reciprocal smallest distance algorithm (RSD, [31]), which employs phylogenetics to distinguish between orthologs and paralogs. Though it would be technically possible to implement RSD in morFeus, this procedure is extremely time-consuming, as many sequences would need to be tested by RSD.
When should morFeus be used? morFeus is at its best when a user searches for the (co-)orthologs of a sequence that has no close homologs in divergent species, so that standard similarity search methods fail. If a sequence is a member of a larger protein family, for instance the kinase family, nuclear hormone receptors or zinc fingers, just to name a few, morFeus will not be the method of choice and phylogenetic approaches are better suited to identify orthologs. morFeus is however the method of choice when dealing with sequence orphans or with sequences for which classical search methods only detect orthologs in closely related species.
morFeus is the first web-based, fully automated method to detect remotely conserved orthologs of sequence orphans. We have realized this by 1) relaxing search parameters of BLAST to cover more sequence space of potential orthologs; 2) clustering resulting BLAST-alignments according to their similarity in order to identify conserved sequence patterns; 3) performing iterative reciprocal BLAST-searches to not only include orthology candidates that are picked up by more than one verified ortholog in previous rounds, but also to allow already confirmed orthologs, which fulfil the reciprocal best hit (RBH) relationship with the query to serve as RBH-recipients for further candidates; 4) and finally, by introducing a measure of orthology that is independent of the BLAST E-value and is based on the connectivity of a protein in its network of orthology. Our method is equally specific in the detection of well-conserved orthologs and more sensitive in finding remotely conserved orthologs than other web-based software suites available in the field to date.
Availability and requirements
The authors thank Assa Yeroslaviz, Corinna Klein, Thomas Wiehe, James Stewart and Carolin Meharg for critical input and reading of the manuscript.
MV was supported by BMBF-Project 01IH11003C (NGSgoesHPC), JMV was supported by BMBF-Project 0315759 (The Virtual Liver Network (VLN)). This work was supported by the Max Planck Society.
1. Fitch WM: Distinguishing homologous from analogous proteins. Syst Zool. 1970, 19 (2): 99-113. 10.2307/2412448.
2. Gabaldon T, Koonin EV: Functional and evolutionary implications of gene orthology. Nat Rev Genet. 2013, 14 (5): 360-366. 10.1038/nrg3456.
3. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39: D38-D51. 10.1093/nar/gkq1172.
4. O’Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005, 33 (Database issue): D476-D480.
5. Kersey PJ, Staines DM, Lawson D, Kulesha E, Derwent P, Humphrey JC, Hughes DS, Keenan S, Kerhornou A, Koscielny G, Langridge N, McDowall MD, Megy K, Maheswari U, Nuhn M, Paulini M, Pedro H, Toneva I, Wilson D, Yates A, Birney E: Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res. 2012, 40 (Database issue): D91-D97.
6. Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ: The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2011, 39 (Database issue): D876-D882.
7. Ostlund G, Schmitt T, Forslund K, Kostler T, Messina DN, Roopra S, Frings O, Sonnhammer EL: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010, 38 (Database issue): D196-D203.
8. Datta RS, Meacham C, Samad B, Neyer C, Sjolander K: Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res. 2009, 37 (Web Server issue): W84-W89.
9. Afrasiabi C, Samad B, Dineen D, Meacham C, Sjolander K: The PhyloFacts FAT-CAT web server: ortholog identification and function prediction using fast approximate tree classification. Nucleic Acids Res. 2013, 41 (Web Server issue): W242-W248.
10. Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J, Durbin R: TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 2006, 34 (Database issue): D572-D580.
11. Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, Marcet-Houben M, Gabaldon T: PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Nucleic Acids Res. 2011, 39 (Database issue): D556-D560.
12. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19 (2): 327-335.
13. Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13 (9): 2178-2189. 10.1101/gr.1224503.
14. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
15. Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
16. Eddy SR: Hidden Markov models. Curr Opin Struct Biol. 1996, 6 (3): 361-365. 10.1016/S0959-440X(96)80056-X.
17. Remmert M, Biegert A, Hauser A, Soding J: HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012, 9 (2): 173-175.
18. Soding J, Remmert M, Biegert A, Lupas AN: HHsenser: exhaustive transitive profile search using HMM-HMM comparison. Nucleic Acids Res. 2006, 34 (Web Server issue): W374-W378.
19. Pearl FM, Lee D, Bray JE, Buchan DW, Shepherd AJ, Orengo CA: The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci. 2002, 11 (2): 233-244.
20. Muller A, MacCallum RM, Sternberg MJ: Benchmarking PSI-BLAST in genome annotation. J Mol Biol. 1999, 293 (5): 1257-1271. 10.1006/jmbi.1999.3233.
21. Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C: Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol. 1998, 284 (4): 1201-1210. 10.1006/jmbi.1998.2221.
22. Szklarczyk R, Wanschers BF, Cuypers TD, Esseling JJ, Riemersma M, van den Brand MA, Gloerich J, Lasonder E, van den Heuvel LP, Nijtmans LG, Huynen MA: Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase. Genome Biol. 2012, 13 (2): R12. 10.1186/gb-2012-13-2-r12.
23. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2005, 33 (Database issue): D39-D45.
24. Schwickart M, Havlis J, Habermann B, Bogdanova A, Camasses A, Oelschlaegel T, Shevchenko A, Zachariae W: Swm1/Apc13 is an evolutionarily conserved subunit of the anaphase-promoting complex stabilizing the association of Cdc16 and Cdc27. Mol Cell Biol. 2004, 24 (8): 3562-3576. 10.1128/MCB.24.8.3562-3576.2004.
25. Kann MG, Goldstein RA: Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison. Proteins. 2002, 48 (2): 367-376. 10.1002/prot.10117.
26. Bonacich PB: Factoring and weighing approaches to status scores and clique identification. J Math Sociol. 1972, 2: 113-120. 10.1080/0022250X.1972.9989806.
27. Hagberg AA, Schult DA, Swart PJ: Exploring network structure, dynamics and function using NetworkX. Proceedings of the 7th Python in Science Conference (SciPy2008). Edited by: Varoquaux G, Vaught T, Millman J. 2008, Pasadena, CA USA, 11-15.
28. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27 (3): 431-432. 10.1093/bioinformatics/btq675.
29. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
30. Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P: eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res. 2008, 36 (Database issue): D250-D254.
31. Wall DP, Fraser HB, Hirsh AE: Detecting putative orthologs. Bioinformatics. 2003, 19 (13): 1710-1711. 10.1093/bioinformatics/btg213.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.