
Table 2 Proportion of RBH-predicted (a) orthologs that are likely ssd-orthologs (b) and likely paralogs, according to Ortholuge analysis.

From: Improving the specificity of high-throughput ortholog prediction

| Data set (c) | Class | Ratio range (c) | Proportion of introduced true-negatives in a true-negative analysis (d) | Proportion of RBH-predicted orthologs (e) |
|---|---|---|---|---|
| rat-mouse comparison (human outgroup) | Probable ssd-ortholog | R1 ≤ 0.60 and R2 ≤ 0.55 | 0.8% | 76% |
| rat-mouse comparison (human outgroup) | Orthology uncertain (f) | See footnote f | 16% | 14% |
| rat-mouse comparison (human outgroup) | Probable paralog | R1 > 0.80 or R2 > 0.80 | 77% (d) | 10% |
| P. putida-P. syringae comparison (E. coli outgroup) | Probable ssd-ortholog | R1 ≤ 0.55 and R2 ≤ 0.70 | 1.3% | 91% |
| P. putida-P. syringae comparison (E. coli outgroup) | Orthology uncertain (f) | See footnote f | 24% | 4% |
| P. putida-P. syringae comparison (E. coli outgroup) | Probable paralog | R1 > 0.75 and R2 > 0.85 | 87% | 5% |

  1. a RBH-predicted = Predicted to be orthologous using a Reciprocal-best BLAST hit approach.
  2. b "Supporting-species-divergence orthologs" (ssd-orthologs) = orthologs that appear to have diverged only due to speciation and have diverged at an expected relative rate for the species. Such orthologs are likely to have more similar function. See text for details.
  3. c Ratio Range for both Ratio1 (R1) and Ratio2 (R2). See Figure 8C for a schematic illustration of the cut-off ranges on an R1 × R2 plot.
  4. d Proportion of introduced true-negatives for the 25% true-negative analysis is shown here; however, the actual number of true-negatives will be higher because false-positives likely occur in the original ortholog data set. This analysis was used to estimate the percentage of false predictions in each range (see text and Figure 8).
  5. e RBH-predicted data sets were examined using the cut-offs generated by the true-negative analysis, to identify what proportion of all RBH-predicted orthologs fell within each range. For the rat-mouse comparison 6294 RefSeq-based groups were classified into "probable ssd-ortholog", "uncertain", and "probable paralog" classes. For the Pseudomonas comparison, a total of 1456 groups were classified. Note that for an analysis of the EGO-based rat-mouse data set of 19,200 groups with the same cut-offs, 76% ssd-orthologs and 16% probable paralogs were predicted (when in-paralogs were not counted, because of the lack of differentiation of gene isoforms in the EGO data set).
  6. f This "uncertain" category falls between the other two ranges and is graphically illustrated, for ease of understanding, in Figure 8C. This category follows the formula (R1 > a and R1 < b and R2 < d) or (R2 > c and R2 < d and R1 < a), where a and b are the lower and upper cut-off values, respectively, for Ratio1 (i.e. lower = cut-off for ssd-orthologs and higher = cut-off for probable paralogs), and c and d are the lower and upper cut-off values, respectively, for Ratio2. Note this "uncertain" category also contains counts of in-paralogs detected (7% of eukaryotic data, and negligible for prokaryotic data) – see text for details.
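
The cut-offs in the table and the formula in footnote f can be read as a three-way classification of each (R1, R2) pair. The following is a minimal sketch of that reading, not the Ortholuge implementation. The names `Cutoffs`, `classify`, `RAT_MOUSE`, `PSEUDOMONAS`, and `paralog_needs_both` are hypothetical. The sketch assumes that any pair that is neither a probable ssd-ortholog nor a probable paralog is "uncertain", which also assigns boundary values that the strict inequalities in footnote f leave open. It follows the table literally in joining the paralog conditions with "or" for the rat-mouse row and "and" for the Pseudomonas row. In-paralog detection, which footnote f says also contributes to the "uncertain" counts, is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Cut-off values named as in footnote f (hypothetical container)."""
    a: float  # lower Ratio1 cut-off (ssd-ortholog bound)
    b: float  # upper Ratio1 cut-off (probable-paralog bound)
    c: float  # lower Ratio2 cut-off (ssd-ortholog bound)
    d: float  # upper Ratio2 cut-off (probable-paralog bound)
    # Assumption: True when the table row joins the paralog conditions with "and".
    paralog_needs_both: bool = False


# Values copied from the "Ratio range" column of the table.
RAT_MOUSE = Cutoffs(a=0.60, b=0.80, c=0.55, d=0.80)
PSEUDOMONAS = Cutoffs(a=0.55, b=0.75, c=0.70, d=0.85, paralog_needs_both=True)


def classify(r1: float, r2: float, cut: Cutoffs) -> str:
    """Assign an (R1, R2) pair to one of the three classes in the table."""
    # Probable ssd-ortholog: both ratios at or below their lower cut-offs.
    if r1 <= cut.a and r2 <= cut.c:
        return "probable ssd-ortholog"
    # Probable paralog: ratios above the upper cut-offs ("or" / "and" per row).
    over_r1, over_r2 = r1 > cut.b, r2 > cut.d
    if (over_r1 and over_r2) if cut.paralog_needs_both else (over_r1 or over_r2):
        return "probable paralog"
    # Assumption: everything else is treated as "uncertain".
    return "uncertain"


if __name__ == "__main__":
    for r1, r2 in [(0.50, 0.50), (0.70, 0.50), (0.90, 0.10)]:
        print(r1, r2, classify(r1, r2, RAT_MOUSE))
```

Checking the ssd-ortholog rule first and falling through to "uncertain" keeps the three classes mutually exclusive, so the per-class proportions for a data set sum to 100%, as they do in the RBH-predicted column of the table.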