Table 2 The performance of different data transformation methods*.

From: Revealing and avoiding bias in semantic similarity scores for protein pairs

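| Measure | Estimated λ** | λ = 1 | Inverse (λ = -1) | Cube-root (λ = 1/3) | Square-root (λ = 1/2) | Square (λ = 2) | Log |
|---|---|---|---|---|---|---|---|
| Resnik(AVG) | 0.878 | 0 | -*** | 0.645 | 0.370 | 0 | - |
| Lin(AVG) | 0.890 | 0 | - | 0.659 | 0.474 | 0 | - |
| RS(AVG) | 0.925 | 0 | - | 0.632 | 0.355 | 0 | - |
| Jiang(AVG) | 0.812 | 0 | 0.081 | 0 | 0 | 0 | 0.002 |
| Resnik(BMA) | 0.938 | 0.661 | - | 0.025 | 0.248 | 0 | - |
| Lin(BMA) | 0.940 | 0.706 | - | 0.012 | 0.156 | 0.002 | - |
| RS(BMA) | 0.927 | 0.650 | - | 0.004 | 0.042 | 0.001 | - |
| Jiang(BMA) | 0.010 | 0 | 0 | 0 | 0 | 0 | 0 |
| TO | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NTO | 0.555 | 0.001 | 0 | 0.366 | 0.478 | 0 | 0.009 |
| Dice | 0.926 | 0.014 | 0 | 0.384 | 0.890 | 0 | 0.001 |
| Kappa | 0.896 | 0.010 | - | 0.518 | 0.866 | 0 | - |
| GIC | 0.552 | 0 | - | 0.096 | 0 | 0 | - |
| VSM | 0.291 | 0 | - | 0.006 | 0 | 0 | - |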

  1. * The numbers in the table are the proportions (on a 0 to 1 scale) of the scores that fitted normal distributions after data transformation, among all group pairs with different length combinations.
  2. ** λ was estimated by the method described in the Methods section.
  3. *** "-" indicates that the transformation method was not suitable for the similarity measure.
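
The kind of calculation the table summarises can be sketched as follows. This is a minimal illustration and not the authors' procedure. It assumes that each fixed transformation is a plain power function `y = x**λ` (natural log for the "Log" column). It also swaps in a Jarque-Bera test, because the normality test used for the table is not stated here. The names `power_transform`, `jarque_bera_pvalue`, and `fraction_normal`, the 0.05 threshold, and the choice to return `None` when a transformation is undefined for a score are all hypothetical.

```python
import math


def power_transform(scores, lam):
    """Apply y = x**lam to each score; lam=None means natural log.

    Assumes scores are non-negative. Returns None when the transform is
    undefined for some score (e.g. log or inverse of 0).
    """
    out = []
    for x in scores:
        if x < 0 or (x == 0 and (lam is None or lam < 0)):
            return None
        out.append(math.log(x) if lam is None else x ** lam)
    return out


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera p-value (a stand-in normality test).

    The statistic is compared with a chi-square distribution with 2 degrees
    of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # constant sample: treated as non-normal in this sketch
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)


def fraction_normal(groups, lam, alpha=0.05):
    """Fraction of score groups not rejected as normal after transformation.

    Returns None if the transformation is undefined for any group, which is
    only this sketch's guess at what a "-" entry might correspond to.
    """
    passed = 0
    for scores in groups:
        transformed = power_transform(scores, lam)
        if transformed is None:
            return None
        if jarque_bera_pvalue(transformed) > alpha:
            passed += 1
    return passed / len(groups)


# Toy data (made up for illustration, not taken from the paper).
groups = [[1.0, 2.0, 3.0], [0.0] * 50 + [100.0]]
identity_result = fraction_normal(groups, 1.0)  # λ = 1 column
log_result = fraction_normal(groups, None)      # Log column
```

The Jarque-Bera approximation is unreliable for very small groups, so a real analysis would likely use a test suited to the group sizes involved.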