
Table 12 Results: Codes with Fewer Than Five Positive Documents. Performance improves as the number of positive documents increases. The row labeled GT4 shows the performance for these codes when they are processed in the standard way using all available positive documents. The test set in the FirstFiveTest column is limited to documents time-stamped on or before the time stamp of the fifth positive document that is not in the training set. The test set in the FullTest column includes the rest of the temporal stream following the training set.

From: GO for gene documents

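| Hierarchy | # positives | Threshold (FullTest) | FullTest FScore | Threshold (FirstFiveTest) | FirstFiveTest FScore |
|---|---|---|---|---|---|
| MF | 1 | -0.942 | 0.141 | -0.924 | 0.1682 |
| MF | 2 | -0.892 | 0.1999 | -0.9 | 0.2441 |
| MF | 3 | -0.908 | 0.2583 | -0.87 | 0.2825 |
| MF | 4 | -0.906 | 0.2713 | -0.86 | 0.3218 |
| MF | GT4 | | 0.4209 | | |
| BP | 1 | -0.942 | 0.0881 | -0.95 | 0.1081 |
| BP | 2 | -0.94 | 0.1440 | -0.936 | 0.1791 |
| BP | 3 | -0.904 | 0.1591 | -0.894 | 0.1851 |
| BP | 4 | -0.898 | 0.1931 | -0.896 | 0.2251 |
| BP | GT4 | | 0.3480 | | |
| CC | 1 | -0.946 | 0.1439 | -0.948 | 0.1791 |
| CC | 2 | -0.916 | 0.1631 | -0.896 | 0.1977 |
| CC | 3 | -0.896 | 0.2012 | -0.848 | 0.2067 |
| CC | 4 | -0.872 | 0.2144 | -0.844 | 0.2488 |
| CC | GT4 | | 0.3795 | | |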
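
The two test sets described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the `Doc` record, the `split_test_sets` function, and the `n_train` parameter are hypothetical names, and it assumes that the training set is simply the first `n_train` documents of the time-ordered stream. The fallback used when fewer than five positives follow the training set is also an assumption, since the caption does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; field names are illustrative only."""
    doc_id: str
    timestamp: int      # any orderable time value
    is_positive: bool   # True if the document is annotated with the code


def split_test_sets(stream: List[Doc], n_train: int, k: int = 5) -> Tuple[List[Doc], List[Doc]]:
    """Return (full_test, first_k_test) for one code.

    Assumption: the training set is the first n_train documents in time order.
    """
    ordered = sorted(stream, key=lambda d: d.timestamp)

    # FullTest: the rest of the temporal stream following the training set.
    full_test = ordered[n_train:]

    # Positive documents that are not in the training set, in time order.
    positives = [d for d in full_test if d.is_positive]
    if len(positives) < k:
        # Assumed fallback: with fewer than k positives, keep the whole stream.
        return full_test, list(full_test)

    # FirstFiveTest: documents time-stamped on or before the k-th such positive.
    cutoff = positives[k - 1].timestamp
    first_k_test = [d for d in full_test if d.timestamp <= cutoff]
    return full_test, first_k_test


# Toy stream: timestamps 1..10, positives at 2, 4, 5, 6, 7, 8 and 9.
toy_stream = [Doc(f"d{t}", t, t in {2, 4, 5, 6, 7, 8, 9}) for t in range(1, 11)]
full_test, first_five_test = split_test_sets(toy_stream, n_train=3)
```

In this toy stream the fifth positive after the training set has time stamp 8, so `first_five_test` stops there while `full_test` runs to the end of the stream.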
