
Table 1 Counts of annotations

From: Concept annotation in the CRAFT corpus

| Terminology | Total annotations | Average annotations per article | Median annotations per article | Minimum annotations per article | Maximum annotations per article |
|---|---|---|---|---|---|
| ChEBI | 8,137 | 121 | 94 | 11 | 486 |
| CL | 5,760 | 86 | 58 | 0 | 435 |
| Entrez Gene | 12,277 | 183 | 155 | 3 | 543 |
| GO BP (a) | 16,184 | 241 | 194 | 14 | 738 |
| GO CC | 8,354/4,707 (b) | 125/70 | 97/51 | 9/0 | 499/322 |
| GO MF | 4,062 | 61 | 42 | 2 | 403 |
| NCBITaxon (c) | 7,449 | 111 | 91 | 12 | 378 |
| PRO | 15,594 | 233 | 207 | 4 | 704 |
| SO (d) | 22,090 | 330 | 328 | 72 | 935 |
| all | 99,907 | 1,491 (e) | | | |

  1. (a) We are still in the process of reviewing and editing the GO BP & MF annotations for the official 1.0 version release; therefore, the statistics for these will likely change. We will update annotation statistics on the project Web site as needed.
  2. (b) We have calculated statistics for the GO CC project both with and without the annotations of cell (GO:0005623), as these account for over half of the annotations of this project. Because these annotations skew the statistics, and because cell is a trivial concept that is also annotated in the CL project, users may wish to exclude them when training and evaluating systems.
  3. (c) In addition to the hundreds of thousands of organism entries, the NCBI Taxonomy also has a small taxonomy of types of biological taxa (e.g., phylum, genus, subgenus). The NCBI Taxonomy pass also includes a small number of annotations of mentions of these taxonomic concepts in the articles; however, we have excluded them from these statistics.
  4. (d) For the SO statistics, the independent_continuant annotations (as described in the Methodology) were excluded from the analysis.
  5. (e) The averages of the total number of annotations per article and of unique concepts per article were calculated simply by adding up the averages for each terminological annotation pass.

Counts of annotations and of average, median, minimum, and maximum counts of annotations per article for the 67 articles constituting the initial public release of the CRAFT Corpus.
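
The footnote on the "all" row describes its average as a simple sum of the per-terminology averages. As a minimal sketch of that arithmetic, the Python below re-adds the figures from Table 1. The names `TABLE_1` and `sum_columns` are illustrative and not part of the CRAFT distribution, and the GO CC row is assumed to use the figures that include annotations of cell (GO:0005623).

```python
# Per-terminology figures copied from Table 1:
# (total annotations, average annotations per article).
# Assumption: GO CC uses the figures that INCLUDE cell (GO:0005623).
TABLE_1 = {
    "ChEBI": (8137, 121),
    "CL": (5760, 86),
    "Entrez Gene": (12277, 183),
    "GO BP": (16184, 241),
    "GO CC": (8354, 125),
    "GO MF": (4062, 61),
    "NCBITaxon": (7449, 111),
    "PRO": (15594, 233),
    "SO": (22090, 330),
}


def sum_columns(rows):
    """Return (sum of totals, sum of per-article averages) over all passes."""
    total = sum(total_count for total_count, _ in rows.values())
    average = sum(avg for _, avg in rows.values())
    return total, average


total_annotations, summed_average = sum_columns(TABLE_1)
print(total_annotations, summed_average)  # prints: 99907 1491
```

Both sums match the "all" row of the table (99,907 total annotations and an average of 1,491 per article).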