Table 2 Statistics of the ABAG annotated corpus with 60:20:20 splitting ratio and group-wise random clustering method

From: Extract antibody and antigen names from biomedical literature

| Characteristics | Training | Developing | Testing | Total |
| --- | --- | --- | --- | --- |
| No. of PubMed article abstracts | 1930 | 640 | 640 | 3210 |
| No. of antibody mentions | 6627 | 1948 | 2260 | 10,835 |
| No. of unique antibody mentions | 1963 | 666 | 796 | 3144 |
| No. of antigen mentions | 12,198 | 3981 | 4047 | 20,226 |
| No. of unique antigen mentions | 3235 | 1335 | 1271 | 4950 |
| Avg. sentences per abstract | 7.5 | 7.5 | 7.5 | 7.5 |
| Avg. words per abstract | 182.9 | 181.8 | 183.5 | 182.8 |
| Avg. mentions per abstract | 9.8 | 9.3 | 9.9 | 9.7 |
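
The "Avg. mentions per abstract" row appears to follow from the other rows if it is read as antibody plus antigen mentions divided by the number of abstracts. That reading is an assumption, not something the table states. The short Python sketch below recomputes the row from the counts in Table 2 and also reports each split's share of the abstracts. The dictionary keys and the function name are illustrative.

```python
# Counts copied from Table 2; the keys are illustrative labels for the columns.
abstracts = {"training": 1930, "developing": 640, "testing": 640, "total": 3210}
antibody_mentions = {"training": 6627, "developing": 1948, "testing": 2260, "total": 10835}
antigen_mentions = {"training": 12198, "developing": 3981, "testing": 4047, "total": 20226}


def mentions_per_abstract(split):
    """Assumed definition: (antibody + antigen mentions) / abstracts."""
    return (antibody_mentions[split] + antigen_mentions[split]) / abstracts[split]


def share_of_abstracts(split):
    """Fraction of all annotated abstracts that falls into the given split."""
    return abstracts[split] / abstracts["total"]


if __name__ == "__main__":
    for split in abstracts:
        print(
            f"{split}: {mentions_per_abstract(split):.1f} mentions per abstract, "
            f"{share_of_abstracts(split):.1%} of abstracts"
        )
```

Under this assumed definition the recomputed averages round to the values printed in the table, and the three splits hold roughly 60%, 20%, and 20% of the abstracts.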