Table 1 Statistics of the biomedical datasets

From: BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework

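| Entity type  | Corpus name     | Annotated sentences | Max sentence length |
|--------------|-----------------|---------------------|---------------------|
| Gene/Protein | BC2GM           | 20,130              | 206                 |
| Gene/Protein | JNLPBA          | 22,401              | 221                 |
| Species      | Species-800     | 8,193               | 143                 |
| Species      | LINNAEUS        | 23,151              | 246                 |
| Disease      | BC5CDR-Disease  | 13,938              | 225                 |
| Disease      | NCBI-Disease    | 7,287               | 123                 |
| Chemical     | BC4CHEMD        | 87,674              | 225                 |
| Chemical     | BC5CDR-Chemical | 13,938              | 225                 |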