| CRAFT Corpus (full/initial release) | ~790,000/~560,000 | 97/67 articles | sources of MGI annotations of mouse genes/gene products | Open Biomedical Ontologies (CL, ChEBI, SO, PRO, GO BP/CC/MF, NCBITaxon), Entrez Gene | ~140,000/~100,000 |
| ABGene | | 4,265 sentences | | n/a | ~8,200 |
| BioInfer | ~34,000/~30,000^f | 1,100 sentences | protein-protein interactions | ~100 entity classes, ~100 relationships | ~6,300 named entities, ~2,700 relationships^g |
| CALBC corpus | ~16,000,000 | 150,000 abstracts | immunology | UniProt, NCBITaxon, UMLS^h | ~2,700,000 |
| CLEF Corpus | | various^i | clinical/cancer data | 6 concept types | |
| FetchProt Corpus | | 200 articles | protein tyrosine kinase activity | 10 concept types, UniProt | ~3,800 |
| 4th i2b2/VA Challenge Corpus | | ~750 discharge summaries | clinical data | 3 concept types | ~2,000 |
| GENETAG | ~548,000 | 20,000 sentences | | n/a | ~25,000 genes/proteins, ~19,000 alternative lexical forms |
| GENIA 3.0 | ~440,000 | 2,000 abstracts | human blood-cell transcription factors | 35 entity classes, 34 process classes | ~93,000 entities, ~36,000 events |
| GREC | | 240 abstracts | E. coli gene regulation | 433 classes | ~5,000 |
| ITI TXM PPI/TE Corpora | ~2,000,000/~1,900,000 | 217/238 articles | protein-protein interactions/tissue expression | 9/13 concept types, Entrez Gene, RefSeq^j, ChEBI, MeSH, NCBITaxon^k | ~160,000/~164,000 |
| MedPost | ~156,000 | | | | |
| OntoNotes 2.0 | ~500,000 | 1,000 newswire documents | English & Chinese news | 1000s of WordNet senses, 50 concept types^l | ~58,000 verbs^m |
| PennBioIE Oncology/CYP v1.0 Corpora | ~381,000 (~327,000)/~313,000 (~274,000) | 1,414/1,100 abstracts | medical genetics of oncology/inhibition of cytochrome P450 enzymes | n/a | |
| Yapex Corpus | | 200 abstracts | protein-protein interactions | n/a | ~3,700 |