From: TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction
Granularity | Target | Raw | Data cleaning: TS | Data cleaning: DR | Data cleaning: RN | Dataset generation: DB |
---|---|---|---|---|---|---|
Global | Publications | 707,390 | 572,981 | 572,607 | 447,280 | 57,675 |
Global | Genes | 21,118 | 17,658 | 17,658 | 17,658 | 8,827 |
Global | Diseases | 23,433 | 17,032 | 17,023 | 17,023 | 6,964 |
Therapeutic | Instances | 10,744 | 4,132 | 3,925 | 3,925 | 3,925 |
Therapeutic | Bags | 6,872 | 2,939 | 2,857 | 2,857 | 2,857 |
Biomarker | Instances | 1,530,072 | 1,080,089 | 1,075,327 | 580,053 | 24,739 |
Biomarker | Bags | 605,826 | 460,334 | 460,276 | 383,358 | 17,459 |
Genomic Alterations | Instances | 849,472 | 531,601 | 516,630 | 516,630 | 37,346 |
Genomic Alterations | Bags | 289,693 | 202,548 | 202,045 | 202,045 | 15,028 |