Block Forests: random forests for blocks of clinical and omics covariate data

BMC Bioinformatics

Table 1 Overview of the data sets used in the comparison study

Name	Cancer type	Sample size	Uncensored observations
BLCA	Bladder Urothelial Carcinoma	310	32%
BRCA	Breast Invasive Carcinoma	863	9%
CESC	Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma	206	15%
COAD	Colon Adenocarcinoma	350	22%
ESCA	Esophageal Carcinoma	121	21%
GBM	Glioblastoma Multiforme	154	73%
HNSC	Head and Neck Squamous Cell Carcinoma	411	35%
KIRC	Kidney Renal Clear Cell Carcinoma	322	22%
KIRP	Kidney Renal Papillary Cell Carcinoma	249	10%
LGG	Brain Lower Grade Glioma	454	21%
LIHC	Liver Hepatocellular Carcinoma	298	28%
LUAD	Lung Adenocarcinoma	424	30%
LUSC	Lung Squamous Cell Carcinoma	365	39%
OV	Ovarian Serous Cystadenocarcinoma	261	54%
PAAD	Pancreatic Adenocarcinoma	142	49%
PRAD	Prostate Adenocarcinoma	425	2%
READ	Rectum Adenocarcinoma	138	16%
SARC	Sarcoma	183	16%
SKCM	Skin Cutaneous Melanoma	264	25%
STAD	Stomach Adenocarcinoma	284	27%
UCEC	Uterine Corpus Endometrial Carcinoma	503	13%

For each data set, the table gives its name, the cancer type, the sample size, and the percentage of observations for which the survival time was uncensored. The TCGA Project ID of each data set is “TCGA-[Name]”, where “[Name]” is the name of the data set given in the first column.
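The naming convention in the table note can be illustrated with a short sketch. The function names `tcga_project_id` and `approx_uncensored` and the `DATASETS` dictionary are hypothetical helpers introduced here for illustration; they are not part of the original study. Only a subset of the rows of Table 1 is included. Because the table reports rounded percentages, the derived counts of uncensored observations are approximations, not values reported in the paper.

```python
# Subset of Table 1: data set name -> (sample size, uncensored observations in %).
# Values are copied from the table; the dictionary itself is an illustrative assumption.
DATASETS = {
    "BLCA": (310, 32),
    "BRCA": (863, 9),
    "GBM": (154, 73),
    "OV": (261, 54),
}


def tcga_project_id(name: str) -> str:
    """Build the TCGA Project ID as described in the table note: 'TCGA-[Name]'."""
    return f"TCGA-{name}"


def approx_uncensored(sample_size: int, percent_uncensored: int) -> int:
    """Approximate the number of uncensored observations.

    The percentage in the table is rounded, so this count is only an estimate.
    """
    return round(sample_size * percent_uncensored / 100)


if __name__ == "__main__":
    for name, (n, pct) in DATASETS.items():
        print(tcga_project_id(name), n, approx_uncensored(n, pct))
```

For example, `tcga_project_id("BLCA")` yields `"TCGA-BLCA"`, and BLCA's 32% of 310 observations corresponds to roughly 99 uncensored survival times.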

ISSN: 1471-2105