
Table 1 Clustering results for six keys: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment, with the number of keys in each cluster

From: Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata

| Key | No. of keys | Keys in cluster |
|---|---|---|
| Age | 25 | Age unit, age group, age_years, age (y), age in years, donor_age, age (months), age (years), age (yrs), patient age, age at diagnosis, age at diagnosis (years), age at sample (months), patient age (yrs), tumor stage, age.brain, age (weeks), stage, gestational age (weeks), age.blood, sample age, age at surgery, age, age months, age(years) |
| | 5 | Pathological_stage, growth/development stage, growth stage, pathological stage, development stage |
| Cell line | 12 | Cell line name, cell line source age, cell line type, cell lines, cell line background, cell lineage, cell line/clone, cell line source gender, cell line source ethnicity, cell line, cell line passage, cell line source |
| | 3 | Origin of a cell line, source cell line, growth pattern of cell line |
| | 14 | Tissue/cell line, cell line source tissue, dendritic cell lineages, coriell cell line repository identifier, cell line tissue source, parental cell line, tumor cell line, donor cell line, tissue/cell lines, injected cell line, tumour cell line used for conditioning medium, insect cell line, cell line origin, primary cell line |
| Disease | 5 | Subject’s disease state, primary disease, histology (disease state), advanced disease stage, advanced disease state |
| | 22 | Disease-state, meibomian gland disease state, disease, disease/treatment status, disease status of patient, disease progression, disease stage, disease subtype, status of disease, clinical characteristic/disease status, patient disease status, disease development, disease phase, diseased, disease/cell type, extent of disease, disease state, disease state (host), disease severity, disease_state, disease model, disease type |
| | 7 | Disease_specific_survival_years, disease status, diseasestatus, disease_specific_survival_event, disease outcome, disease exposure, disease_status |
| | 16 | Disease-free survival (dfs), disease-free interval (months), disease free interval (days), disease specific survival (years), stage of disease (inss), disease relapse (event), disease_free_survival_event, disease-free survival (dfs) event, disease_free_survival_years, disease progression (event), stage of disease, disease free interval (months), age at disease onset, duration of disease, disease free survival in months, disease free survival time (months) |
| Strain | 22 | Background mouse strain, background/strain, background strains, strain, strain/accession, strain or line, strain/background, strain/genotype, strain/ecotype, strains, strain number, strain [background], strain phenotype, strain/line, strain description, strain source, strain fgsc number, strain background (bloomington stock number), strain (donor) |
| | 3 | Toxoplasma parasite strain, infection (virus strain), human cytomegalovirus strain |
| | 16 | Bacteria strain, siv strain, viral strain, recipient strain, substrain, parent strain, parental strain, host strain, parasite strain, host strain background, maternal strain, virus strain, scanstrain, mice strain, mouse strain, plant strain |
| Tissue | 14 | Sample tissue of origin, cell line source tissue, cell/tissue type, original tissue, source tissue, cell line, tissue source, organ/tissue, original tissue source, primary tissue, sample tissue type, sample type, cell tissue, source tissue type, organ/tissue type |
| | 3 | Age of ffpe tissue, day of tissue dissection, age at tissue collection (days) |
| | 78* | Tissue separation, tissue & age, tissuer type, tissue_detail, tumor tissue source, tissue/tumor subtype, tissue derivation, tissue, tissue origination, tissue site, tissue_mg, tissue/cell lines, tumor/tissue type, tissue subtype, tissue_biological, tissue processing, tissue/development stage, harvested tissue type, tissue and developmental stage, tissue isolated |
| Treatment | 67* | Pretreatment drug & dose, pre-treatment, treatment2_in vivo treatment, treatment stage, treatments, treatment agent, treatment_molecule, lighttreatment, drug treatment time point, treatment result, treatment_2, treatment_1, tissue treatment, cactus host treatment, inducer treatment, sirna treatment group, treatment/exposure, maternal treatment group, treatment_dose, treatment dosage |
| | 12 | l-dopa treatment, patient treatment plan, nrti treatment status, culture conditions/treatment, tamoxifen-citrate treatment, disease/treatment status, globin treatment, experimental treatment, dopamine-agonists treatment, oxygen treatment, tap treatment, lenolidamide treatment |
| | 31 | Time of treatment, treatment time, tissue/treatment id, treatment period, days after treatment, treatment duration, pre-treatment psa, treatment time (rhgaa), weeks of treatment, tnfa treatment time point, treatment_time, treatment length, time (days post-treatment), order of treatment, bl treatment level, treatment-time, time after treatment, day of dss treatment, time post treatment, time of tamoxifen treatment, h2o2 treatment level, days of ddc treatment, weeks after treatment, post-treatment time, length of treatment, duration of il-6 treatment, treatment start age, duration of treatment, days of treatment, time post-treatment, treatment age |

* Due to space constraints, only the first 20 keys are reported in this table for the “tissue” cluster (78 keys) and the “treatment” cluster (67 keys). All results are available on our website.
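Many of the keys in the table are spelling variants of one another (for example "age (years)", "age(years)" and "age_years"). Some keys also land near a seed term even though they mean something different: "tumor stage" and "stage" contain "age" as a plain substring. The sketch below illustrates both effects. It assumes simple lowercase-and-strip-punctuation normalization and naive substring matching, and it is not the clustering method used in the paper. The helper names `normalize_key` and `keys_containing` are hypothetical.

```python
import re


def normalize_key(key: str) -> str:
    """Lowercase a metadata key and collapse punctuation/underscores to single spaces.

    Hypothetical helper for illustration only; not the paper's method.
    """
    key = key.lower()
    # Treat any run of non-alphanumeric characters (_, ., (, ), /, spaces) as one separator.
    key = re.sub(r"[^a-z0-9]+", " ", key)
    return key.strip()


def keys_containing(seed: str, keys: list[str]) -> list[str]:
    """Return the keys whose lowercased text contains the seed as a plain substring."""
    return [k for k in keys if seed in k.lower()]


if __name__ == "__main__":
    # Variant spellings taken from the "age" row of the table.
    variants = ["age (years)", "age(years)", "age_years", "Age (yrs)"]
    # The first three collapse to "age years"; "age (yrs)" stays distinct.
    print(sorted({normalize_key(k) for k in variants}))

    # Naive substring matching on "age" also picks up stage-related keys.
    candidates = ["tumor stage", "age group", "strain", "pathological stage"]
    print(keys_containing("age", candidates))
```

Normalization like this merges only purely typographic variants. Abbreviations such as "yrs" versus "years" still need a separate mapping. The substring match shows how a lexical grouping can put unrelated concepts (age and stage) side by side.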