Reference | Dataset source(s) | Count and pathology of patients/slides | Stain | Magnification | Count of images (tiles/patches) | Dimensions and selection of images (tiles/patches) | Data augmentation method(s) | Training(:validation):testing ratio |
---|---|---|---|---|---|---|---|---|
[29]<sup>a</sup> | TCGA | ≈ 500 slides of UCC or adjacent normal cuts | H&E | 20 × | 4711 normal and 73,425 cancer (depending on slide-level labels) | 512 × 512; non-overlapping; after background removal | None | 70:30 (of slides); stratified |
[29]<sup>b</sup> | TCGA | 388 UCC slides | H&E | Not mentioned | 185,064 total | 512 × 512; non-overlapping; excluding normal tiles | None | 70:30 (of slides) |
[54] | Not mentioned | Eight bladder biopsy slides; pathology not mentioned | H&E | 40 × | Not mentioned | Training and validation: 64 × 64 at 10 ×, non-overlapping, after background removal; testing: 64 × 64 by a sliding window with 8-pixel steps | None | Not mentioned |
[55] | The Ohio State University | 39 T1 bladder cancer<sup>c</sup> slides | H&E | 40 × | Excluding background tiles: 13,606 training, 1360 validation, and 1359 testing | 512 × 512; non-overlapping; including background | None | 31:4:4 (of slides); non-stratified for tiles/classes |
[56] | University Hospital of Stavanger, Norway | 32 UCC patients/slides | HES | 400 × (100 × and 25 × by down-sampling) | 139,861 (after augmentation) at each magnification level | 128 × 128; 400 × tiles: non-overlapping for all classes (including background) except muscle and stroma, which had 50% overlap; 100 × and 25 × tiles: centered on the corresponding 400 × tiles | Muscle and stroma training tiles only: rotation and flipping | Five-fold cross-validation (of patients) using only training and testing sets (no validation set) |
[57]<sup>d</sup> | Three centers in the Netherlands | 328 non-muscle invasive UCC specimens from 232 patients | H&E | 20 × | ≈ 500,000 total | 572 × 572; 25% overlap; excluding patches with ≥ 75% background pixels | Random color variation, flipping, and mirroring of the training patches | 60:20:20 (of patients) |
[57]<sup>e</sup> | Three centers in the Netherlands | 328 non-muscle invasive UCC specimens from 232 patients | H&E | 20 × | 123,132 undefined, 564,710 low grade, and 493,374 high grade | 224 × 224; 25% overlap; from regions of urothelium segmented by U-Net | Random flipping and mirroring of the training patches | 60:20:20 (of patients) |
[14] | TCGA and University of Florida Health Shands Hospital in the United States | 913 UCC slides | H&E | 40 × | Training: 148,671; validation: 8371; testing: not mentioned | 1024 × 1024; randomly selected from manually, partially annotated tumor and non-tumor regions; each has a binary annotation mask | Rotation, horizontal and vertical flips, and random crop; not mentioned which data it was applied to | 620:193:100 (of slides) |
[58] | Edinburgh hospitals | 100 muscle-invasive UCC patients/slides | IF (PanCK, Hoechst) | 20 ×  | Not mentioned | Not mentioned | None | Not mentioned |
[59]<sup>f</sup> | TCGA | 100 UCC patients/slides | H&E | 20 × | Excluding testing: 79,747 tumor and 92,797 non-tumor | 512 × 512; non-overlapping; including background | Random rotation, zooming, flipping, and color-based augmentation; during training | 48:12:40 (of slides) |
[59]<sup>g</sup> | TCGA | 253 UCC patients/slides (124 low and 129 high tumor mutational burden) | H&E | AP clustering: 2.5 ×; feature extraction: 20 × | 125,358 total tumor tiles, from which AP clustering selected 11,164 representative tiles | AP clustering: 128 × 128, non-overlapping, from segmented tumor; feature extraction: 1024 × 1024, selected by AP clustering | None | Leave-one-out cross-validation |
[60] | University of Rochester Medical Center | 1177 UCC images (460 stage Ta and 717 stage T1); not mentioned whether each image came from a separate slide | H&E | 100 × | Not mentioned | 700 × 700; one to four images were cropped from the central part of each raw image | None | 70:30 (after sampling 460 Ta and 460 T1 images<sup>h</sup>) |
[61] | TCGA and local institution of the authors | Muscle-invasive UCC; TCGA: 318 slides from 294 patients; local institution: 38 slides from 13 patients | H&E | 10 × | Training patches: 18,552, 68,880, 264,550, and 1,044,158 at effective 2.5 ×, 5 ×, 10 ×, and 20 ×, respectively; rest of patches: not mentioned | 300 × 300 (at effective 2.5 ×, 5 ×, 10 ×, and 20 ×); non-overlapping; from manually annotated tumor regions | Random rotation, flipping, warping, brightness, and contrast changes; during training | TCGA: 146:73:75 (of patients); local institution: all testing |
[15] | TCGA and University Clinic Hospital Erlangen | Muscle-invasive bladder cancer<sup>i</sup>; TCGA: 363 (training and validation) patients/slides; Erlangen: 16 (testing) patients/slides | H&E | TCGA: not mentioned; Erlangen: 40 × | TCGA: 807,943 total, but only a random 250,833 were used; Erlangen: not mentioned | 512 × 512<sup>j</sup>; non-overlapping; from manually annotated tumor regions | Random flipping, mirroring, contrast/saturation/brightness changes, and cutouts; not mentioned which data it was applied to | TCGA: 90:10 (of slides), stratified |
[62]<sup>k</sup> | The Stanford tissue microarray database | 2139 bladder cancer<sup>g</sup> slides (542 GATA3, 514 CK14, 544 S100P, and 539 S0084) | IHC | Not mentioned | Not mentioned | 224 × 224 (Inception-v1) and 229 × 229 (Inception-v3 and Inception-ResNet-v2); not mentioned how tiles were derived from slides | None | 70:15:15 (of slides) |
[62]<sup>l</sup> | The Stanford tissue microarray database | 2137 bladder cancer<sup>g</sup> slides (680 Score 0, 235 Score 1, 284 Score 2, and 938 Score 3) | IHC | Not mentioned | Not mentioned | 224 × 224 (Inception-v1) and 229 × 229 (Inception-v3 and Inception-ResNet-v2); not mentioned how tiles were derived from slides | None | 70:15:15 (of slides) |
[63] | TCGA | 332 UCC patients; slide count not mentioned | H&E | 20 × | Not mentioned | 512 × 512; non-overlapping; from manually annotated tumor regions | Random horizontal and vertical flipping; during training | Stratified three-fold cross-validation (of patients) |
[64] | TCGA | 381 UCC slides | H&E | Lymphocyte CNN: 20 ×; necrosis CNN: 6.67 × | Not mentioned | Non-overlapping; lymphocyte CNN: 100 × 100, excluding background; necrosis CNN: 333 × 333 | Lymphocyte CNN only: random cropping<sup>m</sup>, color perturbation, rotation, and mirroring; for training and testing separately | Not mentioned |
[65] | TCGA | 290 UCC patients/slides | H&E | 20 ×  | 10,000 patches per slide | 100 × 100 Non-overlapping | None | Not mentioned |
[66] | Amsterdam University Medical Center | Non-muscle invasive UCC; 359 and 281 patients for 1- and 5-year survival, respectively; slide count not mentioned | H&E | 20 × | 1-year: ≈ 5,500,000 (recurrence in 35%); 5-year: ≈ 4,400,000 (recurrence in 64%) | 224 × 224; non-overlapping; from urothelium segmented by U-Net [57] | None | 60:20:20 (of patients) |
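
Most studies in the table tile each slide on a regular grid (non-overlapping, or with a fixed overlap such as 25%) and then drop tiles that are mostly background. The following is a minimal sketch of that selection step, not the implementation of any cited study. It assumes the slide region is already available as a 2D grid of grayscale intensities (0 to 255) and that bright pixels count as background; the function names and the intensity threshold of 220 are hypothetical choices for illustration, while the default rule of discarding tiles with ≥ 75% background pixels mirrors the criterion reported for [57].

```python
from typing import Iterator, List, Sequence, Tuple


def tile_origins(width: int, height: int, tile: int,
                 overlap: float = 0.0) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of square tiles lying fully inside the image.

    overlap is the fraction shared by neighboring tiles (0.0 = non-overlapping).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Step between tile origins; 25% overlap on a 572-pixel tile gives 429.
    stride = max(1, int(round(tile * (1.0 - overlap))))
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y


def background_fraction(gray: Sequence[Sequence[int]], x: int, y: int,
                        tile: int, threshold: int = 220) -> float:
    """Fraction of pixels in the tile at (x, y) that are at least `threshold` bright.

    The threshold is an illustrative assumption, not a value from the table.
    """
    bright = sum(1 for row in gray[y:y + tile]
                 for value in row[x:x + tile] if value >= threshold)
    return bright / (tile * tile)


def select_tiles(gray: Sequence[Sequence[int]], tile: int,
                 overlap: float = 0.0, max_background: float = 0.75,
                 threshold: int = 220) -> List[Tuple[int, int]]:
    """Keep tile origins whose background fraction is below `max_background`."""
    height = len(gray)
    width = len(gray[0]) if gray else 0
    return [(x, y) for x, y in tile_origins(width, height, tile, overlap)
            if background_fraction(gray, x, y, tile, threshold) < max_background]


if __name__ == "__main__":
    # Toy 8 x 8 "slide": left half tissue (dark), right half background (white).
    toy = [[0] * 4 + [255] * 4 for _ in range(8)]
    print(select_tiles(toy, tile=4))
```

On the toy image, only the two tiles covering the dark left half are kept. In practice the grayscale grid would come from a whole-slide image reader at the chosen magnification, and the background rule would follow whatever the individual study reports.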