Dunn Index Bootstrap (DIBS): A procedure to empirically select a cluster analysis method that identifies biologically and clinically relevant molecular disease subgroups

Background

Cluster analysis is widely used in cancer research to discover molecular subgroups that inform laboratory investigations and define risk classification criteria for subsequent clinical trials. However, for any data set, there is a very large number of candidate cluster analysis methods (CCAMs), because the analyst must choose among many feature selection criteria, numbers of selected features, numbers of clusters to define, and so on. Frequently, a specific CCAM is chosen without quantifying the validity of its results in terms of the reproducibility or distinctiveness of the reported subgroups.
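To illustrate how quickly the number of CCAMs grows, the following minimal sketch enumerates every combination of a few analysis choices. All option names and values here are hypothetical placeholders, and the clustering algorithm is an assumed additional choice; the abstract does not list the choices actually used.

```python
from itertools import product

# Hypothetical option lists, for illustration only; the abstract does not
# state which criteria, feature counts, or cluster counts were considered.
feature_criteria = ["variance", "median_absolute_deviation"]
n_features = [100, 500, 1000]
n_clusters = [2, 3, 4, 5]
algorithms = ["hierarchical", "kmeans"]  # assumed extra choice

# One CCAM is one complete combination of choices.
ccams = [
    {"criterion": c, "n_features": f, "n_clusters": k, "algorithm": a}
    for c, f, k, a in product(feature_criteria, n_features, n_clusters, algorithms)
]

print(len(ccams))
```

The count is the product of the lengths of the option lists, so even short lists multiply into a large collection of candidate methods.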

Materials and methods

Here, we propose the Dunn Index Bootstrap (DIBS) procedure to quantify the reproducibility and distinctiveness of subgroups defined by many CCAMs. DIBS applies each CCAM to the observed data and to many bootstrap data sets obtained by resampling subjects. The bootstrap results are then used to compute, for each CCAM, metrics of the reproducibility and distinctiveness of the subgroups it defines.
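The general idea of combining a Dunn index with subject resampling can be sketched as follows. This is not the authors' implementation: the abstract does not say how the DIBS metrics are computed, so the sketch simply uses one common form of the Dunn index (smallest between-cluster distance divided by largest within-cluster diameter), a plain k-means routine as a stand-in for a single CCAM, and the collection of bootstrap Dunn indices as the output. The function names, the toy data, and the handling of degenerate cases are all assumptions.

```python
import math
import random
from itertools import combinations


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    min_between = math.inf
    max_within = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    if min_between == math.inf:
        return 0.0  # fewer than two clusters: undefined, reported as 0 here
    if max_within == 0.0:
        return math.inf  # every cluster is a single location
    return min_between / max_within


def kmeans_labels(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; stands in for one candidate method (assumption)."""
    unique = sorted(set(points))
    centers = rng.sample(unique, min(k, len(unique)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each subject to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if empty.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def bootstrap_dunn(points, k, n_boot=100, seed=0):
    """Dunn index of the clustering of each bootstrap sample of subjects."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # resample with replacement
        values.append(dunn_index(sample, kmeans_labels(sample, k, rng)))
    return values


# Toy data: eight hypothetical subjects with two features each.
subjects = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
observed = dunn_index(subjects, kmeans_labels(subjects, 2, random.Random(0)))
boot_values = bootstrap_dunn(subjects, k=2, n_boot=50)
```

Repeating this for each candidate method and comparing the resulting bootstrap distributions is one way methods could be ranked; a full procedure would also need a reproducibility measure, which this sketch leaves out.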

Results

DIBS was used to characterize the performance of each of 4,032 CCAMs in the analysis of one RNA-seq, two microarray gene expression, and one methylation array data set from three different cancers. In each example, DIBS identified specific CCAMs that defined subgroups of well-established biological and clinical relevance.

Author information

Corresponding author

Correspondence to Stan Pounds.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Pawlikowska, I., Liu, Z., Shi, L. et al. Dunn Index Bootstrap (DIBS): A procedure to empirically select a cluster analysis method that identifies biologically and clinically relevant molecular disease subgroups. BMC Bioinformatics 16 (Suppl 15), P12 (2015). https://doi.org/10.1186/1471-2105-16-S15-P12

  • DOI: https://doi.org/10.1186/1471-2105-16-S15-P12

Keywords

  • Feature Selection
  • Risk Classification
  • Combinatorial Library
  • Microarray Gene Expression