Skip to main content

Subtyping irritable bowel syndrome using cluster analysis: a systematic review



Irritable bowel syndrome (IBS) is a common chronic functional gastrointestinal disorder associated with a wide range of clinical symptoms. Some researchers have used cluster analysis (CA), a group of non-supervised learning methods that identifies homogenous clusters within different entities based on their similarity.

Objective and methods

This literature review aims to identify published articles that apply CA to IBS patients. We searched relevant keywords in PubMed, Embase, Web of Science, and Scopus. We reviewed studies in terms of the selected variables, participants’ characteristics, data collection, methodology, number of clusters, clusters’ profiles, and results.


Among the 14 articles focused on the heterogeneity of IBS, eight of them utilized K-means Cluster Analysis (K-means CA), four employed Hierarchical Cluster Analysis, and only two studies utilized Latent Class Analysis. Seven studies focused on clinical symptoms, while four articles examined anocolorectal functions. Two studies were centered around immunological findings, and only one study explored microbial composition. The number of clusters obtained ranged from two to seven, showing variation across the studies. Males exhibited lower symptom severity and fewer psychological findings. The association between symptom severity and rectal perception suggests that altered rectal perception serves as a biological indicator of IBS. Ultra-slow waves observed in IBS patients are linked to increased activity of the anal sphincter, higher anal pressure, dystonia, and dyschezia.


IBS has different subgroups based on different factors. Most IBS patients have low clinical severity, good QoL, high rectal sensitivity, delayed left colon transit time, increased systemic cytokines, and changes in microbial composition, including increased Firmicutes-associated taxa and depleted Bacteroidetes-related taxa. However, the number of clusters is inconsistent across studies due to the methodological heterogeneity. CA, a valuable non-supervised learning method, is sensitive to hyperparameters like the number of clusters and random initialization of cluster centers. The random nature of these parameters leads to diverse outcomes even with the same algorithm. This has implications for future research and practical applications, necessitating further studies to improve our understanding of IBS and develop personalized treatments.

Peer Review reports


Irritable bowel syndrome (IBS) is a chronic functional gastrointestinal (GI) disorder that manifests with abdominal pain, bloating, and altered bowel habits in the absence of any organic disorder or biological markers [1,2,3]. IBS predominantly affects women [4]. The global prevalence based on ROME III criteria is 9.2%, whereas, based on the ROME IV version, it is estimated at 3.8% [5]. The burden of IBS is significant: individual patients, their families, society, and health care system are all affected [5]. Patients with IBS frequently report lower quality of life (QoL). Particularly, those in the diarrhea-predominant subgroup have lower income because of their absence from work, and their partner and family are also affected by the burden of the disease because these patients might avoid traveling, socializing, etc.

Diagnosing and treating patients with IBS is challenging because there is no single cause [6]. The following possible causes have been considered: mucosal inflammation, mucosal immune activation, changes in intestinal permeability, alteration in the gut microbiome, and post-infectious changes [7]. According to the last published criteria (ROME IV), IBS has four subtypes [8]. However, almost one-third of patients may experience intermittent symptoms. This intermittency complicates subtyping; patients in the same subgroup may have suffered from different underlying mechanisms [9, 10].

To address heterogeneity in research and analysis, various approaches have been used, including subgroup analysis, stratification, regression modeling, and cluster analysis (CA) [11,12,13,14]. CA, in particular, has been valuable in identifying distinct subgroups within datasets. However, it is important to choose the appropriate clustering algorithm to ensure reliable and meaningful results. Researchers should carefully consider the best approach to address heterogeneity and enhance the interpretation of their findings.

As a result, a series of researchers decided to use CA, a group of non-supervised learning methods that classifies entities or objects into different homogenous groups or clusters based on their similarity [15,16,17]. Many algorithms have been introduced, but some are more frequently used [18]. CA has several benefits; for instance, it improves diagnostic criteria to conclude a more comprehensive and meaningful profile, interprets heterogeneous outcomes, and adjusts treatments [19,20,21]. CA has been used in hypothesis generation, finding a topography, data exploration, and data reduction [22,23,24]. CA also has some specific usage; it can identify a group of genes with similar biological functions [25] or identify a group of patients that need targeted interventions [22, 23].

CA has several advantages over other methods. It allows researchers to uncover hidden patterns and structures in complex datasets without making assumptions about data distributions making it a versatile technique [14]. However, it is important to note that CA is sensitive to the initial configuration, and choice of algorithm, which means different results can be obtained [26]. To address this, researchers should carefully select appropriate algorithms and validate the stability of the clusters obtained [27]. Furthermore, it is essential to understand that CA alone does not provide casual relationships or explanations, so, further analysis and interpretation are required. Despite these limitations, CA remains a powerful tool for gaining insights into data structures across various fields.

However, there are challenges with using CA [28]. The sample size is calculated based on the variables included in the analysis and the number of identified clusters [29]. To achieve sufficient power, we need to have a large sample size (greater than 200) and split it into two groups: one for training and one for validation [30]. The results can be reported when the same subgroups are obtained in multiple samples of the target population [31]. This article reviews CA studies in IBS.


We conducted the present systematic review based on preferred reporting items for systematic reviews and meta-analysis (PRISMA) guidelines (Additional file 1).

Search strategy

We searched PubMed, Embase, Scopus, and the Web of Science from initiation until November 03, 2022 for relevant published articles in English without restricting the publication date. We used a combination of the keywords related to irritable bowel and cluster analysis. The Additional file 2 includes the queries used for searching in each database.

Selection criteria

We included studies on patients with IBS who were over the age of 17 years old and had not any organic GI disorder. Non-English and animal articles were excluded.

Methods of review

The study selection is a four-step process: identification, screening, eligibility, and inclusion. At first, in the identification step, we gather all search records that were obtained from databases and removed duplicates. Then, we screened search results by title/abstract. In the third step, we assessed the potentially eligible articles by their full text and included them in our systematic review if they met the inclusion criteria.

Data extraction

We evaluated the methods and results section of each included article. Specifically, we retrieved details on the following items: study design, participants’ characteristics, diagnostic criteria, the variables considered for clustering, data collection methods, data preprocessing techniques, clustering algorithms, validation, interpretation of the results, number of clusters, findings, limitations, and suggestions for future studies.

Study design

Studies were eligible for inclusion in the present review if their results were obtained from original research. Review articles, systematic reviews, and meta-analyses were excluded. Cohort studies, cross-sectional studies, and case-control studies were included.

Participants’ characteristics

We included studies that were conducted on IBS patients, adult participants, and evaluated both sexes.


Selecting relevant variables for discriminating clusters is very important. The variables included were related to GI symptoms, bowel habits, pain, bloating, psychological disturbances, QoL, anorectal function, colon transit time (CTT), anal pressure waves, cytokines levels, mast cell (MC) numbers, and microbial composition.

Data collection method

The methods of collecting participants’ data or tools for evaluating patients were reviewed: questionnaire, direct interview, data collection on consecutive days, a rectal examination tool, etc.

Data preprocessing methods

Considering that the data obtained from the studies might be different in terms of units or other items, we examined studies to control if they applied standardization and data normalization methods before CA.

Cluster analysis

CA is a group of machine learning algorithms that classify data into homogenous groups with the least similarity to other groups [32]. There are different types of clustering algorithms (Fig. 1). K-means CA and hierarchical cluster analysis (HCA) are the most frequently used [33]. K-means CA is preferable due to its good measurement capability. One of the features of this algorithm is the need to calculate the number of clusters before analysis under the title of K [34]. There are different methods for choosing optimal cluster numbers, for example, BIC, AIC, elbow, etc., in K-means CA. The distance metric is another important feature in K-means CA, which uses Euclidian.

Fig. 1
figure 1

Clustering algorithms

HCA converts a distance matrix of all items’ similarity measurements into a hierarchy of nested groups. In this method, two different approaches are used: agglomerative and divisive [34]. HCA is aiming to group similar objects together based on their attributes and characteristics. It involves constructing a hierarchy of clusters, where each object begins as a separate cluster and is progressively merged with others to create larger clusters. This process continues until all subjects are consolidated into a single cluster or until a predetermined stopping condition is satisfied [14]. Latent class analysis (LCA) is another popular method that is a kind of finite mixture model (FMM). In this method, hidden clusters are uncovered by some predetermined multifactorial feature [35]. LCA estimated the probability of belonging to each latent class for each individual allowing researchers to understand the heterogeneity within a population. By uncovering these latent classes LCA provides insights into the structure and patterns of categorical data [36]. Principal component analysis (PCA) is a method that decreases multi-dimensional data before analysis [37], increases interoperability of the results, and minimizes information bias. PCA does the analysis by using new uncorrelated variables [38].

Cluster validation

One of the most critical steps in CA is the evaluation of the clusters obtained from the analysis. There are some methods for this assessment, such as Silhouette and Davies-Bouldin indexes [28, 39].

Interpretation of the results

The main goal in conducting CA studies is to obtain subgroups and relevant individual characteristics. CA is insufficient in determining the characteristics of clusters and assessing the relationship between different variables. So, after the analysis results are prepared, other methods apply to interpret the results, for instance, using Bayesian inference.


As illustrated in Fig. 2, the database search retrieved 413 records. One hundred sixty-six records were duplicated. We screened 247 discrete records by title and abstract, of which 25 appeared potentially eligible. During full-text reviewing, we excluded 11 articles due to not assessing outcomes of interest [40,41,42,43,44], not using CA [45,46,47], not including IBS patients [48, 49], and not available full-text [50]. Finally, 14 eligible articles were included in this article. The included articles were published between 1995 and 2021.

Fig. 2
figure 2

Study selection

Study design

Eight studies were designed as prospective cohort studies [51,52,53,54,55,56,57,58]. Seven of them recruited two groups of participants [51, 52, 54,55,56,57], however, one of them recruited only one group of participants [53]. The two groups included one group of IBS patients and one group of healthy controls, except for [52], which recruited two independent groups of IBS patients. Five studies were cross-sectional [42, 59,60,61,62]. Among them, two studies had a group of IBS patients [42, 61]. Other studies included two groups of IBS patients [59, 60], except for [62], which recruited one IBS group and one healthy control group. Only one of the studies conducted in 2013 was a randomized clinical trial (RCT) [63], which included patients and healthy controls.

Sample characteristics

Studies mostly included at least 100 participants [42, 51, 52, 55, 57, 59, 60, 62], except for six [53, 54, 56, 58, 61, 63]. The number of participants varied from 52 to 1533 across studies. Three studies did not report the percentage of participants by gender [53, 55, 56]. In six of the other 11 studies, more than 80% of the participants were female [51, 54, 59,60,61,62]. Participants’ age ranged from 17 [52] to 88 [57] years.

Diagnostic criteria

The included studies were conducted in different years and used different criteria for diagnosis. Three of the initial studies used the ROME I criteria [51,52,53, 57], and the next five studies that started in 2006 used the ROME II version [54, 55, 58, 61, 62]. Two studies used only the ROME III version [56, 63], and two of the most recent studies used two different criteria to identify IBS patients [59]. The most recent study conducted in 2021 by Black et al. used both ROME III and ROME IV criteria in order to identify IBS patients. Han et al. used ROME II and ROME III criteria. One of the studies made the diagnosis based only on the opinion of doctors without any use of questionnaires [42].


Table 1 shows that the included studies investigated a wide range of variables. Briefly, seven of these studies clustered patients based on clinical symptoms, QoL, etc. [42, 51, 52, 59,60,61, 63]. Four of the studies investigated the anocolorectal function of patients and clustering based on it [53,54,55, 57]. Two studies evaluated immunological factors such as the level of serum cytokines and the role of MC in the pathophysiology of the disease [56, 62]. Finally, a study was conducted on the intestinal microbial composition of IBS patients [58].

Table 1 Characteristics of IBS cluster analysis articles

Data collection

Various questionnaires were used to collect the necessary data in the field of clinical symptoms, as shown in Table 1. Specialized tools were used to collect other data; for instance, QuinTron Breath Tracker for evaluating exhaled H2 and CH4 [63], manometry for anocolorectal function, Prodimed Le Plessis-Bouchard for CTT, high-sensitivity multiplex assays [62], immunofluorescence, and immunoassays for systemic cytokines and MC characteristics, RT-PCR for gene expression [56], also bioinformatics for microbial composition.

Data preprocessing

Four of the studies did not mention details in this regard [51, 56, 62, 64]. Ragnarsson et al. transformed data to have a mean of 0 and a standard deviation of 1 [52, 53]. Exploratory factor analysis was used in two of the studies [60, 61]. Two other studies normalized data in different ways. Bouchoucha et al. [54] normalized data by subtracting the pressure of the first measured point from the measured values in each experiment; however, Jeffery et al. [58] normalized data by scaling to an intensity of 1 to control for differing numbers of reads. PCA and factor analysis were used in two studies [42, 63]. Mertz et al. [57] standardized data and used unpaired student t-tests. One study used a t-test and a partially overlapping z-test [59].

Determination of cluster numbers

In four of the studies that used the K-mean CA for clustering, the number of clusters was determined based on Euclidian distance [42, 51,52,53]. Two other studies that used K-mean CA for clustering used pseudo-f statistic [55, 57]. Another study used successive solutions that increment the value of k by 1 [61]. Likelihood-based methods were used in two of the articles that were clustered by using LCA. Han et al. [60] used likelihood‐based criteria and model entropy, and Black et al. [59] used the Bayesian information criterion of the log-likelihood (BIC(LL)) to identify the number of clusters. Natural breaks in distance jumps were used before HCA in one of the studies [63]. The remaining four studies did not mention the use of any methods to determine the number of clusters before clustering [54, 56, 62, 63] (Table 2).

Table 2 Methodology of IBS cluster analysis articles

Clustering algorithms

K-means CA is the most commonly used algorithm. Eight articles used the K-means CA algorithm for clustering [42, 51,52,53,54,55, 57, 61]; four studies used HCA [56, 58, 62, 63]; only two studies used the LCA algorithm for clustering [59, 60] (Table 2).

Cluster validation

Six studies used methods for cluster validation. Two studies used cross-validation methods. Black et al. [59] used tenfold cross-validation and Sundin et al. [56] used cross-validation by the Q2 parameter. Bennet et al. [62] used Q2. Three other validation methods were exploratory CA [61], silhouette coefficient [63], and Bonferroni corrected pair-wise comparisons [51].

Interpretation of results

The number of clusters varied from two to seven based on different factors. As shown in Table 2, different methods were used to interpret the results, and find the relationship between different factors. The most prevalent methods were used is as follows: one-way analysis of variance (ANOVA) [51, 53,54,55, 59, 60, 63], Kruskal Wallis test [52, 53, 56, 60, 62], Mann–Whitney test [52,53,54, 56, 58, 62], and χ2 test [51, 57, 59, 60]. Other methods include squared semi-partial correlation [61], spearman correlation coefficient [54, 62], and Pearson correlation coefficient [51, 57, 58]. Eslick et al. [42] described a cluster profile that comprised the mean score per factor per cluster.

Clusters’ profile

Seven articles were based on clinical findings [42, 51, 53, 59,60,61, 63], four articles were based on anocolorectal functions [52, 54, 55, 57], two studies assessed immunological factors in IBS patients [56, 62], and one study was about microbial composition in IBS [58].

Clinical features

As Table 3 shows, seven articles that classified patients based on clinical symptoms are different in several ways, including study design, sample size, diagnostic criteria, clustering algorithms, and findings [42, 51, 52, 59,60,61, 63]. Some of these studies had similar results in terms of the number of clusters and classification of patients into homogenous groups based on clinical symptoms. Two of the earliest studies classified participants into three homogenous groups [52, 53]. They were similar in diagnostic criteria and clustering algorithm and had almost the same sample sizes.

Table 3 Number of clusters, characteristics, and associations

The study by Ragnarsson et al. [52] was completely based on the participants’ statements. There was no significant difference in the number of patients in all three subgroups. They also clustered IBS patients based on pain and bloating. The patients were divided into two groups with almost equal sample size. In the first group, the symptoms were low, whereas, in the second group, the symptoms were high. Guthrie and his colleagues [51] included patients who suffered from a more severe and chronic form of the disease. Most of the patients were fallen into the second group with a low threshold of rectal sensitivity, diarrhea-predominant or intermittent, and low level of mental disorders.

In two studies, four subgroups were emerged. In [60], which is more recent, the sample size was much larger than in [61] and ROME II and ROME III criteria were used to identify patients. More than 75% of patients were classified in the first and second subgroups, with low symptoms and relatively good QoL.

Lackner et al. [61], included patients with moderate to high severity of symptoms. They had to conduct the classification by using LCA due to the small sample size. The highest severity of symptoms was in the third subgroup, which included a smaller number of patients. More than one-third of the patients were labeled as fourth subgroup, with the lowest severity of symptoms and the highest QoL.

Based on the results of two other studies, the patients were divided into seven subgroups. Both of these studies had a significant sample size. In a study conducted in 2004 [42], a quarter of the patients were classified in the diarrhea-predominant group and about 20% of the patients were classified in the undifferentiated group. In [59], the most novel study in this field, all stages of diagnosis of patients and the data collection were online. The second subgroup included the largest number of patients among the subgroups, in which GI symptoms were low, whereas psychological disturbances were high. There was a significant difference between different clusters in terms of age and gender. For instance, cluster one included mostly elderly patients, cluster five included younger patients, and cluster three included more men.

In 2012, an RCT was conducted [63] to assess the occurrence of symptoms after meals. They evaluated GI symptoms, and psychological symptoms and exhaled H2 and CH4 after lactulose diets. They used three types of meals including 4oo ml liquid plus three different doses of lactulose (0–15 g–25 g). GI symptoms and discomfort were assessed at baseline and every 15 min after different test meals. Both lactulose-containing diets increased GI symptoms in IBS patients; however, the lactulose (25 gr) diet discriminated patients from controls more precisely. Ascending CA (wards method) was performed based on the response after a lactulose test meal and consequently, five clusters were obtained which could be divided into two subpopulations, labeled as, high GI symptoms and low GI symptoms. In the high GI subpopulation, both hospital anxiety and depression scale (HADS) and visceral sensitivity index (VSI) were high. As a result, no significant difference was found between GI symptoms, IBS clusters, subpopulation, and exhaled H2, and CH4 in any dose of lactulose.

Anocolorectal function

We reviewed four articles that evaluated the association between IBS and anorectal dysfunction [53,54,55, 57]. They are the same in study design, diagnostic criteria, and clustering algorithm which was K-means CA. These studies consistently identified subgroups based on anorectal function and associated factors. Mertz et al. [57] proved that altered rectal perception is a biological marker of IBS. They evaluated anal perception thresholds and reevaluated 15 patients after three months to identify the correlation between changes in perception thresholds and symptom severity. Finally, they found three subgroups of IBS patients based on eight physiological parameters. Ragnarsson et al. [53] assessed the hypothesis that abdominal symptoms are related to anorectal function in IBS patients. They classified patients based on anorectal functions, bowel habits, pain, and distention. They investigated anorectal function by using manovolumetry before and 40 min after a fatty meal.

K-means CA resulted in three clusters in anorectal functions. It is similar in both pre- and postprandial manovolumetry, except that in postprandial, the first group does not have increased anal canal pressure and has large rectal compliance with no more men than women. In pre-prandial manovolumetry, the third group is more prevalent; however, in postprandial manovolumetry, the second group is more prevalent. Consequently, rectal sensitivity increased after a fatty meal in somehow half of the patients and women are more sensitive. Ragnarsson et al. [52] have also done clustering based on bowel habits, pain, and distention like their previous study which was mentioned earlier in the clinical findings category with the same results. There was no difference between before and after fatty-meal manovolumetry. So, it might be possible to assume that there is no relation between abdominal symptoms and anorectal functions in IBS.

Bouchoucha et al. have done two studies in the field of anorectal function. One study was about anal pressure waves [57] and the other one was CTT [59]. Participants with delays in manometry or CTT were excluded from the studies. It has been caused to eliminate constipation-predominant patients and is a manifestation of selection bias. In the first study [57], manometry has been done by a small balloon tube in two states of rest and distention. Three clusters of anal pressure waves have resulted from K-means CA in both rest and distention states. In the distention state, ultra-slow waves increase in both groups; however, slow waves increase merely in IBS patients, and simple waves decrease in control groups. Ultra-slow waves in IBS patients are not significantly different neither at rest nor in the distention state. These waves are just a manifestation of increased activity of the internal sphincter [38, 39]. In the second study, the association between CTT and IBS was investigated. They used a previously identified technique using radio-opaque markers [40,41,42]. Three parameters were assessed, including CTT, distribution of markers, and diffusion kinetics. To do this, three different parts of the colon, including the right colon, the left colon, and the rectosigmoid area, were assessed. They identified four clusters in IBS patients and three clusters in healthy controls. In both groups, cluster one has a delay in the right colon, cluster two has a delay in the left colon, and cluster three has a delay in the rectosigmoid area. In the patient’s group cluster four is defined as no marker seen in the plain film. In the IBS group, a higher percentage of females is in cluster two and the number of males was higher equally in both clusters one and two; however, in healthy controls, more females were in cluster two and more males were in cluster one. Total CTT was more in the second cluster in both sexes. Generally, IBS patients have longer CTT than controls, and females have longer CTT. No correlation was found between CTT and IBS.

Immunological findings

Two of the studies assessed the correlation between IBS and aggregation of colonic MCs and the level of serum cytokines. The results were diverse, with some studies identifying subgroups based on MC characteristics and cytokine levels.

Sundin et al. [38] intended to discover the pathophysiology of IBS. They determined the mucosal MC characteristics and their proximity to nerves, fecal serine protease activity, symptoms, visceral sensitivity, and expression of epithelial barrier genes. They measured the MC using a method previously published [39]. They analyzed data by HCA and identified two subgroups. One subgroup, MC High, has higher rates of MC and proximity to nerves, and, the second, MC Low, which is the opposite of the previous subgroup. A higher percentage of patients was classified in MC Low subgroup. Different subgroups of IBS could not be distinguished based on symptoms, visceral sensitivity, gene expression, and fecal protease activity. MC numbers and location sound to not have any role in the pathophysiology of IBS.

The role of cytokines and their correlation with IBS was investigated by Bennet et al. [40], who assessed pro-inflammatory factors, including IL1B, IL8, IL6, TNF, and IL10 [41, 42]. They obtained four clusters. Most of the patients and controls were in the second cluster. The first cluster had the lowest level of cytokines, and the fourth cluster had the highest level of cytokines. In all of the clusters, TNF had the highest level in comparison with other cytokines except in the second cluster, in which, IL8 had the highest level. Finally, they identified IBS patients have a higher level of cytokines in comparison with healthy controls. However, this issue cannot differentiate patients from healthy controls and there is no correlation between cytokine levels and IBS symptoms.

Microbiome composition

There was one study about microbiome composition in IBS patients [58]. This study found IBS patients have lower microbial diversity compared with healthy controls. They analyzed data by HCA. The results showed less than half of the patients were like normal controls, and the others were classified into two subgroups. The first was characterized by a diminished diversity and a median of 44 species and the second by increased diversity and a median of 53 species. There were some associations between microbial composition and clinical or physiological features, immunological alterations, and low-grade inflammation in IBS patients.

The CA.


Different approaches exist for addressing the heterogeneity grouping problem, including CA, latent class analysis (LCA), and mixture modeling. CA is advantageous in identifying distinct subgroups and providing visual representations but lacks standardized methods and may overlook latent factors [14]. LCA incorporates latent factors and allows for hypothesis testing but assumes conditional independence and requires large sample sizes [65]. Mixture modeling offers flexibility in distributional assumptions and handles missing data but requires complex model estimation and interpretation [66]. The choice of approach depends on the specific research question, data characteristics, and the balance between interpretability and flexibility needed in the analysis. CA is a group of unique machine learning algorithms that identifies different homogenous subgroups in datasets. Due to their unique features, these algorithms are now increasingly used in studies for various purposes [58]. In this review, we included 14 articles that used CA in IBS patients and obtained different subgroups of IBS patients based on different factors. The number of clusters obtained from these studies varied from two to seven. Seven studies were based on clinical symptoms in IBS patients. They selected different variables. The details of clusters are summarized in Table 3. K-means CA was used in four of these studies. In four of these studies, unlike in the past, the classification of patients was based on the severity of symptoms rather than the form of bowel habits, and most of the participants were in clusters with low GI symptoms severity and good QoL. Some specific findings and associations were obtained from these studies (Table 3). Men have lower symptoms in comparison with women, and GI symptoms are based on stool subtypes [59]. However, Ragnarsson et al. [52] found completely contradictory results. They identified that there is no association between sex and IBS subgroups and mentioned the degree of pain and bloating does not correlate with the type of bowel habits. Meals can induce symptoms in IBS patients; however, there is no significant association between meals and IBS clusters.

Four studies evaluated anocolorectal function in IBS patients. They used K-means CA. None of them measured the validity of the analysis results. The study conducted by Ragnarsson et al. [53] sub-grouped patients based on rectal manometry results. In addition to this, they subgrouped patients based on bowel habits, pain, and, bloating. They identified three clusters based on manovolumetry, three clusters of bowel habits, and two clusters of pain and bloating. The main results of this study include: the anorectal function does not correlate with symptoms; changes in rectal sensitivity do not associate with symptoms; rectal sensitivity is higher in women; and rectal compliance is higher in men.

Bouchoucha et al. [54] investigated different types of anal pressure waves in IBS patients. Three clusters were obtained, including ultra-slow waves, slow waves, and simple waves. They found that ultra-slow waves associate with high anal pressure, anal hypertonia, and, dyschezia. Ultra-slow waves in IBS patients are not significantly different from healthy controls. Increased activity of anal sphincters is associated with ultra-slow waves. Altered rectal perception is a biological marker of IBS and its change is associated with symptom severity [57]. IBS patients are different from healthy controls in CTT, marker distribution of retention, and diffusion coefficient. Most IBS patients have delayed CTT in the left colon [55]. We reviewed two studies in the field of immunological findings of IBS patients that utilized HCA. They evaluated systemic cytokine levels, MC number, and location in intestinal epithelium to identify its role in IBS pathophysiology. They found no strong associations between immune activation and IBS symptoms. MC numbers and location do not correlate with the pathophysiology of IBS, symptoms, and subtypes. A study evaluated the gut microbial composition of patients and found that microbial composition correlates with IBS and is associated with immunological alteration and low-grade inflammation.

These studies resulted in inconsistent findings and they are not comparable. They are different in sample size, diagnostic criteria, methodology, etc.

The CA conducted in the studies included in this systematic review, demonstrated moderate consistency with clinical criteria in several aspects of irritable bowel syndrome (IBS). These subgroups were consistent with certain clinical criteria such as symptom severity, bowel habits, pain, and physiological parameters. However, there were also instances where cluster analysis did not show a strong correlation with clinical criteria, indicating the complexity and heterogeneity of IBS.

In terms of clinical symptoms clustering, some studies found consistent results, indicating the presence of homogenous patient groups. However, the findings related to anorectal function clustering were consistently supportive, suggesting the potential for tailored treatments based on symptom profiles and associated factors. On the other hand, the immune feature clustering studies yielded inconsistent results, highlighting the need for further exploration and validation. Given the variability in findings, combining the most clinically consistent variables may improve patient stratification and guide personalized treatment approaches. However, it is essential to thoroughly assess the reliability, validity, and generalizability of each variable before their combination.

Some limitations of these studies are summarized in Table 4. The major limitation was the small sample size. The sample size is a crucial factor in studies and as larger the sample size is, the results can be more generalizable. Also, some of the studies had selection bias in some way. For instance, most of the participants were related to one center and were in the severe spectrum of the disease, so the results cannot be generalized to the whole spectrum of patients. Most of the published studies applicated subjective data.

Table 4 Limitations and future suggestions

The fact that emerged from this review is that IBS is not merely a GI disorder, but it is a disease that affects many things. The most important effect is a psychological disturbance. Also, changes in the diversity of the intestinal microbiome, such as an increase of Firmicutes-associated taxa and depletion of Bacteroidetes-related taxa, and also aberrations in cytokines can be underlying mechanisms. In summary, CA is a type of unsupervised learning technique that eliminates the need for experts to spend time on manual labeling, making it a convenient method. Nevertheless, it is crucial to acknowledge that CA is greatly influenced by hyperparameters, including the number of clusters and the random initialization of cluster centers. The arbitrary selection of these parameters can result in different outcomes, even when employing the same algorithm. The utilization of CA has provided valuable insights into the heterogeneity of IBS based on clinical features, anocolorectal functions, immunological factors, and microbiome composition. Although there were variations in the clustering results, some consistent patterns emerged. However, no particular clustering method or k-means cluster number method consistently outperforms others in terms of consistency with clinical criteria. Further research is needed to explore the optimal clustering approach for accurately capturing the clinical heterogeneity in IBS. Other suggestions for IBS clustering are proposed in Table 4. It is better to have a larger sample size, normally distributed participants in terms of gender and other factors, enroll participants from different geographical locations, and based on ROME IV criteria. As the symptoms are temporary, it is better to conduct studies that follow patients over time to identify the exact pathophysiology of the disease by measuring biomarkers or examining the microbial composition of the intestines, etc., to achieve targeted treatment of the disease. These investigations might be able to reduce additional treatment costs and, the burden of the disease.


We conclude that unlike the previous classifications, which were based exclusively on bowel habits, CA focuses more on the severity of all symptoms in IBS. Overall, most patients have low severity of clinical symptoms and a good QoL. The clustering based on colorectal function has shown that rectal sensitivity increases in most patients and this can be used as a biological indicator in IBS. The level of serum immunological markers increases moderately in IBS and the diversity of the intestinal microbiome decreases. The number of IBS clusters is variable based on different factors and according to the chosen methodology. As a result, we cannot express a definite number of clusters. Considering that, knowing the different clusters based on different factors would help us to know the disease more accurately, and understand its pathophysiology more precisely, further studies should be done with a similar methodology to be comparable and based on the recommendations mentioned before. So, we hopefully will be able to treat IBS patients in a more targeted manner in the future.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.


  1. Yang W, Yang X, Cai X, Zhou Z, Yao H, Song X, Zhao T, Xiong P. The Prevalence of irritable bowel syndrome among chinese university students: a systematic review and meta-analysis. Front Public Health. 2022;10:864721.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Defrees DN, Bailey J. Irritable bowel syndrome: epidemiology, pathophysiology, diagnosis, and treatment. Prim Care. 2017;44:655–71.

    Article  PubMed  Google Scholar 

  3. Saha L. Irritable bowel syndrome: pathogenesis, diagnosis, treatment, and evidence-based medicine. World J Gastroenterol. 2014;20:6759–73.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Kim YS, Kim N. Sex-gender differences in irritable bowel syndrome. J Neurogastroenterol Motil. 2018;24:544–58.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Black CJ, Ford AC. Global burden of irritable bowel syndrome: trends, predictions and risk factors. Nat Rev Gastroenterol Hepatol. 2020;17:473–86.

    Article  PubMed  Google Scholar 

  6. Jayaraman T, Wong RK, Drossman DA, Lee YY. Communication breakdown between physicians and IBS sufferers: What is the conundrum and how to overcome it? J R Coll Phys Edinb. 2017;47:138–41.

    Article  CAS  Google Scholar 

  7. Holtmann GJ, Ford AC, Talley NJ. Pathophysiology of irritable bowel syndrome. Lancet Gastroenterol Hepatol. 2016;1:133–46.

    Article  PubMed  Google Scholar 

  8. Drossman DA, Hasler WL. Rome IV—functional GI disorders: disorders of gut-brain interaction. Gastroenterology. 2016;150:1257–61.

    Article  PubMed  Google Scholar 

  9. Barberio B, Houghton LA, Yiannakou Y, Savarino EV, Black CJ, Ford AC. Symptom stability in rome IV vs rome III irritable bowel syndrome. Am J Gastroenterol. 2021;116:362–71.

    Article  PubMed  Google Scholar 

  10. Bajor A, Törnblom H, Rudling M, Ung KA, Simrén M. Increased colonic bile acid exposure: a relevant factor for symptoms and treatment in IBS. Gut. 2015;64:84–92.

    Article  CAS  PubMed  Google Scholar 

  11. Brookes ST, Whitley E, Peters TJ, Mulheran PA, Egger M, Davey Smith G. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess. 2001;5(33):1–56.

    Article  CAS  PubMed  Google Scholar 

  12. Tripepi G, Jager KJ, Dekker FW, Zoccali C. Stratification for confounding – part 1: the Mantel-Haenszel formula. Nephron Clin Pract. 2010;116(4):c317–21.

    Article  PubMed  Google Scholar 

  13. Zhou Y, Yuan A, Tan MT. Identification of subgroups via partial linear regression modeling approach. Biom J. 2022;64(3):506–22.

    Article  PubMed  Google Scholar 

  14. Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. 1999;31(3):264–323.

    Article  Google Scholar 

  15. Liao M, Li Y, Kianifard F, Obi E, Arcona S. Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis. BMC Nephrol. 2016;17:1–14.

    Article  Google Scholar 

  16. Dilts D, Khamalah J, Plotkin A. Using cluster analysis for medical resource decision making. Med Decis Mak Int J Soc Med Decis Mak. 1995;15:333–46.

    Article  CAS  Google Scholar 

  17. Mclachlan GJ. Cluster analysis and related techniques in medical research. Stat Methods Med Res. 1992;1:27–48.

    Article  CAS  PubMed  Google Scholar 

  18. Wu X, Kumar V, Ross QJ, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, et al. Top 10 algorithms in data mining. Knowl Inf Syst. 2008;14:1–37.

    Article  Google Scholar 

  19. Windgassen S, Moss-Morris R, Goldsmith K, Chalder T. The importance of cluster analysis for enhancing clinical practice: an example from irritable bowel syndrome. J Ment Health. 2018;27:94–6.

    Article  PubMed  Google Scholar 

  20. Song S, Jason LA. A population-based study of chronic fatigue syndrome (CFS) experienced in differing patient groups: an effort to replicate Vercoulen et al.’s model of CFS. J Mental Health. 2005;14:277–89.

    Article  CAS  Google Scholar 

  21. Taylor LAJ, Michae R. Evaluating latent variable models of functional somatic distress in a community-based sample. J Mental Health. 2001;10:335–49.

    Article  Google Scholar 

  22. Clatworthy J, Buick D, Hankins M, Weinman J, Horne R. The use and reporting of cluster analysis in health psychology: a review. Br J Health Psychol. 2005;10(Pt 3):329–58.

    Article  PubMed  Google Scholar 

  23. Matthew R, Weir M. Implications of a health lifestyle and medication analysis for improving hypertension control. Arch Intern Med. 2000;160(4):481–90.

    Article  Google Scholar 

  24. Dilts D. Using cluster analysis for medical resource decision making. Med Decis Mak. 1995;15(4):333–4.

    Article  CAS  Google Scholar 

  25. Michael BE. Cluster analysis and display of genome-wide expression patterns. Proc National Acad Sci. 1998;95(25):14863–8.

    Article  Google Scholar 

  26. Milligan GW, Cooper MC. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985;50(2):159–79.

    Article  Google Scholar 

  27. Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc. 2002;97(458):611–31.

    Article  Google Scholar 

  28. Halkidi M, Batistakis Y, Vazirgiannis M. On clustering validation techniques. J Intell Inf Syst. 2001;17:107–45.

    Article  Google Scholar 

  29. Dziak JJ, Lanza ST, Tan X. Effect size, statistical power and sample size requirements for the bootstrap likelihood ratio test in latent class analysis. Struct Equ Model. 2014;21(4):534–52.

    Article  Google Scholar 

  30. Tekle FB, Gudicha DW, Vermunt JK. Power analysis for the bootstrap likelihood ratio test for the number of classes in latent class models. Adv Data Anal Classif. 2016;10(2):209–24.

    Article  Google Scholar 

  31. Windgassen S, Moss-Morris R, Goldsmith K, Chalder T. The importance of cluster analysis for enhancing clinical practice: an example from irritable bowel syndrome. J Ment Health. 2018;27(2):94–6.

    Article  PubMed  Google Scholar 

  32. Akopov AS, Moskovtsev AA, Dolenko SA, and Savina GD. Cluster analysis in biomedical researches. 2013.

  33. Zhao W, Zou W, Chen JJ. Topic modeling for cluster analysis of large biological and medical datasets. BMC Bioinform. 2014;15:S11.

    Article  Google Scholar 

  34. Rodriguez MZ, Comin CH, Casanova D, Bruno OM, Amancio DR, Costa LDF, Rodrigues FA. Clustering algorithms: a comparative approach. PLoS ONE. 2019;14:e0210236.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Naldi L, Cazzaniga S. Research techniques made simple: latent class analysis. J Invest Dermatol. 2020;140:1676-1680.e1671.

    Article  CAS  PubMed  Google Scholar 

  36. Vermunt JK, Magidson J. Latent class cluster analysis. In: McCutcheon AL, Hagenaars JA, editors. Applied latent class analysis. Cambridge: Cambridge University Press; 2002. p. 89–106.

    Chapter  Google Scholar 

  37. Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2:559–72.

    Article  Google Scholar 

  38. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc A Math Phys Eng Sci. 2016;374:20150202.

    Article  Google Scholar 

  39. Cheng D, Zhu Q, Huang J, Wu Q, Yang L. A Novel cluster validity index based on local cores. IEEE Trans Neural Netw Learn Syst. 2019;30:985–99.

    Article  PubMed  Google Scholar 

  40. Quigley BM. Tu1612 – centralized sensitivity phenotype as a predictor of outcome to cognitive behavioral therapy for irritable bowel syndrome. Gastroenterology. 2019;156(6):S-1061.

    Google Scholar 

  41. Black CJ, Yiannakou Y, Guthrie E, West R, Houghton LA, Ford AC. Longitudinal follow-up of a novel classification system for irritable bowel syndrome: natural history and prognostic value. Aliment Pharmacol Ther. 2021;53:1126–37.

    Article  PubMed  Google Scholar 

  42. Eslick GD, Howell SC, Hammer J, Talley NJ. Empirically derived symptom sub-groups correspond poorly with diagnostic criteria for functional dyspepsia and irritable bowel syndrome. A factor and cluster analysis of a patient sample. Aliment Pharmacol Ther. 2004;19:133–40.

    Article  CAS  PubMed  Google Scholar 

  43. Zhou QQ, Fillingim RB, Riley JL, Verne GN. Thermal hypersensitivity in a subset of irritable bowel syndrome patients. World J Gastroenterol. 2009;15:3254–60.

    Article  PubMed  PubMed Central  Google Scholar 

  44. Molinder H, Agréus L, Kjellström L, Walter S, Talley NJ, Andreasson A, Nyhlin H. How individuals with the irritable bowel syndrome describe their own symptoms before formal diagnosis. Upsala J Med Sci. 2015;120:276–9.

    Article  PubMed  PubMed Central  Google Scholar 

  45. van Tilburg MA. Tu1810 distinct subtypes of irritable bowel syndrome are defined by psychological symptoms, visceral pain sensitivity, stool consistency, and motility. 2016.

  46. Talley NJ, Holtmann G, Agreus L, Jones M. Gastrointestinal symptoms and subjects cluster into distinct upper and lower groupings in the community: a four nations study. Am J Gastroenterol. 2000;95(6):1439–47.

    Article  CAS  PubMed  Google Scholar 

  47. Quigley BM. Mo1621 beyond pain intensity: patient reported outcomes (PRO) based on pain quality profiles in irritable bowel syndrome patients. Gastroenterology. 2016;150(4):S733.

    Article  Google Scholar 

  48. Koloski NA, Jones M, Young M. Differentiation of functional constipation and constipation predominant irritable bowel syndrome based on Rome III criteria: a population-based study. Aliment Pharmacol Ther. 2015;41(9):856–66.

    Article  CAS  PubMed  Google Scholar 

  49. Camilleri M, Carlson P, Valentin N, Acosta A, O’Neill J, Eckert D, Dyer R, Na J, Klee EW, Murray JA. Pilot study of small bowel mucosal gene expression in patients with irritable bowel syndrome with diarrhea. Am J Physiol Gastrointest Liver Physiol. 2016;311:G365–76.

    Article  PubMed  PubMed Central  Google Scholar 

  50. Chen J. Somatosensory profiles differentiate pain and psychophysiological symptoms among young adults with irritable bowel syndrome: a cluster analysis. Clin J Pain. 2022;38(7):492–501.

    Article  PubMed  PubMed Central  Google Scholar 

  51. Guthrie E, Creed F, Fernandes L, Ratcliffe J, Van Der Jagt J, Martin J, Howlett S, Read N, Barlow J, Thompson D, et al. Cluster analysis of symptoms and health seeking behaviour differentiates subgroups of patients with severe irritable bowel syndrome. Gut. 2003;52:1616–22.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Ragnarsson G, Bodemar G. Division of the irritable bowel syndrome into subgroups on the basis of daily recorded symptoms in two outpatient samples. Scand J Gastroenterol. 1999;34:993–1000.

    Article  CAS  PubMed  Google Scholar 

  53. Ragnarsson G, Hallböök O, Bodemar G. Abdominal symptoms are not related to anorectal function in the irritable bowel syndrome. Scand J Gastroenterol. 1999;34:250–8.

    Article  CAS  PubMed  Google Scholar 

  54. Bouchoucha M, Choufa T, Faye A, Berger A, Arsac M. Anal pressure waves in patients with irritable bowel syndrome. Dis Colon Rectum. 1999;42:1487–96.

    Article  CAS  PubMed  Google Scholar 

  55. Bouchoucha M, Devroede G, Dorval E, Faye A, Arhan P, Arsac M. Different segmental transit times in patients with irritable bowel syndrome and “normal” colonic transit time: is there a correlation with symptoms? Tech Coloproctol. 2006;10:287–96.

    Article  CAS  PubMed  Google Scholar 

  56. Sundin J, Nordlander S, Eutamene H, Alquier-Bacquie V, Cartier C, Theodorou V, Le Nevé B, Törnblom H, Simrén M, Öhman L. Colonic mast cell numbers, symptom profile, and mucosal expression of elements of the epithelial barrier in irritable bowel syndrome. Neurogastroenterol Motil Off J Eur Gastrointest Motil Soc. 2019;31:e13701.

    Article  Google Scholar 

  57. Mertz H, Naliboff B, Munakata J, Niazi N, Mayer EA. Altered rectal perception is a biological marker of patients with irritable bowel syndrome. Gastroenterology. 1995;109(1):40–52.

    Article  CAS  PubMed  Google Scholar 

  58. Jeffery IB, Otoole PW, Öhman L, Claesson MJ, Deane J, Quigley EM, Simrén M. An irritable bowel syndrome subtype defined by species-specific alterations in faecal microbiota. Gut. 2012;61(7):997–1006.

    Article  PubMed  Google Scholar 

  59. Black CJ, Yiannakou Y, Guthrie EA, West R, Houghton LA, Ford AC. A Novel method to classify and subgroup patients with IBS based on gastrointestinal symptoms and psychological profiles. Am J Gastroenterol. 2021;116:372–81.

    Article  PubMed  Google Scholar 

  60. Han CJ, Pike K, Jarrett ME, Heitkemper MM. Symptom-based latent classes of persons with irritable bowel syndrome. Res Nurs Health. 2019;42:382–91.

    Article  PubMed  Google Scholar 

  61. Lackner JM, Jaccard J, Baum C. Multidomain patient-reported outcomes of irritable bowel syndrome: exploring person-centered perspectives to better understand symptom severity scores. Value Health. 2013;16:97–103.

    Article  PubMed  Google Scholar 

  62. Bennet SMP, Palsson O, Whitehead WE, Barrow DA, Törnblom H, Öhman L, Simrén M, van Tilburg MAL. Systemic cytokines are elevated in a subset of patients with irritable bowel syndrome but largely unrelated to symptom characteristics. Neurogastroenterol Motil Off J Eur Gastrointest Motil Soc. 2018;30:e13378.

    Article  CAS  Google Scholar 

  63. Nevé BL, Posserud I, Böhn L, Guyonnet D, Rondeau P, Tillisch K, Naliboff BD, Mayer EA, Simren M. 999 a combined nutrient and lactulose challenge test allows symptom-based clustering of patients with irritable bowel syndrome unrelated to exhaled gas and ROME III subtype. Gastroenterology. 2012;142:S-177-S-177.

    Article  Google Scholar 

  64. Bouchoucha M, Devroede G, Dorval E, Faye A, Arhan P, Arsac M. Different segmental transit times in patients with irritable bowel syndrome and “normal” colonic transit time: Is there a correlation with symptoms? Tech Coloproctol. 2006;10(4):287–96.

    Article  CAS  PubMed  Google Scholar 

  65. Hagenaars JA, McCutcheon AL. Applied latent class analysis. Cambridge: Cambridge University Press; 2002.

    Book  Google Scholar 

  66. McLachlan GJ, Peel D. Finite mixture models. New Jersey: Wiley; 2004.

    Google Scholar 

Download references


Not applicable.



Author information

Authors and Affiliations



D.Z. conceptualized the study and prepared the original draft. A.S. conceptualized the study and revised the manuscript. N.R. appraised the manuscript and supervised the project.

Corresponding author

Correspondence to Nima Rezaei.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have co conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

. PRISMA 2020 Checklist.

Additional file 2

. Search strategy.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zarei, D., Saghazadeh, A. & Rezaei, N. Subtyping irritable bowel syndrome using cluster analysis: a systematic review. BMC Bioinformatics 24, 478 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: