Abstract:
The physiopathology of Irritable Bowel Syndrome (IBS) is complex and multidimensional, and it continues to baffle both researchers and clinicians. IBS has long been
associated with genetic factors, indicating a hereditary element, but efforts to identify
these genetic risk factors have so far been limited and inconclusive.
In recent years, there has been considerable interest in applying artificial intelligence methodologies to the study of IBS. The aim of our study is to examine the UK
Biobank, one of the largest biomedical databases in the UK, in order to identify endophenotypes in multidimensional patient data. This multidimensional clinical data contains
information about a variety of factors, including gastrointestinal indications, comorbidities,
demographics, physiology, and psychophysiology.
Exploratory data analysis of key variables
such as age, BMI, and sex reveals important insights into participant demographics
and health patterns. Feature engineering techniques, including one-hot encoding and
principal component analysis, are employed to handle high-dimensional data and extract
informative features for differentiating between IBS patients and the control group.
This research focuses on using unsupervised learning algorithms to find
complex relationships between different features in the data. By rigorously selecting ICD-10
codes as features and categorizing them into groups, a concise subset of comorbidities strongly associated with IBS is identified. Cluster analysis and visualization
further reveal hidden patterns and relationships among these comorbidities, and the results
highlight the superior performance of the K-means clustering model.
The findings of this research are expected to improve our understanding of IBS and,
in turn, to support more effective treatment through personalized medical interventions.
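
As an illustration only, the workflow summarized above (one-hot encoding, principal component analysis, then K-means clustering) can be sketched as follows. This is a minimal sketch on synthetic data, not the study's actual pipeline: the column names, the ICD-10 chapter letters, the number of components, and the number of clusters are all hypothetical choices made for the example, and no UK Biobank data is used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for participant records (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(40, 70, size=n),
    "bmi": rng.normal(27.0, 4.0, size=n),
    "sex": rng.choice(["F", "M"], size=n),
    # Hypothetical grouping of comorbidities by ICD-10 chapter letter.
    "icd10_group": rng.choice(["F", "G", "K", "M"], size=n),
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["sex", "icd10_group"], dtype=float)

# Standardize so that no single feature dominates the principal components.
scaled = StandardScaler().fit_transform(encoded)

# Reduce dimensionality with PCA (3 components is an arbitrary choice here).
components = PCA(n_components=3, random_state=0).fit_transform(scaled)

# Cluster in the reduced space (3 clusters is likewise an assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(components)

# The silhouette score is one common way to compare clustering models.
score = silhouette_score(components, labels)
print(f"silhouette score: {score:.3f}")
```

On real data, the number of components and clusters would normally be chosen by inspecting explained variance and comparing scores such as the silhouette across candidate values, rather than fixed in advance as they are here.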