NUST Institutional Repository

Development of Machine Learning Disease Prediction Model to Analyse the Gut Metagenome in Disease and Control Samples


dc.contributor.author Parveen, Haleema
dc.date.accessioned 2024-09-03T11:02:19Z
dc.date.available 2024-09-03T11:02:19Z
dc.date.issued 2024
dc.identifier.other 401709
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/46318
dc.description.abstract The gut metagenome refers to the genetic material of the microorganisms living in the intestine. Bacterial relative abundance may provide valuable insights into the role of dysbiosis in different diseases. Given the complexity and high dimensionality of metagenomic data, machine learning is increasingly being used in metagenomics research. All previous studies used MetaPhlAn2 for metagenomic analysis; however, the analysis should be performed with tools containing updated sequences of bacterial species. Therefore, in this study, MetaPhlAn4, which harbours the latest database, was used for relative abundance analysis to obtain a more accurate outcome. Moreover, most previous studies consider binary classification between a disease state and a healthy state, and no previous study has focused on multi-class disease prediction using bacterial species relative abundance data. In this study, a combined set of diseases was analysed to develop a multi-class disease classification model that predicts among multiple disorders. This would help avoid the false predictions a model can make when it has not been trained on datasets of the disease to which a sample belongs. Machine learning models were built using Random Forest and SVM (Support Vector Machine), while feature selection was performed using RFE (Recursive Feature Elimination) and Lasso CV approaches. In the metagenomic relative abundance analysis, some bacterial species reported as abundant in diseased states in the reference studies showed the same trend in our study; in addition, some SGBs (species-level genome bins) and GGBs (genus-level genome bins) were highly abundant in certain diseases, which has not been reported before. Furthermore, some bacterial species, such as Escherichia coli and some Streptococcus and Veillonella species, showed a strong association with disease states, with higher relative abundance in multiple datasets.
Regarding the multi-class disease prediction model, the highest accuracy (72%) was achieved by SVM with Lasso CV feature selection. Data for additional diseases could be added to refine this model and enable it to detect more diseases, for effective practical application in disease prediction. en_US
dc.description.sponsorship Dr. Rehan Zafar Paracha en_US
dc.language.iso en_US en_US
dc.publisher School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST) en_US
dc.subject Metagenomics, Machine learning, Disease prediction, Random Forest, Dysbiosis, Eubiosis en_US
dc.title Development of Machine Learning Disease Prediction Model to Analyse the Gut Metagenome in Disease and Control Samples en_US
dc.type Thesis en_US
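The abstract names the components of the pipeline (MetaPhlAn4 relative abundances, RFE and Lasso CV feature selection, Random Forest and SVM classifiers) but not their settings. The Python sketch below, assuming scikit-learn and NumPy, shows one way such a multi-class pipeline could be wired up. Everything else is hypothetical and not taken from the thesis: the synthetic data generator stands in for MetaPhlAn4 tables, and the helper names (`to_relative_abundance`, `synthetic_cohort`, `build_models`, `evaluate`), hyperparameters (20 RFE features, step 5, 5-fold LassoCV) and the pairing of RFE with Random Forest are illustrative choices. Applying LassoCV directly to integer-encoded class labels is also an assumption, since the abstract does not say how Lasso CV was adapted to multi-class labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def to_relative_abundance(counts):
    """Scale each sample (row) so its species values sum to 100,
    i.e. a percentage scale like a species-level MetaPhlAn profile."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)


def synthetic_cohort(n_samples=300, n_species=60, n_classes=3, seed=0):
    """Hypothetical stand-in data: class k has species 5k..5k+4 enriched 4x.
    Real inputs would be MetaPhlAn4 tables from disease and control samples."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=2.0, scale=10.0, size=n_species)
    rates = np.tile(base, (n_classes, 1))
    for k in range(n_classes):
        rates[k, 5 * k:5 * (k + 1)] *= 4.0
    y = rng.integers(0, n_classes, size=n_samples)
    counts = rng.poisson(rates[y])  # one row of species counts per sample
    return to_relative_abundance(counts), y


def build_models(seed=0):
    """Two hypothetical pipelines pairing the methods named in the abstract."""
    rf_rfe = Pipeline([
        # RFE repeatedly refits the forest and drops the 5 least important species.
        ("select", RFE(RandomForestClassifier(n_estimators=100, random_state=seed),
                       n_features_to_select=20, step=5)),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=seed)),
    ])
    svm_lasso = Pipeline([
        ("scale", StandardScaler()),
        # Assumption: LassoCV is fitted on integer-encoded class labels, which
        # treats them as numeric; species with nonzero coefficients are kept.
        ("select", SelectFromModel(LassoCV(cv=5))),
        ("clf", SVC(kernel="rbf")),  # SVC trains one-vs-one classifiers internally
    ])
    return {"RF + RFE": rf_rfe, "SVM + Lasso CV": svm_lasso}


def evaluate(seed=0):
    """Fit each pipeline on a stratified split; return (accuracy, n_selected)."""
    X, y = synthetic_cohort(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        n_selected = int(model.named_steps["select"].get_support().sum())
        results[name] = (model.score(X_te, y_te), n_selected)
    return results


if __name__ == "__main__":
    for name, (acc, k) in evaluate().items():
        print(f"{name}: accuracy={acc:.2f} using {k} species")
```

Placing the feature selector inside each `Pipeline` means RFE and LassoCV only ever see the training split, so the reported test accuracy is not inflated by selecting species on held-out samples.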



This item appears in the following Collection(s)

  • MS [159]
