dc.contributor.author |
Parveen, Haleema |
|
dc.date.accessioned |
2024-09-03T11:02:19Z |
|
dc.date.available |
2024-09-03T11:02:19Z |
|
dc.date.issued |
2024 |
|
dc.identifier.other |
401709 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/46318 |
|
dc.description.abstract |
Gut metagenome refers to the genetic material of microorganisms living in the intestine.
Bacterial relative abundance may provide valuable insights into the role of dysbiosis
in different diseases. Given the complexity and high dimensionality of metagenomic data, machine learning is increasingly being used in
metagenomics research. The previous studies on this topic used MetaPhlAn2 for metagenomic
analysis. However, metagenomic analysis should be performed with tools containing updated sequences of bacterial species. Therefore, in this study, MetaPhlAn4, which
includes the latest database, was used for relative abundance analysis to obtain a more
accurate outcome. Moreover, most previous studies consider binary classification between a disease state and a healthy state, and no previous study has focused on
multi-class disease prediction using bacterial species relative abundance data. In this
study, a combined set of diseases was analysed to develop a multi-class
disease classification model that predicts among multiple disorders. This helps
avoid false predictions that could be made if the model is not trained on the datasets
of the disease to which the sample belongs. Machine learning models were built using
Random Forest and SVM (Support Vector Machine), while feature selection was performed using RFE (Recursive
Feature Elimination) and Lasso CV approaches. In the metagenomic relative
abundance analysis, some bacterial species reported as abundant in disease states in the reference studies showed the same trend in this study; in addition, some SGBs (species-level genome bins) and
GGBs (genus-level genome bins) were highly abundant in some diseases, which has not been reported before.
Furthermore, some bacterial species, such as Escherichia coli and some Streptococcus and
Veillonella species, showed a strong association with disease states, with higher
relative abundance in multiple datasets. Regarding the multi-class disease prediction
model, the highest accuracy (72%) was achieved by SVM with Lasso CV feature selection. Data
for additional diseases could be added to refine this model and enable it to detect
more diseases, making it more effective for practical disease prediction. |
en_US |
dc.description.sponsorship |
Dr. Rehan Zafar Paracha |
en_US |
dc.language.iso |
en_US |
en_US |
dc.publisher |
School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST) |
en_US |
dc.subject |
Metagenomics, Machine learning, Disease prediction, Random Forest, Dysbiosis, Eubiosis |
en_US |
dc.title |
Development of Machine Learning Disease Prediction Model to Analyse the Gut Metagenome in Disease and Control Samples |
en_US |
dc.type |
Thesis |
en_US |