Abstract:
A respiratory disease is a condition that affects the organs involved in respiration. The lungs are the central part of the respiratory system. Advances in sequencing technology have revealed that the respiratory tract, primarily the lungs, harbors unique and diverse microbes. Alterations in the balance of these microbes have been found in different respiratory diseases, e.g. Chronic Obstructive Pulmonary Disease (COPD), asthma, Tuberculosis (TB), lung cancer, pneumonia and SARS-CoV. Globally, pulmonary infections continue to be a leading cause of mortality, with COPD causing about 3 million deaths annually, asthma affecting about 334 million people, lung cancer killing 1.6 million people and TB accounting for 10 million cases worldwide, which makes it important to understand the composition and role of microbial communities in respiratory diseases. This study used two secondary 16S amplicon datasets to analyze and identify the bacterial composition, mainly in COPD and SARS-CoV. The outcome gives the taxonomic classification and relative abundance of microorganisms in the lungs during different disease states. In addition, four shotgun datasets were analyzed to provide detailed insight into microbial content, identifying different bacterial species and assessing the observed differences between diseased and healthy states in COPD, SARS-CoV, lung cancer and TB. The key bacterial phyla determined by the analysis are Bacteroidetes, Firmicutes and Proteobacteria, together with different families and crucial bacterial species and their relative abundances. After metagenome analysis, Actinobacteria and Fusobacteria are the discriminating phyla in lung cancer, where the crucial species are Halomonas sp. LBP4, Campylobacter jejuni and Haemophilus influenzae, while Candidatus Saccharibacteria is the discriminating phylum in TB sequences, where the significant species are Neisseria subflava and Prevotella melaninogenica.
In COPD the important species are Haemophilus influenzae and Staphylococcus aureus, and in COVID-19 they are Staphylococcus epidermidis, Malassezia restricta and Corynebacterium propinquum. Amplicon analysis shows that Fusobacteria is also present, and Staphylococcaceae, Pseudomonadaceae and Flavobacteriaceae are identified in both the amplicon and the shotgun analyses. Additionally, using the taxonomic classification tables of bacterial species and their relative abundances in specific disorders, two machine learning models, Random Forest (RF) and Support Vector Machine (SVM), were built to classify the bacteria into diseased and control groups, while also providing information about the relative abundances present in the feature data. The models were trained on each disease dataset separately, and one model was then trained on all datasets combined. All models give good accuracies and show potential to categorize the microbial species precisely. The COVID-19 dataset gives the highest accuracies among all datasets, with 94% for RF and 88% for SVM. These procedures give valuable insight into the composition and patterns of respiratory microbes in infected and control states. The findings provide a basis for further comprehensive studies focusing on the composition and functional profiling of the respiratory microbiota and on better ways to treat these painful disorders.
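As an illustration of the relative abundance values referred to above, the following minimal Python sketch converts per-taxon read counts into percentages and sums them at the phylum level. It is not the pipeline used in this study. The function names and the read counts are hypothetical and chosen only for the example. The phylum assignments follow standard taxonomy.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert raw per-taxon read counts into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions; report zeros.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def aggregate_by_phylum(abundance, phylum_of):
    """Sum species-level abundances into phylum-level abundances."""
    totals = defaultdict(float)
    for taxon, value in abundance.items():
        totals[phylum_of.get(taxon, "Unclassified")] += value
    return dict(totals)


# Hypothetical read counts for one sample (illustrative values, not study data).
sample_counts = {
    "Haemophilus influenzae": 600,
    "Staphylococcus aureus": 300,
    "Prevotella melaninogenica": 100,
}

# Species-to-phylum lookup for the three example species.
phylum_of = {
    "Haemophilus influenzae": "Proteobacteria",
    "Staphylococcus aureus": "Firmicutes",
    "Prevotella melaninogenica": "Bacteroidetes",
}

species_abundance = relative_abundance(sample_counts)
phylum_abundance = aggregate_by_phylum(species_abundance, phylum_of)

for phylum, pct in sorted(phylum_abundance.items()):
    print(f"{phylum}: {pct:.1f}%")
```

A table of such per-sample percentages, with one row per sample and one column per taxon, is the kind of feature data on which classifiers such as RF and SVM can be trained.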