In silico Genome-Wide Prediction and Comparative Analysis of Clinical Isolates of Mycobacterium tuberculosis Pangenome: Region Wise

Habiba, Umme

dc.contributor.author	Habiba, Umme
dc.date.accessioned	2021-09-15T09:24:40Z
dc.date.available	2021-09-15T09:24:40Z
dc.date.issued	2021-09-06
dc.identifier.other	RCMS003280
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/26041
dc.description.abstract	Since 2014, tuberculosis (TB) has surpassed HIV as the leading infectious disease killer globally. The pathogen Mycobacterium tuberculosis (Mtb) carries ~4,000 genes, which account for almost 90% of its genome. Many global comparative studies of Mtb whole-genome sequences have been conducted to elucidate the core, accessory, and strain-specific genomes. However, a detailed pangenome analysis of Asian strains, including those from Pakistan, is still lacking. Such studies have focused mainly on what strains share and what distinguishes them, in terms of gene content and evolutionary trends. Here we used a pangenomic analysis of 40 Mtb genomes to address these questions. The EDGAR platform is one of the most established software tools in comparative genomics and provides a web-based user interface. The genomes were selected specifically from Asian strains and retrieved from the National Center for Biotechnology Information (NCBI) for the variation and evolution studies. The core genome comprised 2809 genes (49.2%), the dispensable genome 2196 genes (38.5%), and the singleton genome 704 genes (12.8%). The translated coding sequences (CDS) correspond to membrane and repair proteins as well as conserved hypothetical proteins. We also identified strain-specific genes for the 40 Mtb strains by comparing them with Mtb H37Rv in EDGAR. We generated a pan vs. core development plot to show the evolutionary trend and variation history among the Mtb strains; the pan and core gene counts followed opposite trends. A phylogenetic tree was constructed using the multiple sequence alignment tool MUSCLE and the PHYLIP package built into EDGAR to infer intra-species evolutionary relationships and variation. Furthermore, we identified the common core virulence genes and the unique genes of the Pakistani strains. 
To identify the common core virulence genes, genes were retrieved from Mtb H37Rv, the Virulence Factor Database (VFDB), and the Database of Essential Genes (DEG). The genes were uploaded to the RAST server to assess the sequence similarity of the local strains to the reference Mtb H37Rv. EDGAR was used to find the strain-specific genes of the selected genomes. We identified 72 strain-specific genes for Mtb SWL PK and 100 for Mtb MNPK. The 40 Mtb strains were further functionally annotated with KofamKOALA to gain insight into the biological, cellular, and metabolic processes involved in disease pathogenicity. This study suggests that variation in gene content can yield potential biomarkers for sequenced Mtb strains from different locations.	en_US
dc.description.sponsorship	Dr. Rehan Zafar Paracha	en_US
dc.language.iso	en_US	en_US
dc.publisher	RCMS NUST	en_US
dc.subject	In silico, Genome-Wide Prediction, Clinical Isolates, Mycobacterium tuberculosis, Pangenome	en_US
dc.title	In silico Genome-Wide Prediction and Comparative Analysis of Clinical Isolates of Mycobacterium tuberculosis Pangenome: Region Wise	en_US
dc.type	Thesis	en_US
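
The abstract divides the pangenome into core, dispensable, and singleton genes. A minimal sketch of that partition is given below. It assumes each genome has already been reduced to a set of gene-family identifiers, that is, that orthologs have already been grouped. The function name `partition_pangenome` and the toy genome and gene names are hypothetical. This is not EDGAR's implementation and it does not reproduce the thesis results.

```python
from collections import Counter


def partition_pangenome(gene_presence):
    """Split gene families into core, dispensable, and singleton sets.

    gene_presence: dict mapping a genome name to the set of gene-family
    IDs found in that genome (hypothetical input format; orthology
    assignment is assumed to have been done beforehand).
    """
    n_genomes = len(gene_presence)
    # Count in how many genomes each gene family occurs.
    counts = Counter(g for genes in gene_presence.values() for g in genes)

    # Core: present in every genome.
    core = {g for g, c in counts.items() if c == n_genomes}
    # Singleton: present in exactly one genome (and not already core,
    # which only matters when a single genome is supplied).
    singletons = {g for g, c in counts.items() if c == 1} - core
    # Dispensable: everything in between.
    dispensable = set(counts) - core - singletons
    return core, dispensable, singletons


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the thesis.
    toy = {
        "strain_A": {"g1", "g2", "g3"},
        "strain_B": {"g1", "g2", "g4"},
        "strain_C": {"g1", "g5"},
    }
    core, dispensable, singletons = partition_pangenome(toy)
    pan_size = len(core) + len(dispensable) + len(singletons)
    for label, genes in (("core", core),
                         ("dispensable", dispensable),
                         ("singleton", singletons)):
        share = 100 * len(genes) / pan_size
        print(f"{label}: {len(genes)} genes ({share:.1f}% of pangenome)")
```

The percentage printed for each category is its share of the total pangenome size, which is how figures such as "2809 genes (49.2%)" are usually derived.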