Tuberculosis (TB) is a deadly infectious disease caused by the bacterium Mycobacterium tuberculosis. According to one estimate, one third of the world's population is infected with TB, and new infections occur at a rate of about one per second. Most people infected with TB never develop active disease; only a small portion of the infected population develops active TB in its various forms. A person's genetic profile is known to play an important role in susceptibility to active TB. Certain genes determine the body's response to TB and hence play a critical role in the different clinical manifestations of active TB in a patient. The expression levels of these genes reflect the dynamic state (behavior) of cells and together make up the “gene expression profile”, which is used to analyze genetic data.
Previous research has identified a gene associated with TB susceptibility, namely CCL1, but a phenotype (disease) is not always induced by a single gene. Genes are often co-expressed and co-regulated to produce a specific protein (condition). Therefore, this study seeks to identify the genes (known and novel) that are co-expressed and co-regulated with CCL1 and may be determining factors in causing TB. Instead of merely finding differentially expressed genes, as previous studies have done, we use a distance-metric-based clustering method to find co-regulated genes. We use three distance metrics (Pearson correlation, Fisher discriminant and Kullback-Leibler distance) to perform hierarchical clustering and investigate which other genes are correlated with CCL1. Our study has identified genes (forming three clusters) that may not only be responsible for causing TB but can also discriminate between different clinical forms of TB.
Our results show that, using the three proposed dissimilarity metrics, we could reduce the data sets by filtering genes on the dissimilarity of their expression to that of CCL1, and subsequently use various clustering techniques to cluster similar genes. Most of the 21 genes discovered in our study are found to play a role in lung function and development, and some are also active in the spread of certain tumors. Hence, some previously unknown co-regulated genes of CCL1 have been discovered. The work can be extended by applying techniques such as k-means and DBSCAN, which could again employ different dissimilarity measures to discover clusters.
Chapter 1
According to one estimate, one third of the world's population is infected with TB [7] [8]. Moreover, new infections occur at a rate of about one per second [5]. TB appears in different clinical forms, including latent (passive), pulmonary (localized to the lungs) and meningeal (affecting the nervous system), and has been declared the global emergency of the millennium by the World Health Organization (WHO) [55]. According to a WHO estimate, the number of people infected with active TB will reach 1 billion by 2020.
In Pakistan, around 60,000 people die of tuberculosis (TB) every year, and the country ranks sixth among those worst affected by the disease. More than one million people in Pakistan have TB; one new person is infected every two minutes and one dies every eight minutes [54]. According to one study, a person with active TB will infect 10 to 15 people every year on average [6]. Each year, 9 million people around the world fall sick with TB, and there are almost 2 million TB deaths worldwide [7].
Despite the desperate need for new TB drugs and vaccines, no new vaccine has been developed in the last 90 years [8]. The first anti-TB drug, streptomycin, was developed in 1944, and it has been nearly 30 years since a new TB drug was developed [9]. The drugs used today were developed some 40 years ago and appear to be losing their efficacy [8]. TB is cured with antibiotics, and the treatment lasts six months. However, TB patients tend to stop their medication as soon as they begin to feel well. In these cases, the bacteria remain in the body in the passive form and can later attack again with greater force. In developed countries, TB largely affects people whose immune systems are compromised by immunosuppressive drugs, substance abuse, AIDS, etc.; hence, TB is a leading killer of HIV-infected people [10].
In many Asian and African countries, 80% of people test positive in tuberculin tests [8]. No vaccine is available that provides reliable protection for adults; even the BCG (Bacille Calmette-Guérin) vaccine cannot protect against pulmonary TB. Recent studies have also found that BCG is more dangerous for children born with HIV/AIDS. Referring to a study conducted by WHO, The New York Times (July 2009) reported that BCG is a live vaccine and can cause a serious bacterial infection that can rage through the body of an HIV-infected infant; according to the report, this infection is fatal in more than 70 percent of cases [8]. “In countries like South Africa, where both TB and transfer of AIDS virus from mother to child is very common, the vaccine does not protect against TB and it may kill them with BCG disease” [8]. According to WHO estimates, the largest number of new TB cases in 2008 occurred in the South-East Asia Region, which accounted for 35% of incident cases globally [10]. Since humans are the only host for Mycobacterium tuberculosis, eradication is considered possible. Moreover, most infected people will not develop active TB, and those who do are thought to develop it because of genetic susceptibility to the disease.
Certain genes have been found to play a significant role during TB infection. The up-regulation and down-regulation of these genes is responsible for producing proteins that allow or enhance the progression of TB [56]. Genes regulate and deregulate each other, forming a structural relationship among genes referred to as co-expression or co-regulation. Co-expression usually indicates deeper similarities between the genes or their proteins [3]. For instance, they might be on the same pathway, i.e., a set of proteins that interact in sequential or regular ways to communicate a message or perform a function within the cell. Because a pathway functions as a unit, all of its genes are needed, and biologists therefore expect all the genes coding for proteins in a common pathway to be co-expressed [3].
Genes are often co-expressed with the genes whose transcription they regulate. For example, if Gene A is a transcription factor that activates Gene B, then when Gene A is expressed we should see a corresponding increase in the expression of Gene B. Up-regulation and down-regulation of genes strongly influence host genetic susceptibility to different kinds of diseases, and TB is one of the diseases strongly influenced by the regulation of the gene CCL1 [3]. Gene regulation starts with transcription (the copying of a gene's DNA into messenger RNA), so transcript information is needed to understand gene regulatory networks. Measurements of gene transcription levels in organisms under various disease conditions, behaviors and developmental stages are used to build up ‘gene expression profiles’, which characterize the dynamic functioning of each gene in the genome. Gene expression profiling captures the quantitative and qualitative changes in genes under different disease conditions. Such regulation and de-regulation provide a basis for understanding the underlying disease, and this understanding may lead to new approaches to its diagnosis and treatment [50].
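The Gene A / Gene B example can be made concrete with the Pearson correlation coefficient, the same similarity measure this study relies on. The minimal Python sketch below uses made-up expression values for three hypothetical genes (`gene_a`, `gene_b`, `gene_c`; not real data): a target activated by a transcription factor tracks it closely and scores near 1, while an unrelated gene scores near 0.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

# Hypothetical expression levels of three genes across six samples (not real data).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # transcription factor "Gene A"
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]  # "Gene B", activated by Gene A
gene_c = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]   # a gene unrelated to Gene A

print("A vs B:", round(pearson(gene_a, gene_b), 3))  # close to 1: co-expressed
print("A vs C:", round(pearson(gene_a, gene_c), 3))  # close to 0: not co-expressed
```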
Gene expression analysis helps in understanding gene regulation, the metabolic and genetic mechanisms of disease, and the response to drug treatments. For example, if over-expression of certain genes is associated with a certain disease, we can explore other conditions that affect the expression of these genes, as well as the similarity of other gene expression profiles to theirs. We can also investigate which compounds (potential drugs) can lower the expression levels of such genes [35] and thus potentially lower the chances of developing active disease. Identifying genetic variants that increase or decrease an individual’s susceptibility to disease can lead to preventive measures targeted at those at greatest risk. It may also give valuable insights into the molecular processes at the cellular level that are important in causing disease, opening the way for novel therapies that improve outcomes for those who have the disease or are at risk of it [63].
The majority of previous TB studies have identified differentially expressed genes by comparing the expression levels of genes across samples. One study, by Nguyen et al., is unique in that it not only showed through data analysis that the gene CCL1 is responsible for TB susceptibility but also confirmed this empirically. However, it is an established fact that a disease is not always caused by a single gene [4]; genes usually regulate and deregulate each other (forming structural relationships) to produce a specific phenotype.
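Differential expression, as used in the studies described above, is typically assessed gene by gene with a two-group statistical test. The sketch below assumes Welch's t-statistic (the cited studies may have used other tests or multiple-testing corrections) and hypothetical data: `expr` maps each gene name to its expression values across samples, and `cases`/`controls` are sample indices for, say, TB and healthy groups. Genes are ranked by the magnitude of their t-statistic.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Welch's t-statistic for one gene measured in two groups of samples."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")
    se = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group1) - mean(group2)) / se

def rank_differential(expr, cases, controls):
    """Return (gene, t) pairs sorted by |t|, most differentially expressed first."""
    scores = []
    for gene, values in expr.items():
        t = welch_t([values[i] for i in cases], [values[i] for i in controls])
        scores.append((gene, t))
    return sorted(scores, key=lambda gt: abs(gt[1]), reverse=True)

# Hypothetical toy data: samples 0-2 are "cases", samples 3-5 are "controls".
expr = {
    "G1": [5.0, 6.0, 5.5, 1.0, 1.2, 0.8],  # clearly higher in cases
    "G2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # same mean in both groups
}
ranking = rank_differential(expr, cases=[0, 1, 2], controls=[3, 4, 5])
print(ranking)
```

Such a ranking scores each gene in isolation; it says nothing about which genes move together, which is the gap the co-expression approach is meant to fill.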
This study differs from previous ones in that it does not stop at finding differentially expressed genes; we also filter out the highly variant genes and apply distance-based clustering to the remainder to find the most relevant set of genes. We use three distance metrics, namely Pearson correlation (PC), Kullback-Leibler distance (KD) and Fisher distance (FD), to compute the distance of every gene from CCL1, and the filtering is done by thresholding these distance values. Using these distance metrics together with hierarchical clustering to explore the structural relationships between genes, we identify a list of co-regulated genes that may play an important role in the transformation of cells from a healthy to an unhealthy state and that also discriminate between different clinical forms of TB.
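The distance-and-threshold filtering step can be sketched in Python as follows. The exact formulations used in this study are not given here, so each metric is an assumption: PC is taken as 1 − Pearson r, KD as the symmetric Kullback-Leibler divergence between profiles rescaled to sum to 1 (which requires non-negative expression values), and FD as one common form of the Fisher criterion, (μx − μy)² / (σx² + σy²). The names `METRICS` and `filter_by_distance`, the toy data and the threshold are all hypothetical.

```python
import math

def pearson_distance(x, y):
    """PC, assumed here to be 1 - Pearson r: 0 for perfectly co-expressed
    profiles, 2 for perfectly anti-correlated ones."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson r is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)

def symmetric_kl(x, y, eps=1e-9):
    """KD, assumed here to be KL(p||q) + KL(q||p) between profiles rescaled
    to sum to 1; this sum equals sum((p - q) * log(p / q))."""
    if min(x) < 0 or min(y) < 0:
        raise ValueError("KL needs non-negative expression values")
    px = [v + eps for v in x]  # eps avoids log(0) for zero-expression samples
    qy = [v + eps for v in y]
    sp, sq = sum(px), sum(qy)
    p = [v / sp for v in px]
    q = [v / sq for v in qy]
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def fisher_distance(x, y):
    """FD, assumed here to be the Fisher criterion
    (mean_x - mean_y)^2 / (var_x + var_y) with sample variances."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    if vx + vy == 0:
        raise ValueError("Fisher criterion is undefined for two constant profiles")
    return (mx - my) ** 2 / (vx + vy)

# Hypothetical registry of the three metrics, keyed by the abbreviations in the text.
METRICS = {"PC": pearson_distance, "KD": symmetric_kl, "FD": fisher_distance}

def filter_by_distance(expr, reference, metric, threshold):
    """Return the genes whose distance to the reference gene is <= threshold."""
    ref = expr[reference]
    dist = METRICS[metric]
    return sorted(gene for gene, values in expr.items()
                  if gene != reference and dist(values, ref) <= threshold)

expr = {"CCL1": [1, 2, 3, 4, 5, 6],   # hypothetical toy data
        "G1": [2, 4, 6, 8, 10, 12],   # CCL1 scaled by 2
        "G2": [6, 5, 4, 3, 2, 1]}     # CCL1 reversed
for name in METRICS:
    print(name, filter_by_distance(expr, "CCL1", name, threshold=0.1))
# PC ['G1']
# KD ['G1']
# FD ['G2']
```

In the toy example, PC and KD both keep `G1` (a scaled copy of CCL1) and reject the reversed profile `G2`, whereas this form of FD does the opposite because it compares mean expression levels rather than profile shape. How FD is defined and oriented therefore matters a great deal, and the threshold for each metric would have to be chosen separately.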
CCL1 has already been shown to be associated not only with TB but also with its different clinical forms [3], but a common pathway may involve multiple genes that together establish a condition or disease. We therefore attempted to find known and novel genes that are co-expressed and co-regulated with CCL1; our results show 21 genes that are co-expressed with CCL1 and are therefore likely to be associated with TB. To evaluate our results, we used the Pearson correlation coefficient to estimate the similarity between the expression levels of CCL1 and those of the genes in the three discovered clusters. In this study, we have identified three clusters of genes that are highly co-expressed with CCL1 (Pearson correlation coefficient greater than 0.9) and hence likely play a role in TB susceptibility. The detailed results are presented in Chapter 4.
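The clustering and evaluation steps can be sketched as below. The linkage method and distance used for the hierarchical clustering are assumptions here (average linkage on 1 − Pearson r); the final check mirrors the evaluation described above by averaging the Pearson correlation between CCL1 and each member of a cluster and comparing it with 0.9. The function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(expr, n_clusters):
    """Agglomerative (bottom-up) hierarchical clustering with average linkage
    on the distance 1 - Pearson r; merges the two closest clusters until
    n_clusters remain. Naive and slow, for illustration only."""
    genes = sorted(expr)
    if not 1 <= n_clusters <= len(genes):
        raise ValueError("n_clusters must be between 1 and the number of genes")
    dist = {}
    for a, b in combinations(genes, 2):
        dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(expr[a], expr[b])
    clusters = [[g] for g in genes]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            pair_dists = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
            avg = sum(pair_dists) / len(pair_dists)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def mean_correlation(expr, cluster, reference):
    """Average Pearson r between the reference gene and each cluster member."""
    ref = expr[reference]
    return sum(pearson(expr[g], ref) for g in cluster) / len(cluster)

# Hypothetical toy profiles (not data from this study).
expr = {
    "CCL1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "G1": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
    "G3": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "G4": [5.9, 5.1, 3.8, 3.1, 2.0, 1.2],
}
candidates = {g: v for g, v in expr.items() if g != "CCL1"}
clusters = average_linkage(candidates, n_clusters=2)
for cluster in clusters:
    r = mean_correlation(expr, cluster, "CCL1")
    print(cluster, round(r, 3), "co-expressed" if r > 0.9 else "not co-expressed")
```

For real data sets, SciPy's `scipy.cluster.hierarchy.linkage` (for example with `method="average"` and `metric="correlation"`) followed by `fcluster` does the same job far more efficiently than this naive loop.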
To understand how gene regulation works and how it affects our health, the next section discusses the biological background of the phenomenon and how gene expression levels are measured.