dc.description.abstract |
Leukemia is a cancer of the white blood cells and the body's blood-forming tissues, including the bone marrow and the lymphatic system. It ranks 5th in Pakistan, with a prevalence rate of 4.2%. Late diagnosis of leukemia is one of the major factors in its prevalence rate. Leukemia is diagnosed using several techniques, such as bone marrow biopsy, myelograms, cytogenetics and immunophenotyping. Some of these methods are invasive and painful, while others require a lot of time and money. However, pre-processing and screening of leukemia are usually based on the patient's history, clinical symptoms, the complete blood count (CBC) report, etc. Among these screening procedures, the CBC report is a useful, common and efficient method in terms of time and cost. Moreover, it is not painful and helps in indicating various blood diseases such as leukemia. Screening for leukemia through the CBC report usually relies on a subjective assessment. The assessment thus varies from practitioner to practitioner, and the chances of misdiagnosis or missed diagnosis are higher. Therefore, there is a need to develop an objective, data-driven model to improve accuracy and precision in decision making with respect to the screening of leukemia using CBC reports. This study is designed to develop machine learning models using secondary data from the CBC reports of 287 subjects, obtained from eight different hospitals in Rawalpindi and Islamabad. Two methods, namely Radial Basis Function (RBF) and Multilayer Perceptron (MLP), have been used with softmax and hyperbolic tangent functions, respectively. The analysis has two sections. Section I deals with the development of predictive models using a binary categorical dependent variable (disease/leukemic vs. normal/non-leukemic) and six explanatory variables, namely gender, white blood cells, monocyte count, neutrophil count, eosinophil count and lymphocyte count. Section II deals with the development of predictive models using
a multinomial categorical dependent variable (normal/non-leukemic vs. Acute Lymphoid Leukemia (ALL) vs. Chronic Myelogenous Leukemia (CML) vs. Acute Myelogenous Leukemia (AML)) with the same set of independent variables plus age. Based on the four assessment measures (accuracy, sensitivity, specificity and precision), RBF performs better than MLP for Section I. For Section II, MLP performs better than RBF in terms of accuracy; however, the models predict the ALL category inaccurately. One major reason for this inaccuracy is the very limited data available in this category (we have only 18 CBC reports of subjects suspected of ALL). Therefore, the results of this study can be improved by adding further data, especially for the ALL category. The results of this study would help practitioners improve accuracy in screening for leukemia and its subtypes using the white blood cell characteristics of the CBC report. |
en_US |