dc.description.abstract |
Leukemia is a malignancy of white blood cells (WBCs) that arises from hematopoietic stem cells. The Complete Blood Count (CBC) is a common, essential, routine initial test that may indicate the presence of leukemia and its subtypes. A CBC report provides useful information on different characteristics of blood cells that can be used for differential diagnosis. This study is designed to analyze different characteristics of CBC reports in order to develop predictive models for screening patients suspected of having leukemia or its subtypes. In this study, a primary data set of 302 CBC reports was collected from eight different hospitals in the Rawalpindi and Islamabad regions. Of these 302 CBC reports, 67 are normal (non-leukemic), 123 are Acute Myeloid Leukemia (AML), 79 are Chronic Myeloid Leukemia (CML) and 18 are Acute Lymphocytic Leukemia (ALL). A CBC report usually consists of 21 different characteristics/variables describing a person's blood picture. Of these 21 variables, 15 are selected for the analysis; the percentage-based variables are dropped to avoid duplication. Comparative analysis is used to test for statistically significant differences in the means of each selected variable across the four categories. The results show that Mean Corpuscular Haemoglobin (MCH) is the only variable whose means do not differ significantly among normal, AML, CML and ALL. Correlation analysis is performed to check for linear relationships between variables. This analysis also helps identify multicollinearity problems before the logistic regression models are developed. For the development of the Multinomial Logistic Regression (MLR) model, five different methods, or combinations of methods, are used to include relevant variables in the model and exclude irrelevant ones.
These are: backward elimination using the Wald criterion; selection of variables using odds ratios (OR); simultaneous dropping of insignificant variables combined with the Wald test; simultaneous dropping of insignificant variables combined with OR; and a combination of the Wald test and OR. A variable is finally selected only if it is shortlisted by at least three of these methods.
On this basis, four variables are identified as appropriate for the multinomial logistic regression model: haemoglobin, neutrophil count, monocyte count and gender. The performance of the developed model is assessed using accuracy, sensitivity, specificity and precision. For Normal vs AML, accuracy is 86%, sensitivity is 86%, specificity is 85% and precision is 91%. For Normal vs CML, accuracy is 88%, sensitivity is 91%, specificity is 85% and precision is 87%. For Normal vs ALL, accuracy is 88%, sensitivity is 100%, specificity is 85% and precision is 64%. These results indicate that the developed models can be used with confidence for the subjective screening of leukemia or its subtypes. Notably, the proposed model is not intended to replace formal diagnostic tests for leukemia such as bone marrow biopsy or flow cytometry. Rather, it provides basic technical support for screening patients using data-driven models. A combination of subjective and objective assessment can therefore improve the quality of diagnosis of leukemia or its subtypes at early stages. |
en_US |