dc.description.abstract |
Leukemia is an abnormal clonal proliferation of hematopoietic stem cells that affects
the bone marrow and lymphatic system. Despite the availability of diagnostic tests, the
mortality rate of leukemia is increasing, especially in developing countries with
insufficient healthcare facilities. One possible reason is late diagnosis or misdiagnosis,
largely due to the painful sample-collection procedure and expensive diagnostic tests.
Therefore, there is a need to improve the efficiency of early screening through inexpensive
tests such as the Complete Blood Count (CBC) test. This can be achieved by supplementing the
usual subjective assessment of medical practitioners with objective, data-driven models.
For this purpose, a secondary data set of 287 CBC reports has been used with 210
disease/leukemic and 67 control/non-leukemic cases. For classification, various
combinations of features have been modeled using different machine learning methods
such as Support Vector Machine (SVM), Decision Tree (DT) and Random Forest (RF).
These combinations include biologically as well as statistically significant features. For
the assessment of the developed models, stratified 10-fold cross-validation is used with
measures such as precision, accuracy, recall, F1 score and specificity. The study concludes
that the RF method with 12 features is adequate to predict the state of the subject. These features
are Haemoglobin, Haematocrit, Red Blood Cell Count, Monocyte Percent, Platelet
Count, Neutrophil Percent, Monocyte Count, Eosinophil Percent, White Blood Cell
Count, Lymphocyte Percent, Mean Corpuscular Volume and Lymphocyte Count.
Therefore, the proposed process can be helpful to medical practitioners or pathologists
for screening leukemic patients using numerical estimates of CBC features. |
en_US |