Abstract:
Early prediction and survivability analysis are often key to better treatment and
accurate prognosis of cancer. Changes in the staging model are required to understand tumor behavior and its possible clinical outcomes. Machine learning models
are widely used to increase prognostic accuracy.
In this research, the data for prognosis and stage prediction of thyroid cancer was gathered from a cancer repository. The National Cancer Institute has launched a program that
holds a number of registries on almost every type of cancer. This disease-specific dataset
was fetched from the program's database, known as Surveillance, Epidemiology, and End
Results (SEER). The derived data model is similar to that of the American Joint Committee
on Cancer (AJCC).
The data is pre-processed to improve model outputs. After the data is cleaned and encoded,
the machine learning models are implemented. The models' hyper-parameters are tuned,
and the models are trained on the training data. To enhance the overall performance of cancer stage
prediction, class balancing strategies such as oversampling and undersampling, along with normalization techniques and principal component analysis, were added to the models.
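
To illustrate one of the class balancing strategies named above, the following is a minimal sketch of random oversampling in plain Python. The function name `random_oversample`, the toy rows, and the toy stage labels are hypothetical and are not taken from the SEER data; the thesis may use a different oversampling method.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up toy records: (T, N, M, age) with a made-up stage label each.
rows = [(1, 0, 0, 34), (1, 0, 0, 41), (2, 1, 0, 58), (4, 1, 1, 70)]
labels = ["I", "I", "II", "IV"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.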
To improve results and gain a better understanding, we used several machine learning
classification models. The experiments showed that the Gradient Boosting machine
learning technique, applied to the data combination of Tumor, Nodes, Metastasis,
and Age (TNMA), generates the best predictions for stages. The evaluation measures used
to compare the performance of the machine learning models showed that Light Gradient Boosting gave an accuracy of 91%, while AdaBoost gave an accuracy of 88.5%;
this value rose to 96% when a Decision Tree was used as the base for the AdaBoost classifier. The results also showed that adding class balancing approaches greatly enhanced
the models' performance. The predicted stages are closely related to the
standard for cancer staging.
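
As a rough sketch of the AdaBoost configuration described above, the following uses scikit-learn with an explicit decision tree as the base learner. Everything specific here is an assumption: the data is synthetic (not SEER), and the tree depth and number of estimators are illustrative values, not the tuned settings from the thesis. Note that scikit-learn's default AdaBoost base learner is already a depth-1 decision tree, so the sketch assumes a deeper tree was meant.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 4 features (think T, N, M, age) and 3 classes.
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The base learner is passed positionally because its keyword name
# differs between scikit-learn versions (base_estimator vs. estimator).
base_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
model = AdaBoostClassifier(base_tree, n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the abstract.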
The survival probability for the thyroid cancer stages showed that patients in earlier
stages can survive longer than patients in the later stages. The number of patients
reaching the final stages is found to be low. The approaches demonstrated in the thesis
can aid in patients' treatment decision making and can be utilized in building prognostic systems.
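
The abstract does not state how survival probability was estimated. One common choice is the Kaplan-Meier estimator, sketched below in plain Python as an assumption rather than as the method used in the thesis. The function name `kaplan_meier` and the follow-up times are hypothetical toy values.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events:    1 if death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up follow-up times (months) and event flags for five patients.
durations = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Computing one such curve per stage and comparing them is one way to express the observation that earlier stages survive longer.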