Abstract:
Skin cancer is among the most common cancers, and it can be fatal if it is not diagnosed
and treated in time. Many deaths occur globally because of skin cancer, and the number
continues to rise every year. With the advent of AI, researchers have begun automating
the diagnosis of skin cancer, as in several other areas, to reduce the fatalities caused
by late diagnosis. Many datasets have been developed and many models have been trained
to classify skin lesions as cancerous or benign.
A team of scientists from Stanford University studied AI dermatology with respect to
racial inclusion and found that black and brown skin is barely represented: most
publicly available datasets consist predominantly of white skin images. They curated
a dataset with images of all skin tones, released in March 2022 as Diverse Dermatology
Images (DDI). Because previous datasets lacked this range of skin tones, models trained
on them performed poorly on the DDI dataset. Our study analyzes the performance of machine
learning and deep learning algorithms on the DDI dataset. We pre-processed the images
with techniques such as black-hat filtering and inpainting, performed data augmentation
on the pre-processed images, and trained a CNN on the result. We then adapted two
pre-trained models to DDI using transfer learning and fine-tuning. To evaluate
traditional machine learning, we used our trained CNN as a feature extractor and fed
the extracted features to a support vector machine (SVM). We achieved an ROC-AUC of 0.82
with our model, which improves on the value reported in the original DDI paper.
Keywords: Skin Lesion Classification, Traditional Machine Learning, Deep Learning,
Transfer Learning, Diverse Dermatology Images (DDI)
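
As a rough illustration of the ROC-AUC metric reported in the abstract, the sketch below computes it as the probability that a randomly chosen cancerous sample receives a higher score than a randomly chosen benign one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; this is not the evaluation code used by the authors.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC.

    labels: iterable of 0 (benign) or 1 (cancerous)
    scores: iterable of classifier scores, higher means more likely cancerous
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (not from the DDI dataset).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is fine for a sketch, but a rank-based implementation would be preferable on larger datasets.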