NUST Institutional Repository

Extracting Cancer Phenotypes from Radiology Reports and Clinical Records - Tapping the power of unstructured data

dc.contributor.author Khan, Saad Ahmad
dc.date.accessioned 2023-01-02T10:35:04Z
dc.date.available 2023-01-02T10:35:04Z
dc.date.issued 2022
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/32019
dc.description.abstract With a rising incidence rate, cancer is one of the major causes of death worldwide. The rapidly growing adverse effects of cancer on patients have prompted the research community to explore avenues that could reduce the time to diagnosis and treatment. One such avenue is the extraction of phenotypes, the physical traits of cancer. Correlating genomic data with phenotypic information, which is typically found in clinical notes, is vital for a comprehensive understanding of cancer. However, the quantity and diversity of notes make manual extraction of phenotypes a human resource-intensive task. Furthermore, the unstructured nature of clinical notes makes them difficult to process with generic data extraction tools. Rule-based techniques have been employed previously to obtain this information; however, reliance on rules limits the scope of the model in terms of the cancers and phenotypes covered. We aim to devise a model that addresses these limitations by using natural language processing (NLP) techniques such as Named Entity Recognition (NER) to be independent of rules and to reduce the dependency of data extraction on medical practitioners. We extend a cancer ontology to include 65 phenotypes for eight cancer types and propose an NER-based multi-cancer, multi-phenotype extraction method for unstructured clinical records. A qualitative and quantitative comparative analysis has been carried out between spaCy NER and BERT-based NER models, with BERT performing better and achieving precision and recall scores of 0.84 and 0.85 respectively. To cope with the large dataset, active learning was also introduced, and an interpretation of uncertainty sampling for NER problems is presented. The research highlights the benefit of employing active learning with BERT to annotate a large dataset by manually labelling a small representative sample of the data. en_US
dc.description.sponsorship Dr. Muhammad Moazam Fraz en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Sciences (SEECS) NUST en_US
dc.title Extracting Cancer Phenotypes from Radiology Reports and Clinical Records - Tapping the power of unstructured data en_US
dc.type Thesis en_US
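
The abstract mentions an uncertainty sampling interpretation for NER but does not say how uncertainty is scored. The sketch below is only one possible reading, not the thesis's method: it assumes each sentence comes with per-token label probability distributions from some NER model, scores a sentence by the mean least-confidence value of its tokens, and picks the highest-scoring sentences for manual labelling. The function names and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def token_least_confidence(probs: Sequence[float]) -> float:
    """Least-confidence score for one token: 1 minus the top label probability."""
    return 1.0 - max(probs)


def sentence_uncertainty(token_probs: Sequence[Sequence[float]]) -> float:
    """Mean least-confidence score over all tokens of one sentence.

    Averaging over tokens is an assumption made for this sketch; other
    aggregations (max, sum, entropy) are equally plausible.
    """
    if not token_probs:
        return 0.0
    total = sum(token_least_confidence(p) for p in token_probs)
    return total / len(token_probs)


def select_for_annotation(
    pool: Sequence[Sequence[Sequence[float]]], k: int
) -> List[int]:
    """Return indices of the k most uncertain sentences, most uncertain first."""
    scores = [sentence_uncertainty(sentence) for sentence in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy unlabelled pool: three sentences, two tokens each, three labels per token.
pool = [
    [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],  # model is confident
    [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # model is unsure
    [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]],  # in between
]
picked = select_for_annotation(pool, k=2)
print(picked)
```

In an active learning loop, the selected sentences would be labelled by an annotator, added to the training set, and the model retrained before the remaining pool is scored again.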

