Abstract:
Cancer is currently the second leading cause of death worldwide, and the number of deaths
it causes rises every year as its incidence grows. Radiology reports and electronic health
records (EHRs) hold a vast amount of clinical data. Case studies are valuable because they
provide a wealth of medical information on diseases, treatments, and related clinical
issues. However, this information is often recorded as unstructured free-text notes, which
makes it difficult to work with. Moreover, the data are produced in large volumes, at a
high rate, and in domain-specific formats. Converting this health information into
standards-compliant, comparable, and consistent data is therefore essential.
To address these challenges, we propose a pipeline that extracts knowledge from EHRs and
clinical reports into a schema-based knowledge graph (KG). Using Named Entity Recognition
(NER), we extracted knowledge from the radiology reports and EHRs of 33,431 cancer
patients and, following the proposed schema, built a knowledge graph in Neo4j containing
368,436 entities and 754,061 relationships of 15 different semantic categories. The
proposed method is an initial step toward using KGs to represent medical knowledge
uniformly, so that the course of a disease can be analysed from what is learned through
EHRs.
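The core idea of a schema-based KG can be illustrated with a minimal sketch, assuming NER has already produced typed entities and candidate (head, relation, tail) pairs. The schema below (`SCHEMA`, entity labels such as `Patient` and `Cancer`, relations such as `DIAGNOSED_WITH`) and the helper names `Entity`, `build_triples`, and `to_cypher` are hypothetical illustrations, not the authors' actual 15 categories or code. The sketch only keeps triples whose types the schema allows and renders each one as a parameterised Cypher `MERGE` statement; it does not connect to Neo4j.

```python
from dataclasses import dataclass

# Hypothetical schema: the only (head label, relation, tail label) combinations
# allowed in the graph. The real pipeline's categories are not given here.
SCHEMA = {
    ("Patient", "DIAGNOSED_WITH", "Cancer"),
    ("Patient", "RECEIVED", "Treatment"),
    ("Cancer", "LOCATED_IN", "BodyPart"),
}


@dataclass(frozen=True)
class Entity:
    text: str   # surface string found by NER
    label: str  # entity type assigned by NER (hypothetical label set)


def build_triples(candidates):
    """Keep only (head, relation, tail) candidates whose types fit SCHEMA."""
    return [
        (head, relation, tail)
        for head, relation, tail in candidates
        if (head.label, relation, tail.label) in SCHEMA
    ]


def to_cypher(triple):
    """Render one triple as a Cypher MERGE statement plus its parameters.

    Labels and relation types cannot be Cypher parameters, so they are
    interpolated; this is acceptable only because they come from SCHEMA.
    Entity names are passed as parameters ($head, $tail).
    """
    head, relation, tail = triple
    query = (
        f"MERGE (h:{head.label} {{name: $head}}) "
        f"MERGE (t:{tail.label} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head.text, "tail": tail.text}


if __name__ == "__main__":
    patient = Entity("patient_001", "Patient")
    cancer = Entity("lung adenocarcinoma", "Cancer")
    drug = Entity("cisplatin", "Treatment")
    site = Entity("left lower lobe", "BodyPart")
    candidates = [
        (patient, "DIAGNOSED_WITH", cancer),
        (cancer, "LOCATED_IN", site),
        (patient, "LOCATED_IN", site),  # not allowed by SCHEMA, so dropped
        (patient, "RECEIVED", drug),
    ]
    for triple in build_triples(candidates):
        query, params = to_cypher(triple)
        print(query, params)
```

Using `MERGE` rather than `CREATE` means that an entity mentioned in many reports becomes a single node, which is one way to obtain the uniform representation the abstract describes. The generated `(query, params)` pairs could, for example, be passed to `session.run` in the official Neo4j Python driver.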