dc.contributor.author |
Hussain, Tayyaba |
|
dc.date.accessioned |
2023-08-04T06:27:36Z |
|
dc.date.available |
2023-08-04T06:27:36Z |
|
dc.date.issued |
2021 |
|
dc.identifier.issn |
319554 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/35606 |
|
dc.description |
Supervisor: Dr. Muhammad Usman Akram |
en_US |
dc.description.abstract |
Evidence from data is critical if governments are to address the many challenges facing
society, including pandemics, climate change, Alzheimer's disease, child hunger,
increasing food production, and maintaining biodiversity. Yet much of the information
about which data are used to build evidence and science is locked inside publications.
A new dataset, Coleridge Initiative - Show US the Data, was recently introduced to
discover how data are used for the public good. In this research, we present DEFNLP, a
general Data Extraction Framework Using Natural Language Processing Techniques, which
takes up the challenge of showing how publicly funded data are used to serve science
and society using Natural Language Processing (NLP) techniques and models. The proposed
framework combines NLP libraries and techniques, such as spaCy named entity recognition
(NER) and several Hugging Face question answering (QA) models, with further processing
and data and text mining to predict the datasets used in publications. DEFNLP will
enable government agencies and researchers to quickly find the information they need.
Until now, this problem, which involves a large dataset spanning numerous research
areas, had not been addressed. The proposed approach is domain independent and can
therefore be applied to any case study or scenario in which data are extracted. Our
methodology sets the state of the art on the Coleridge dataset, reaching a score of
0.654 and outperforming other frameworks. It is also efficient: each epoch took about
5 minutes on average on a CPU, with an output size of 3.27 kB. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
College of Electrical & Mechanical Engineering (CEME), NUST |
en_US |
dc.subject |
Big Data, Data Extraction, Data Mining, Named Entity Recognition (NER), Natural Language Processing (NLP), Question Answering (QA) Modelling, Text Mining |
en_US |
dc.title |
A Novel Data Extraction Framework Using Natural Language Processing (DEFNLP) Techniques |
en_US |
dc.type |
Thesis |
en_US |
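The abstract describes DEFNLP only at a high level. Candidate dataset names come from spaCy NER and Hugging Face QA models and are then refined by further processing and text mining. The sketch below is a hypothetical illustration of that post-processing idea in plain Python, not the author's implementation. It normalizes publication text, matches it against a list of known dataset names, and merges candidates from several extractors into one deduplicated prediction. The function names, the ASCII-only cleaning rule, and the pipe-separated output format are all assumptions, and the example sentence and dataset labels are made up for illustration.

```python
import re
from typing import List


def clean_text(text: str) -> str:
    """Lowercase, then collapse every run of non-ASCII-alphanumeric characters into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_known_datasets(text: str, known_labels: List[str]) -> List[str]:
    """Return the cleaned known dataset labels that occur as whole phrases in the text."""
    # Pad with spaces so a label only matches on word boundaries.
    padded = f" {clean_text(text)} "
    found: List[str] = []
    for label in known_labels:
        cleaned = clean_text(label)
        if cleaned and f" {cleaned} " in padded and cleaned not in found:
            found.append(cleaned)
    return found


def merge_predictions(*candidate_lists: List[str]) -> str:
    """Merge candidates from several extractors (e.g. NER, QA, string matching)
    into one deduplicated, sorted, pipe-separated prediction string (hypothetical format)."""
    seen: List[str] = []
    for candidates in candidate_lists:
        for candidate in candidates:
            cleaned = clean_text(candidate)
            if cleaned and cleaned not in seen:
                seen.append(cleaned)
    return "|".join(sorted(seen))


if __name__ == "__main__":
    # Hypothetical publication sentence and known-label list.
    text = "We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) and NOAA Tide Gauge data."
    known = ["Alzheimer's Disease Neuroimaging Initiative (ADNI)", "Baltimore Longitudinal Study of Aging"]
    matched = match_known_datasets(text, known)
    # Stand-in for candidates that an NER or QA model might return.
    model_candidates = ["NOAA Tide Gauge"]
    print(merge_predictions(matched, model_candidates))
```

String matching alone only recovers datasets already on the known list. That is why a pipeline like the one described would add model-based extractors such as NER and QA to propose unseen names before merging.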