Abstract:
In transfer learning, a model is pre-trained on a large unlabeled dataset and then fine-tuned on
downstream tasks. These pre-training and fine-tuning models are powerful and have produced the
best results on Natural Language Processing (NLP) tasks. These models are unidirectional, but
BERT introduced the first deeply bidirectional model, which reads the input from both
directions. BERT was pre-trained on the Wikipedia and BooksCorpus datasets and is fine-tuned with
an extra output layer. We present a replication study of BERT and provide a detailed analysis of the
effect of pre-training hyperparameters on downstream tasks. Because the BooksCorpus
dataset is not publicly available, we pre-trained BERT from scratch on
Wikipedia (2100M) and compare it with our model, which was trained on Wikipedia (531M). Our
model, Modified BERT (MBERT), achieves better results on GLUE (74.94), which consists of
eight tasks (all except STS-B), as well as on SQuAD v1.1 (57.40/69.50) and SQuAD v2.0 (56.19/59.38),
while reducing pre-training time from 53 hours to only 17 hours, using six times less computational
power, and training on a four times smaller dataset. We also present a detailed study of why MBERT
achieves these results on the SQuAD dataset.