NUST Institutional Repository

A Robust Deep Bidirectional Model With Lower Parameters Size And Better Sentence Prediction Using Deep Learning And Modeling Techniques

dc.contributor.author Jahan, Muhammad Shah
dc.date.accessioned 2023-08-03T10:40:07Z
dc.date.available 2023-08-03T10:40:07Z
dc.date.issued 2020
dc.identifier.other 00000275028
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/35557
dc.description Supervisor: Dr. Muhammad Usman Akram en_US
dc.description.abstract In transfer learning, a model is pre-trained on a large unlabeled dataset and then fine-tuned on downstream tasks. These pre-training and fine-tuning models are powerful and have produced the best results on Natural Language Processing (NLP) tasks. Such models were unidirectional, but BERT introduced the first fully deep bidirectional model, which reads context from both sides of the input. BERT was pre-trained on the Wikipedia and Books Corpus datasets and fine-tuned with an extra layer. We present a replication study of BERT and provide a detailed analysis of the effect of pre-training hyperparameters on downstream tasks. Because the Books Corpus dataset is not publicly available, we pre-trained BERT from scratch on Wikipedia (2100M) and compare it with our model, which was trained on Wikipedia (531M). Our model, Modified BERT (“MBERT”), achieves better results on GLUE (74.94; 8 tasks, excluding STS-B), SQuAD v1.1 (57.40/69.50), and SQuAD v2.0 (56.19/59.38), while reducing pre-training time from 53 hours to only 17 hours, using six times less computational power, and training on a four times smaller dataset. We also present a detailed study of why MBERT achieves these results on the SQuAD dataset. en_US
dc.language.iso en en_US
dc.publisher College of Electrical & Mechanical Engineering (CEME), NUST en_US
dc.subject Key Words: MBERT, BERT, bidirectional language modeling, language modeling, modified BERT, transformer. en_US
dc.title A Robust Deep Bidirectional Model With Lower Parameters Size And Better Sentence Prediction Using Deep Learning And Modeling Techniques en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [441]
