dc.description.abstract |
With 2.5 quintillion bytes of data produced daily in the modern digital age, the exponential growth in data output has reached unprecedented levels. Because of the widespread
connection of people and devices, the volume of data continues to grow, making it increasingly difficult to analyse effectively. This study focuses on topic modelling, which
is the study, comprehension, and organisation of textual data. Although Latent Dirichlet Allocation (LDA) is frequently used for topic modelling and clustering, its effectiveness
suffers when it is applied to short texts such as social media posts,
product reviews, and customer feedback. This work investigates combining
BERT (Bidirectional Encoder Representations from Transformers) with LDA in an effort to overcome this drawback and produce better outcomes. The results show that
the combined LDA+BERT strategy performs better than either LDA or BERT
alone, leading to more balanced and distinct clusters. This hybrid model makes
use of the probabilistic topic assignments produced by LDA as well as the semantic
comprehension and contextual representations offered by BERT. The outcomes demonstrate the method’s potential to improve the quality and interpretability of topic modelling and
clustering, particularly in cases involving short text data.
This study contributes to the development of topic modelling techniques by combining
LDA and BERT. Across a variety of textual datasets, the hybrid LDA+BERT model excels
at identifying important topics and underlying trends. Merging BERT’s contextual representations with
LDA’s probabilistic topic assignments presents a viable way to enhance the effectiveness
of topic modelling and clustering. This method may support
better decision-making and offer useful insights in a variety of applications
and domains. The study shows that, particularly in scenarios involving short text input,
practitioners can extract higher-quality topics and achieve better clustering outcomes by combining these two models. The results of this study add to the expanding body
of knowledge in the area of topic modelling and pave the way for further development in
textual data analysis and information extraction. |
en_US |