Abstract:
Fake news detection remains a challenging problem. Fake news became an important issue and
an active research topic in 2016, and it has grown even more important since the US
presidential election between Donald Trump and Hillary Clinton. Fake news is closely tied to
the many ways in which false information is spread through society to change the thoughts
and opinions of readers. The problem of false information was first recognized only a few
years ago, but fake news detection has since become a major research topic because the
problem is growing day by day and seriously harming society. Today, anyone can easily write
fake news and spread it on websites and web pages. In this research, we first
performed a Systematic Literature Review (SLR). In the SLR, we compared previous studies to
identify the approaches, tools, and techniques they proposed, as well as the datasets they
used and the accuracies they achieved. We also compared these studies to identify the
Natural Language Processing (NLP) techniques and methods they applied. After the SLR, we
proposed a detailed methodology describing a novel approach for classifying news articles.
In this methodology, we designed an approach that
follows NLP and machine learning techniques. We then implemented six sub-approaches under
two main approaches, based on unigram and bigram bag-of-words representations respectively.
First, we pre-processed the text data and then applied feature extraction techniques to
obtain the most important features from it. We used two feature extraction techniques,
Count Vectorizer and Term Frequency–Inverse Document Frequency (TF-IDF), in order to compare
the results obtained with each. Next, we
implemented four machine learning classifiers on the extracted features: Multinomial Naïve
Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors
(KNN). We evaluated the implemented models to identify the best approach based on accuracy,
using K-fold cross-validation, the confusion matrix, and evaluation metrics such as
precision, recall, and accuracy. Finally, we compared the results of our proposed approach
with state-of-the-art benchmark approaches, and our approach achieved better precision,
recall, and accuracy than those approaches.
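The pipeline summarized in this abstract (pre-processing, unigram and bigram bag-of-words features from Count Vectorizer or TF-IDF, four classifiers, K-fold evaluation with a confusion matrix, precision, recall, and accuracy) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the `preprocess` function, the reading of "bigram" as bigram-only features (`ngram_range=(2, 2)`), English stop-word removal, the linear-kernel SVM, `n_neighbors=3`, five stratified folds, and the tiny invented headline corpus are all assumptions. The 16 feature/classifier combinations it scores are not meant to reproduce the paper's six sub-approaches.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

METRICS = ("accuracy", "precision", "recall")


def preprocess(text):
    """Assumed pre-processing: lowercase, drop URLs, keep letters and whitespace only."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)


def make_vectorizer(kind, n):
    """Count or TF-IDF bag of words over n-grams of exactly n words (assumed reading)."""
    cls = CountVectorizer if kind == "count" else TfidfVectorizer
    return cls(preprocessor=preprocess, stop_words="english", ngram_range=(n, n))


# The four classifiers named in the abstract; hyperparameters are assumptions.
CLASSIFIERS = {
    "MNB": MultinomialNB,
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": lambda: SVC(kernel="linear"),
    "KNN": lambda: KNeighborsClassifier(n_neighbors=3),
}


def folds(n_splits=5):
    # Stratified K-fold keeps the fake/real ratio similar in every fold.
    return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)


def evaluate(texts, labels):
    """Mean cross-validated accuracy, precision and recall per configuration."""
    results = {}
    for kind in ("count", "tfidf"):
        for n in (1, 2):  # unigram and bigram bag of words
            for name, make_clf in CLASSIFIERS.items():
                model = make_pipeline(make_vectorizer(kind, n), make_clf())
                scores = cross_validate(model, texts, labels, cv=folds(),
                                        scoring=METRICS, error_score="raise")
                results[(kind, n, name)] = {
                    m: float(np.mean(scores[f"test_{m}"])) for m in METRICS
                }
    return results


def confusion_for(kind, n, name, texts, labels):
    """Confusion matrix of out-of-fold predictions (rows: true 0/1, cols: predicted)."""
    model = make_pipeline(make_vectorizer(kind, n), CLASSIFIERS[name]())
    predicted = cross_val_predict(model, texts, labels, cv=folds())
    return confusion_matrix(labels, predicted, labels=[0, 1])


# Invented toy headlines for illustration only; label 1 = fake, 0 = real.
FAKE = [
    "Shocking miracle cure doctors refuse to reveal",
    "Celebrity secretly replaced by clone, insiders claim",
    "Aliens endorse presidential candidate in leaked video",
    "Eating chocolate cures all diseases, viral post says",
    "Moon landing staged in Hollywood studio, whistleblower says",
    "Secret government microchips hidden in vaccines",
    "Banks will erase all debts tomorrow, shocking leak",
    "Famous actor arrested for time travel crimes",
    "Miracle diet melts fat overnight, doctors shocked",
    "Leaked memo proves election secretly rigged by robots",
]
REAL = [
    "Central bank raises interest rates by quarter point",
    "City council approves budget for new public library",
    "Researchers publish peer reviewed study on sleep quality",
    "National weather service issues storm warning for coast",
    "University announces scholarship program for engineering students",
    "Stock markets close slightly higher after earnings reports",
    "Health ministry reports decline in seasonal flu cases",
    "Local hospital opens new pediatric care wing",
    "Parliament passes updated data protection law",
    "Transport authority extends subway service hours",
]
TEXTS = FAKE + REAL
LABELS = [1] * len(FAKE) + [0] * len(REAL)

if __name__ == "__main__":
    results = evaluate(TEXTS, LABELS)
    for key, scores in sorted(results.items()):
        print(key, {m: round(v, 2) for m, v in scores.items()})
    best = max(results, key=lambda key: results[key]["accuracy"])
    print("best by accuracy:", best)
    print(confusion_for(*best, TEXTS, LABELS))
```

On real data, the toy lists would be replaced by a labelled news corpus; with only twenty invented headlines the printed scores carry no meaning and only demonstrate the mechanics. Wrapping each vectorizer and classifier in one pipeline means the vocabulary and IDF weights are fitted on the training folds only, so the held-out fold does not leak into feature extraction.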