Unreal and Counterfeit News Prediction Using Machine Learning

Rehman, Asad Ur

URI: http://10.250.8.41:8080/xmlui/handle/123456789/35315

Date: 2021

Abstract:

Fake news prediction remains a challenging problem. Fake news became a major issue and an active research topic in 2016, and its importance grew further after the US presidential election between Donald Trump and Hillary Clinton. Fake news is tied to the various ways false information spreads through society to change readers' thoughts and opinions. The problem of false information was first identified some years ago, but fake news detection is now a major research topic because the problem grows day by day and does serious damage to society. Today anyone can easily write and spread fake news on websites and web pages. In this research, we first performed a Systematic Literature Review (SLR). In the SLR, we compared studies to identify the approaches, tools, and techniques proposed in previous work, along with the datasets used and the accuracies achieved on them. We also compared the Natural Language Processing (NLP) techniques and methods used in the related studies. After the SLR, we proposed a detailed methodology that presents a novel approach for classifying news articles using NLP and machine learning techniques. We then implemented six sub-approaches under two main approaches, based on unigram and bigram bag-of-words representations. We first applied text pre-processing techniques and then applied feature extraction techniques to obtain the most important features from the text data. We used two feature extraction techniques, Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF), in order to compare their results.
We then trained four machine learning classifiers on the extracted features: Multinomial Naïve Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). We evaluated the implemented models to find the best approach based on model accuracy, using K-Fold Cross-Validation, the confusion matrix, and other evaluation metrics such as precision, recall, and accuracy. Finally, we compared the results of our proposed approach with state-of-the-art benchmark approaches and achieved better results in terms of precision, recall, and accuracy.
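
The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the whitespace tokenizer, the textbook weighting tf × log(N/df), and the "fake"/"real" labels are all assumptions made for illustration, and library implementations of TF-IDF typically smooth and normalize differently.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all n-grams up to n_max words."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def count_vectors(docs, n_max=2):
    """Bag-of-words counts per document (unigrams and bigrams when n_max=2)."""
    return [Counter(ngrams(doc, n_max)) for doc in docs]


def tfidf_vectors(docs, n_max=2):
    """Textbook TF-IDF (assumed form): relative term frequency times log(N / df)."""
    counts = count_vectors(docs, n_max)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log(n_docs / df[term]) for term in df}
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({term: (k / total) * idf[term] for term, k in c.items()})
    return vectors


def precision_recall_accuracy(y_true, y_pred, positive="fake"):
    """Derive the three metrics from the binary confusion-matrix cells."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    # Hypothetical toy headlines, not data from the thesis.
    docs = ["breaking news shock", "breaking weather report"]
    print(count_vectors(docs)[0])
    print(tfidf_vectors(docs)[0])
    print(precision_recall_accuracy(
        ["fake", "fake", "fake", "real"],
        ["fake", "fake", "real", "real"],
    ))
```

In this sketch a term that occurs in every document receives a TF-IDF weight of zero, which is the property that distinguishes TF-IDF from raw counts when the two feature sets are compared.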