NUST Institutional Repository

Corroborating Information from Disagreeing Views Using Machine Learning Techniques

dc.contributor.author Riaz, Tayyeba
dc.date.accessioned 2023-08-19T15:06:32Z
dc.date.available 2023-08-19T15:06:32Z
dc.date.issued 2022
dc.identifier.other 277290
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/36981
dc.description Supervisor: Dr. Seemab Latif en_US
dc.description.abstract In this era of big data, a huge amount of heterogeneous data is produced and shared on the internet, making it a central medium for valuable sources of information. Unlike traditional media, data on the web can be published without quality control, which makes it less reliable. Data provided by different sources often conflicts, which can be due to noisy, erroneous, or obsolete data providers. It can also be easily manipulated by bots that create misleading data. This gives rise to a fundamental challenge for data extraction and fusion. This paper proposes an automated solution for finding the truth among conflicting data from different sources by considering website credibility. It takes into account that different sources have varying degrees of reliability. It not only considers several factors about the sources but also provides the true answer from a credible source. This paper identifies seven web credibility categories, namely Accuracy, Authority, Aesthetics, Professionalism, Popularity, Currency and Quality. Each category has several factors contributing to it. A total of 24 factors were used after applying feature reduction to approximately 100 factors identified from the research literature. Six supervised learning classifiers were employed: Naïve Bayes, Support Vector Machine, Stochastic Gradient Descent, Neural Network, Decision Trees and Random Forest. Existing solutions focus primarily on finding relevant web pages; they either consider trustworthiness alone rather than evaluating web pages' credibility, or evaluate only two or three of the seven credibility categories. Experiments on the Book-Author dataset show that Random Forest performs best, with an accuracy of 97.45%, precision of 0.975, recall of 0.975 and F-measure of 0.974 when all the categories are used collectively. This is significantly higher than the baseline method, which uses a single factor that falls under the Authority category. The baseline, a Bayesian-based approach, achieves an accuracy of 87.77%. Furthermore, experiments using each category separately and in combination show that categories with many factors contribute more to credibility than those with a single factor. These are Professionalism, Popularity and Quality. The importance of the Aesthetics category is also demonstrated experimentally: an accuracy of 93.47% for the Aesthetics category alone shows that it is vital to credibility, which is rarely recognized. However, this study focuses primarily on using all seven categories of web credibility to resolve conflicting data. en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Science NUST SEECS en_US
dc.title Corroborating Information from Disagreeing Views Using Machine Learning Techniques en_US
dc.type Thesis en_US
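
The abstract describes resolving conflicting answers by weighing the credibility of the sources that provide them. The following is a minimal sketch of that general idea, not the thesis's actual pipeline: it assumes each source already has a credibility score in [0, 1] (in the thesis such a judgment would come from a classifier trained on the 24 factors) and picks the answer backed by the highest total credibility. The function name `resolve_conflict`, the source names, and the scores are all hypothetical.

```python
from collections import defaultdict


def resolve_conflict(claims, credibility, default_score=0.5):
    """Pick the claimed value with the highest summed source credibility.

    claims: list of (source, value) pairs; sources may disagree.
    credibility: dict mapping source -> score in [0, 1] (assumed given).
    default_score: hypothetical fallback for sources with no score.
    """
    if not claims:
        raise ValueError("no claims to resolve")

    totals = defaultdict(float)
    for source, value in claims:
        # Each source votes for its value, weighted by its credibility.
        totals[value] += credibility.get(source, default_score)

    # On a tie, the value that was seen first wins.
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical example: three sites disagree about a book's author.
example_claims = [
    ("site-a.example", "Author A"),
    ("site-b.example", "Author B"),
    ("site-c.example", "Author B"),
]
example_credibility = {
    "site-a.example": 0.9,
    "site-b.example": 0.3,
    "site-c.example": 0.4,
}

winner, scores = resolve_conflict(example_claims, example_credibility)
print(winner)
```

In this sketch the single highly credible source outweighs two less credible ones, which is the behaviour a plain majority vote would not give.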


This item appears in the following Collection(s)

  • MS [432]
