NUST Institutional Repository

ENHANCING PHISHING DETECTION THROUGH MACHINE LEARNING

dc.contributor.author Zain ul Abidin
dc.date.accessioned 2024-03-14T06:16:36Z
dc.date.available 2024-03-14T06:16:36Z
dc.date.issued 2024
dc.identifier.other 329412
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/42622
dc.description Supervisor: Dr. Hasan Tahir Butt en_US
dc.description.abstract Phishing attacks, which use social engineering to unlawfully obtain sensitive data such as user credentials and personal information, are on the rise, and this increase calls for more advanced detection methods. Traditional phishing detection approaches tend to work best on smaller datasets and rely on numerous features, which raises computational demands and limits their scalability in machine learning applications. This research introduces a method that employs five well-known machine learning algorithms: Logistic Regression, Random Forest, Gradient Boosting, XGBoost, and LightGBM. The goal is to create a general framework for analyzing large-scale phishing data. A dataset of 274,131 URL entries was compiled from sources such as Kaggle, PhishTank, and OpenPhish. It covers a wide range of URL categories, including Benign, Defacement, Phishing, Malware, and Spam, offering a broad foundation for the detection model. The data was thoroughly preprocessed to correct common issues such as incorrect formats, duplicates, broken links, and domain-only URLs, ensuring the dataset's quality for machine learning. A key aspect of this approach is the use of a relatively small set of features, even with larger datasets, which addresses a major limitation of previous methods. The processed data then underwent feature extraction, optimization, and evaluation within the proposed machine learning frameworks. The findings show that the proposed methods outperform existing techniques in detection accuracy, handling of large data volumes, and efficiency in feature use. In the experiments, Random Forest, Gradient Boosting, XGBoost, and LightGBM achieved up to 98% accuracy in identifying phishing URLs within the 274,131-URL dataset. en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Sciences (SEECS), NUST en_US
dc.subject Phishing Detection, Social Engineering, Machine Learning, URL Classification, Supervised Learning, Large-scale Dataset Analysis. ALLPhDTheses. en_US
dc.title ENHANCING PHISHING DETECTION THROUGH MACHINE LEARNING en_US
dc.type Thesis en_US
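
The abstract describes cleaning the URL data (incorrect formats, duplicates, domain-only URLs) and then working with a relatively small feature set. The following is a minimal sketch of what such a step could look like, not the thesis's actual pipeline: the function names `clean_urls` and `extract_features`, the rule used to decide that a URL is "domain-only", and the six lexical features are all assumptions chosen for illustration. Broken-link checking is omitted because it would require network access.

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Drop malformed, domain-only, and duplicate URLs, preserving order.

    Hypothetical cleaning rules; the thesis does not specify its own.
    """
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        try:
            parsed = urlparse(url)
        except ValueError:
            # urlparse raises ValueError on some malformed hosts (e.g. bad IPv6).
            continue
        # Incorrect format: require an http(s) scheme and a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Domain-only (assumed definition): no path beyond "/", no query, no fragment.
        if parsed.path in ("", "/") and not parsed.query and not parsed.fragment:
            continue
        # Exact-match duplicate removal.
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


def extract_features(url):
    """Return a small dictionary of lexical features for one URL (illustrative set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    sample = [
        "http://example.com/login?id=1",
        "http://example.com/login?id=1",   # duplicate
        "example.com",                      # no scheme
        "https://example.org/",             # domain-only
    ]
    for u in clean_urls(sample):
        print(u, extract_features(u))
```

Feature dictionaries of this kind could be converted to a numeric matrix and passed to any of the five classifiers named in the abstract; the choice of features here is deliberately small to mirror the abstract's emphasis on a compact feature set.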


This item appears in the following Collection(s)

  • MS [146]
