NUST Institutional Repository

An Efficient Phishing URLs Detection Approach Using Supervised Machine Learning

dc.contributor.author Mustafa, Muhammad
dc.date.accessioned 2023-08-23T10:58:57Z
dc.date.available 2023-08-23T10:58:57Z
dc.date.issued 2023
dc.identifier.other 328823
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/37297
dc.description Supervisor: Dr. Mehdi Hussain en_US
dc.description.abstract A phishing attack is an instance of social engineering in which the perpetrator deceives the user to gain unauthorized access to sensitive information and/or personal data. This attack vector has become a prevalent problem in recent years and can result in substantial financial damage, as well as the risk of identity theft, data loss, and long-term damage to an organization's reputation. In prior efforts to counter this attack vector, researchers have employed machine learning approaches that are based on lexical analysis of URLs and make use of datasets of website URLs. However, these approaches are effective only on datasets with a small number of entries and are unable to detect new phishing URLs. This research optimizes an existing anti-phishing methodology to function on a larger dataset of phishing website URLs. To this end, a dataset of 150,000 URLs was collected for experimentation, and a set of optimized lexical features was incorporated. To obtain the optimal set of features, a feature significance scheme was then employed, using a Random Forest implementation in Python to reduce the number of lexical features from 70 to 15. For the experiments, nine different machine learning classification algorithms, including Random Forest, Support Vector Machine, and Logistic Regression, were used to assess the results. Precision, Recall, F1 Score, and Accuracy metrics were evaluated in comparison to the benchmark study. The experiments show that the proposed methodology obtained higher detection accuracy than the benchmark approach on a larger phishing dataset (150k URLs), where the kNN classifier achieved the best detection accuracy of 99.98%. en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Sciences (SEECS), NUST en_US
dc.title An Efficient Phishing URLs Detection Approach Using Supervised Machine Learning en_US
dc.type Thesis en_US
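
The abstract refers to lexical analysis of URLs and to Precision, Recall, F1 Score, and Accuracy as evaluation metrics. The following is a minimal Python sketch of both ideas, using only the standard library. The function names (lexical_features, classification_metrics) and the particular features shown are illustrative assumptions; they are not the 15 features selected in the thesis, which this record does not list.

```python
from urllib.parse import urlsplit


def _is_ipv4(host: str) -> bool:
    """Return True if the host looks like a dotted-quad IPv4 address."""
    labels = host.split(".")
    return len(labels) == 4 and all(
        label.isdigit() and int(label) <= 255 for label in labels
    )


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    These features are assumptions chosen for demonstration only; they are
    not taken from the thesis.
    """
    # Add a scheme when missing so that urlsplit can locate the host part.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "num_subdomains": max(host.count(".") - 1, 0),
        "host_is_ipv4": int(_is_ipv4(host)),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Compute precision, recall, F1 score, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login/verify.php?id=42"))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a pipeline of the kind the abstract describes, a feature dictionary like this would be computed for each URL in the dataset and passed to a classifier, and the metric function would be applied to the classifier's predictions on held-out URLs.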


This item appears in the following Collection(s)

  • MS [146]
