NUST Institutional Repository

SECURE AND SCALABLE MALICIOUS URL DETECTION USING MACHINE LEARNING AND SERVERLESS COMPUTING

dc.contributor.author Shaheen, Sikandar
dc.date.accessioned 2024-10-18T03:54:26Z
dc.date.available 2024-10-18T03:54:26Z
dc.date.issued 2024-10-18
dc.identifier.other 00000431932
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/47295
dc.description Supervised by Associate Prof Dr. Shahzaib Tahir en_US
dc.description.abstract Malicious URLs have become a major threat vector on the Internet, with attackers using them to launch phishing campaigns, distribute malware, and exfiltrate data. Because malicious URLs are modeled on legitimate ones, users find them difficult to recognize as fraudulent. A single attack can target an organization with thousands of malicious URLs, causing data loss, financial loss, and reputational damage. Cyber criminals also revise their tactics quickly, which makes identification difficult and calls for effective countermeasures. This thesis presents a novel hybrid model that combines machine learning and serverless computing to detect malicious URLs. The research uses a well-balanced, consistent dataset of 48,000 URLs drawn from reputable sources such as PhishTank and VirusTotal. Training on both benign and malicious URLs from such a varied dataset helps the model learn better and generalize across different cyber threats. Through feature extraction, I selected 54 features (25 from the URL strings and 29 from the content of the corresponding web pages) for identifying malicious URLs. Multiple machine learning models were tested and evaluated, including Decision Tree, Random Forest, and AdaBoost. XGBoost emerged as the best performer, achieving an accuracy of 98.14%. Testing showed that the hybrid model achieves a high detection rate for malicious URLs and that the serverless architecture significantly reduces processing time. Serverless computing also provides a sandbox-like environment for securely extracting features from web page content.
This research demonstrates the effectiveness of the hybrid model in accurately identifying malicious URLs and shows the potential of combining machine learning with serverless computing to strengthen defenses against evolving cyber threats. en_US
dc.language.iso en en_US
dc.publisher MCS en_US
dc.title SECURE AND SCALABLE MALICIOUS URL DETECTION USING MACHINE LEARNING AND SERVERLESS COMPUTING en_US
dc.type Thesis en_US
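
The abstract mentions 25 features derived from the URL string itself but does not list them. As a rough illustration of what string-level (lexical) URL features can look like, here is a minimal Python sketch. The function name `lexical_features` and every feature in it are assumptions chosen for illustration; they are not the thesis's actual feature set, and the content-based features and the classifier are not shown.

```python
from urllib.parse import urlparse
import ipaddress


def lexical_features(url: str) -> dict:
    """Compute a few illustrative features from the URL string alone.

    Hypothetical feature set for demonstration only; the thesis's
    25 URL-string features are not listed in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        # Number of non-empty path segments, e.g. /a/b.php has depth 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for name, value in lexical_features("http://192.168.0.1/login/verify.php").items():
        print(f"{name}: {value}")
```

A dictionary like this could be turned into one row of a feature matrix for a tree-based classifier such as those named in the abstract. Features that require fetching the page would be computed separately, for example inside the sandboxed serverless function the abstract describes.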

