NUST Institutional Repository

URL-Analyzer Intelligent Detection and Classification of Malicious URLs using Natural Language Processing


dc.contributor.author Anila Ghazanfar
dc.date.accessioned 2021-01-07T07:36:15Z
dc.date.available 2021-01-07T07:36:15Z
dc.date.issued 2017
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/20681
dc.description Supervisor: Dr. Zahid Anwar en_US
dc.description.abstract Uniform Resource Locators (URLs) have been a foundation of the Web since its origin. They are the reference point for any resource in cyberspace. According to the Verizon DBIR 2016 report, attacks on web applications are the single biggest source of data loss and account for over 40% of incidents resulting in data breaches. The main challenges are as follows. Firstly, URLs are sometimes hidden, shortened or encoded, so humans cannot readily identify them as legitimate; while this is typical for URLs, attackers use it to their advantage. Secondly, the automation of attacks using domain generation algorithms (DGAs) and exploit toolkits has led to the need for an automated and proactive protection system for malicious URL detection. Thirdly, attackers can manipulate users and redirect them to the intended URL without the need for a click, through a variety of attacks such as phishing and drive-by-download attacks. Existing malicious website detection techniques are limited in terms of accuracy and time, which are inversely related, making it difficult to achieve both at a good level. Also, few research works focus on specific attack types such as domain typosquatting. The aim of this work is the development of a system, URL-Analyzer, for malicious URL detection with a good accuracy and time trade-off. This novel contribution is based on a Natural Language Processing (NLP) technique and a breadth of features: (i) URL lexical, (ii) host-based, (iii) social reputation and (iv) time-based features for the detection of phishing, malware, drive-by download and typosquatting. The input to the URL-Analyzer comprises known benign and malicious URL datasets. To detect signs of maliciousness in the URL, a static analysis technique for feature extraction and machine learning for classification are employed.
The classification accuracy achieved ranges from 97% to 98.5%, with an average feature extraction time of 1.98 ms, 8 sec and 34.25 sec per URL for the URL-Analyzer's three developed modes: (i) offline, (ii) online and (iii) partially online, respectively. en_US
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Information Security en_US
dc.title URL-Analyzer Intelligent Detection and Classification of Malicious URLs using Natural Language Processing en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [146]
