NUST Institutional Repository

Malware Detection through Activity logs and Apply Machine Learning to detect new Malware

dc.contributor.author Kazmi, Hasnain Taqi
dc.date.accessioned 2023-08-30T04:29:39Z
dc.date.available 2023-08-30T04:29:39Z
dc.date.issued 2023
dc.identifier.other 320246
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/37858
dc.description Supervisor: Dr. Sidra Sultana en_US
dc.description.abstract Amid mounting and increasingly intricate cyber threats, malware has emerged as a substantial challenge in today's digital world. Traditional defences, which depend on static analysis and signature-based tactics, frequently fail to detect and classify malware variants and zero-day attacks because they are vulnerable to obfuscation and polymorphism. Behaviour-based malware detection, by contrast, provides deeper insight into how malware executes and is more effective for malware family classification. This paper introduces a distinctive framework capable of correctly classifying known malware samples into their respective families. The research puts forward a comprehensive strategy for classifying malware families, from data preprocessing to feature selection, emphasising the pivotal role of machine learning in this process. The methodology involves three stages: extraction of labels and features, representation of features, and feature selection and classification. The study uses the publicly accessible "Malware Analysis Datasets: Top-1000 PE Imports" by IEEE, centring on the top 1000 imported functions taken from 'pe_imports' elements. These elements are extracted using Cuckoo Sandbox, a robust and distributed framework for malware analysis. Malware family labels are assigned through VirusTotal, which draws on data from all available antivirus vendors, mitigating potential issues related to label completeness, consistency, accuracy, and coverage. The features selected for malware classification revolve around the API calls tied to file, registry, network, process, and system activity that are invoked during the execution of malware samples. Machine learning models, particularly Random Forests and Decision Trees, play a key role in feature selection, identifying 'Classification' and 'Family' as essential features for malware detection. Their significance is further validated through Information Entropy, which uses the Information Gain Ratio to rank features. en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Sciences (SEECS), NUST en_US
dc.title Malware Detection through Activity logs and Apply Machine Learning to detect new Malware en_US
dc.type Thesis en_US
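
The abstract says features are ranked by Information Gain Ratio. This record does not include the thesis's implementation, so the following is only a minimal sketch of how such a ranking could be computed for binary "function imported or not" features. The function names `entropy` and `gain_ratio`, the toy family labels, and the three example import columns are all hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes on each sample.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy that remains after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four samples, two families, three import indicators.
families = ["trojan", "trojan", "worm", "worm"]
features = {
    "CreateFileW": [1, 1, 0, 0],    # separates the two families perfectly
    "RegOpenKeyExW": [1, 0, 1, 0],  # carries no information about the family
    "connect": [1, 1, 1, 0],        # partially informative
}

# Rank feature names from highest to lowest gain ratio.
ranking = sorted(
    features,
    key=lambda name: gain_ratio(features[name], families),
    reverse=True,
)
print(ranking)
```

Dividing by split information penalises features that take many distinct values, which is the usual reason for preferring the gain ratio over plain information gain.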