dc.description.abstract |
In this thesis, we demonstrate that multi-layer perceptrons (MLPs) are a promising approach for detecting PDF-based malware. Malware delivered as PDF files is becoming increasingly prevalent, so effective detection methods are crucial. Traditional detection methods, such as signature-based detection, are becoming less effective because attackers can evade them simply by modifying the malicious code. To train our MLP, we first collected a large dataset of both benign and malicious PDFs. We pre-processed the dataset to extract relevant features, such as the presence of certain keywords and the structure of the PDF file; in total, we used 37 representative static features. We trained the MLP on this dataset using a combination of supervised learning techniques. When evaluated on a separate test dataset, the trained model achieved an accuracy of about 96% in detecting PDF-based malware. We also investigated how different feature selection methods and network architectures affect the model's performance. These results show that MLPs are an effective, high-accuracy approach to detecting PDF-based malware. Moreover, we propose using adversarial machine learning techniques to make the model more robust, improving its ability to detect novel and evasive malware. In conclusion, this thesis presents a novel approach to training MLPs for PDF-based malware detection, and our results demonstrate its effectiveness. The proposed approach could improve the security of systems that handle PDF files and give the security community a new tool for combating PDF-based malware. |
en_US |