Abstract:
Keeping a computer system safe from all types of viruses, trojans, spyware, and ransomware is
a daily struggle for people all around the world. Antivirus software is so far the best
available defense. The problem with antivirus software is that it cannot detect new types of
malware: it traditionally matches the characteristics of a newly encountered file against those
of previously detected malware. It can therefore easily be fooled by malware with different
characteristics, leaving the system infected. To overcome this hurdle, a machine learning
approach is applied in this thesis work.
Machine learning has tremendous power to make predictions based on prior training. One
essential requirement for artificial intelligence is a large amount of data, so one of the
goals of this research work was to collect enough data to apply machine learning. Only static
features were extracted from benign and malicious PE (Portable Executable) files for
classification. Two datasets were used: a publicly available dataset and a self-collected
dataset of about 21,000 samples. Unsupervised algorithms applied to the features produced by
PCA (principal component analysis) gave precision and recall above 0.8. Both supervised and
unsupervised machine learning algorithms achieved training and testing accuracy above 80%.
The best results were given by the proposed dimensionality reduction models, which achieved
above 90% accuracy. This approach was judged to be better than traditional signature-based
malware detection techniques due to its ability to learn and predict.