Abstract:
An enormous amount of spam is sent every day, which calls for a filtering solution that can effectively identify and block unsolicited bulk email. Filtering spam is a priority because it frequently carries malicious content that spreads infections and enables cyberattacks, so organizations need the strongest protection available. Spam filters analyze both the header and the body of an email: the header provides information about the credibility of the sender, whereas a content filter checks the body for suspicious content, such as trigger words or images that are characteristic of spammers.
Traditional spam filtering methods block spam only at the receiver's end and cannot stop it from spreading through the network, where it wastes disk space, computing power, and network bandwidth. To address these problems, this study proposes a sender-oriented anti-spam methodology that recognizes and filters spam at its source, which reduces resource waste and discourages spamming. The proposed sender-based framework inspects each email before it is sent out to the internet. A verification key is also generated so that the receiver can verify that a particular email has passed through the filter.
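The abstract does not state how the verification key is generated. As a minimal sketch of the idea, and assuming purely for illustration that the sender-side filter and the receiver share a secret, the key could be an HMAC over the message. The function names and the use of HMAC-SHA256 below are assumptions, not the construction used in this thesis.

```python
import hashlib
import hmac


def make_verification_key(secret: bytes, message: bytes) -> str:
    """Return a hex tag proving the message passed the sender-side filter.

    Assumption: HMAC-SHA256 with a shared secret stands in for the
    verification key; the thesis may use a different construction.
    """
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, key: str) -> bool:
    """Recompute the tag at the receiver and compare in constant time."""
    expected = make_verification_key(secret, message)
    return hmac.compare_digest(expected, key)
```

Under this assumption, a message that is altered after filtering, or that never passed through the filter, fails the check at the receiver.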
Spammers now use image-based (graphic) spam to evade text-based email inspection. The spam message is embedded in an image attachment and obfuscated so that OCR systems cannot recognize the text. Many solutions for image authentication have been proposed over time. Conventional hashing is one such technique, but its drawback is that a slight change in the data can change the hash value significantly. Multimedia content needs authentication that is robust to minor content manipulation but fragile to malicious tampering. Perceptual hashing is a promising solution to this problem. Its principle is to extract robust and distinctive features from the multimedia data and compute a perceptual hash (PH) value from them; the content is judged authentic if its PH value has not changed significantly. This thesis examines the most widely used perceptual image hashing techniques and their implementation, and then evaluates them in terms of accuracy and average time.
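To illustrate the principle, the sketch below implements average hashing, one of the simplest perceptual hashes, in plain Python. It is an illustrative assumption: the abstract does not name the specific algorithms evaluated, and the function names, the 8x8 hash size, and the list-of-rows grayscale input are all hypothetical choices.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average hash of a grayscale image.

    pixels: 2D list of grayscale values (rows of equal length), with
    height and width both at least hash_size (an assumption of this sketch).
    Returns an integer holding hash_size * hash_size bits.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    # Downsample to hash_size x hash_size by averaging rectangular blocks.
    for by in range(hash_size):
        y0, y1 = by * height // hash_size, (by + 1) * height // hash_size
        for bx in range(hash_size):
            x0, x1 = bx * width // hash_size, (bx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            small.append(sum(block) / len(block))
    mean = sum(small) / len(small)
    # One bit per block: 1 if the block is brighter than the mean.
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


# A horizontal gradient, a slightly brightened copy, and an inverted copy.
image = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 2 for v in row] for row in image]
inverted = [[255 - v for v in row] for row in image]

print(hamming_distance(average_hash(image), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(image), average_hash(inverted)))  # 64
```

A small Hamming distance means the two images are perceptually similar, whereas a cryptographic hash of the brightened copy would differ completely from that of the original. The threshold that separates "similar" from "different" is a tuning parameter that this sketch does not fix.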
In this study we also implement a deep learning algorithm to distinguish between benign and malicious content. We use ResNet-50, a 50-layer convolutional neural network (CNN) pretrained on the ImageNet database, which can classify images into 1000 object categories.
We tested both techniques, perceptual hashing and deep learning, on the same dataset, and we provide a comparison of the two.