dc.description.abstract |
Email is one of the most popular communication services on the Internet, used both formally and informally all over the world. Spammers exploit this service for adverse purposes. Today about 50% of total email traffic consists of spam, which wastes resources and has caused economic losses for email service providers. We present background on spam and ham email, with emphasis on email classification by different artificial intelligence models. We also carry out a comprehensive literature review of related work and provide a tabular comparative analysis of 35 such techniques against common comparison criteria: nature of the dataset, feature selection, classification algorithm, accuracy, and false positives.
This thesis presents a two-stage hybrid approach. We use the Enron Email Dataset, one of the most popular publicly available email datasets. In the first stage, a Harris Hawks Optimization (HHO) based wrapper method selects the best feature subset, which reduces the number of features. In the second stage, a Multilayer Perceptron model is developed with the selected feature subset. We give a detailed mathematical explanation of each step, highlighting the equation in use. The proposed approach is tested rigorously and repeatedly against multiple performance metrics, and the results are tabulated for the ease of the reader. We also compare our approach with the best approaches in the literature that use a similar dataset, and conclude that our approach achieves an accuracy of 98.36%. We list the merits and demerits of the proposed solution, and end the thesis with a brief conclusion and room for improvement. |
en_US |