Abstract:
Intrusion detection systems (IDS) have for many years played an important part in
improving the performance of systems by helping to avoid and prevent attacks on them,
making networks safer and more secure. Working on the internet has nevertheless become
very difficult because of cyber-attacks and security risks such as intrusions. Because
an IDS can detect abnormal system behavior very accurately, researchers have produced
different types of intrusion detection system for different types of environments. On
the other hand, many new problems are arising for researchers and for intrusion
detection systems because the behavior of attacks on systems keeps changing. This is a
serious concern, because the attacks are highly impactful and both the lifetime and the
accuracy of the system are at stake. Relying on these intrusion detection systems (IDS)
and prevention systems (PS) is therefore very risky, because they are unable to detect
threats that arise from new levels and kinds of attack.
Recently, machine learning has shown strong potential for detecting threats and
anomalies, including in cases where anomaly-based detection systems generally fail. For
that purpose, state-of-the-art classical machine learning algorithms are applied to the
UNSW-NB15 dataset, which serves as the benchmark for experimentation, although other
datasets are also available. In this paper, several machine learning techniques are used
to detect attacks on the system accurately. Because performance degrades when the data
is high-dimensional, the number of dimensions was reduced from 49 to 12 to achieve
higher accuracy. In addition, features that are sparse, or that have little importance
or impact on the results, were filtered out.
Classical and ensemble classifiers, namely Random Forest (RF), Support Vector Machine
(SVM), XGBoost, Logistic Regression (LR) and Decision Tree (DT), were then applied in
order to compare their accuracy. The wrapper-based feature selection technique Recursive
Feature Elimination (RFE) is also used to obtain the desired features. All the machine
learning algorithms are applied to binary classification as well as to multi-class
classification.
Our results show that Random Forest with the RFE feature selection method achieves an
accuracy of 99.70% for binary classification, the highest among the models tested. The
experiments in this research work were run using Anaconda 3.0 with Python 3.0 and
Google Colab.
We also applied the machine learning algorithms to multi-class classification, where the
accuracy is 70.72% on the training data set and 65.86% on the test data set. Other
wrapper-based techniques such as sequential feature selection were also applied, giving
an accuracy of 99.70% with Random Forest using feature elimination.
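
The feature-selection step summarized above can be illustrated with a minimal sketch. It
assumes scikit-learn and uses a synthetic 49-feature data set as a stand-in for
UNSW-NB15; the sample size, the tree counts, the RFE step size and the train/test split
are illustrative choices and are not the settings used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data with 49 features (assumption: a placeholder for
# UNSW-NB15, which is not loaded here).
X, y = make_classification(
    n_samples=1000, n_features=49, n_informative=12, n_redundant=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Wrapper-based selection: RFE refits the forest repeatedly and drops the
# least important features (up to 5 per round) until 12 remain.
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=12,
    step=5,
)
selector.fit(X_train, y_train)

# Keep only the selected columns in both splits.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Train a fresh Random Forest on the reduced feature set and score it.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train_sel, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test_sel))

print(f"kept {X_train_sel.shape[1]} of {X.shape[1]} features")
print(f"test accuracy on synthetic data: {accuracy:.4f}")
```

The accuracy printed by this sketch describes the synthetic data only and is not
comparable to the figures reported above. Under the same assumptions, sequential feature
selection could be tried by replacing `RFE` with scikit-learn's
`SequentialFeatureSelector`.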