Abstract:
The confidentiality of an organization’s data, even when protected by strong cryptographic primitives, is threatened primarily not by external actors but from within the organization’s own boundaries, i.e., by insider attacks. Such attacks breach the confidentiality, integrity and availability of the organization’s assets. Insider threats caused by malicious abuse of authority have overtaken traditional Trojan attacks and become the main threat to organizations. Detecting and preventing insider threats is a real challenge because of the enormous volume of raw data involved, and the research community has been addressing this problem with machine learning techniques for the past few years. In the absence of a carefully crafted middle ground, an employee who is granted access in order to perform his or her duties effectively is also able to wreak large-scale havoc, which in turn hampers organizational productivity and forces the organization to shift its focus. It is therefore necessary to design the access architecture carefully and to complement it with a system that mitigates such attacks. In this dissertation, we address the critical issue of insider threats through comprehensive machine learning-based frameworks. We present four machine learning-based frameworks that aim to thwart insider attacks using multi-dimensional user information, including user logs, emails and psychometric features. Our first framework, the Supervised Stacked Model (S2M), is tailored towards addressing the class imbalance problem. Multiple low-variance filters were tried, followed by correlation filters on the output data. As part of this framework, we propose a hybrid ensemble, S2M, that correctly classifies insider samples and differentiates them from normal activities. Vertical and horizontal resampling techniques were applied, and the model was tested on the resampled dataset.
The proposed solution is tested on the CERT 4.2 dataset, which contains the normal and malicious activities of 1,000 users recorded from 2010 to 2011, with more than 31 million records. Our second framework, the Dynamic Weighted-Voting Ensemble (DWvEn), is an ensemble model built on a weighted-voting approach to insider threat detection. It brings together feature engineering methods and ensemble learners that correctly classify the majority of malicious activities, and it dynamically assigns weights to the base learners according to their competency. We evaluated DWvEn on CERT 4.2 and CERT 6.2, the largest publicly available datasets, using multiple pre-processing and feature engineering techniques. As part of our email-based frameworks, we applied semi-supervised machine learning to the Enron corpus and the TWOS dataset, to identify unlabeled malicious emails and to handle over-fitting on a small dataset, respectively. The former work is devoted to “traitor detection”, on which research has remained very limited compared to “masquerader detection”. In this work, class labels are identified with a clustering algorithm, and malicious emails are predicted using multiple machine learning classifiers. The frameworks and methodologies presented in this dissertation can assist a broad spectrum of organizations in attenuating insider threats. In conclusion, this thesis presents a comprehensive intelligent framework for the effective classification of insider threats and argues that multiple models or frameworks are essential, depending on the type of dataset being handled.