Abstract:
Machine learning algorithms have been making useful predictions in fields such as IT-banking and data-rich business problems. In transactional fraud detection, it is important to analyze each customer’s pattern of transactions, because every user has a few characteristic operation patterns that must be taken into consideration. One of the major challenges in transactional fraud detection is the confidentiality of actual user data and of the sensitive variables in the dataset used for training. Real-world datasets are difficult to obtain, especially because only a very small number of fraudulent transactions occur among millions of genuine ones. Another challenge is evaluating one’s work, that is, judging its performance against suitable evaluation metrics and criteria. In machine learning, problems such as anomaly detection cannot be evaluated on accuracy alone. In this thesis, we have created a transaction dataset from scratch containing transactions from March 2021 to May 2021. Since our initial data is raw and highly imbalanced, we experiment with data transformation and machine learning techniques in order to detect fraudulent transactions. After comparing multiple methods, we share our results and conclude that a balanced dataset is the key to achieving the highest accuracy with the applied classifiers. We achieve 78% accuracy after balancing the dataset with SMOTE (Synthetic Minority Over-sampling Technique), which is 28 percentage points higher than the 50% obtained on the imbalanced dataset in our first few trials.
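
The point that accuracy alone is misleading on imbalanced data can be illustrated with a minimal sketch. The counts below (1,000 transactions, 10 of them fraudulent) are hypothetical and are not taken from our dataset, and the helper names are our own. A classifier that labels every transaction as genuine reaches high accuracy while detecting no fraud at all, which is why recall on the fraud class should be reported alongside accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as fraud and 0 as genuine."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all transactions that were labeled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of fraudulent transactions that were actually caught."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, illustrative data: 10 fraudulent and 990 genuine transactions.
y_true = [1] * 10 + [0] * 990

# A trivial "classifier" that predicts genuine for every transaction.
always_genuine = [0] * len(y_true)

acc = accuracy(y_true, always_genuine)
rec = recall(y_true, always_genuine)
print(f"accuracy = {acc:.2f}, fraud recall = {rec:.2f}")
```

In this illustrative case the accuracy is 0.99 while the fraud recall is 0.00, so the high accuracy says nothing about the ability to detect fraud.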