Abstract:
Money laundering, a critical issue in financial systems worldwide, is the process of making illicitly gained proceeds appear legitimate. As financial transactions grow increasingly complex, traditional methods struggle to detect and prevent laundering effectively. Sophisticated techniques such as cross-currency transactions, together with rapidly evolving fraudulent practices, call for more advanced, automated approaches to identifying suspicious activity. This research introduces a novel graph-based approach to money laundering detection using machine learning models: Graph Convolutional Networks (GCN), GraphSAGE, and our proposed models, the Adaptive Sampling Aggregated Graph Convolutional Network (ASA-GCN) and ASA-GNN. These models operate on graph-structured data, such as networks of financial transactions, and identify suspicious activity from the relationships and interactions between entities. The primary objective of this thesis is to propose, develop, and evaluate a robust model for detecting money laundering. The models were trained and tested on the IBM Anti-Money Laundering (AML) dataset, which contains simulated financial transactions representing both legitimate and fraudulent activity. Each transaction records attributes such as the timestamp, amount, currency, and identifiers of the originating and receiving accounts, which makes the dataset well suited to assessing graph-based models. The results show that the proposed ASA-GCN model consistently outperforms traditional graph-based models and baseline machine learning methods across several key metrics. ASA-GCN achieves an Area Under the Curve (AUC) score of 0.99, far exceeding GCN, GraphSAGE, and Graph Attention Network (GAT) baselines, whose AUC scores typically range between 0.75 and 0.80. 
In addition, ASA-GCN demonstrates higher precision and recall, with an average precision (AP) score of 0.98, indicating its superior ability to identify both money
laundering transactions and legitimate transactions with few false positives and false negatives. Beyond technical performance, this research highlights the interpretability of the ASA-GCN model. By examining the attention weights and node representations, we can understand how the model identifies suspicious transactions. This interpretability is essential for financial institutions, which require transparent decision-making to comply with anti-money laundering regulations. The findings of this thesis have broad implications for the future of anti-money laundering systems. As financial crimes become more intricate and datasets grow larger, graph-based machine learning models such as ASA-GCN could substantially change how banks, financial institutions, and regulatory agencies detect and prevent money laundering. The ability to process large volumes of transactional data in real time and to detect fraudulent activity accurately makes these models valuable tools in the fight against financial crime. In conclusion, this thesis presents ASA-GCN as a state-of-the-art model for money laundering detection. With its high accuracy, scalability, and interpretability, it holds great promise for practical deployment in financial institutions. Future research could focus on reducing training time and on exploring transfer learning to extend the model to other domains. Real-time implementation is a further avenue for exploration that would help financial institutions keep pace with the rapidly evolving landscape of financial crime.