Abstract
In this paper, we address the problem of detecting potentially illicit behavior in the context of Anti-Money Laundering (AML). We focus on two requirements that arise when training machine learning models for AML: scalability and imbalance-resistance. By scalability, we mean the ability to train models on very large transaction datasets. By imbalance-resistance, we mean the ability of a model to achieve suitable accuracy despite high class imbalance, i.e. the low number of instances of potentially illicit behavior relative to the number of non-illicit instances and to the large number of features that may characterize potentially illicit behavior. We propose a two-layered modelling approach. The first layer consists of a Logistic Regression model with simple features, which can be computed with low overhead. These features capture customer profiles as well as global aggregates of transaction histories. This layer filters out a proportion of customers whose activity patterns can be deemed non-illicit with high confidence. In the second layer, a gradient boosting model with complex features classifies the remaining customers. We expect this two-layered approach to meet both requirements. First, feature extraction is more scalable, as the computationally demanding features of the second layer need not be extracted for every customer. Second, the first layer acts as an undersampling method for the second layer, thus partially addressing the class imbalance. We validate the approach using a real dataset of customer profiles and transaction histories, together with labels provided by AML experts.
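The filter-then-classify cascade described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: both layers are abstracted as hypothetical scoring functions that return a probability of potentially illicit behavior (in the paper, Logistic Regression on simple features and gradient boosting on complex features), and the names `TwoLayerCascade`, `filter_for_second_layer`, `filter_threshold` and `decision_threshold` are illustrative. Customers whose first-layer score falls below the filter threshold are labelled non-illicit without computing the expensive features; at training time, the same filter selects the subset on which the second layer is trained, which is how the first layer acts as undersampling.

```python
from dataclasses import dataclass
from operator import itemgetter
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: maps a customer record to an estimated
# probability of potentially illicit behavior.
Scorer = Callable[[dict], float]


def filter_for_second_layer(
    customers: Sequence[dict],
    labels: Sequence[int],
    first_layer: Scorer,
    filter_threshold: float,
) -> List[Tuple[dict, int]]:
    """Return the (customer, label) pairs NOT cleared by the first layer.

    This subset is the training set of the second layer, so the first
    layer acts as an undersampler of the (mostly non-illicit) majority.
    """
    return [
        (customer, label)
        for customer, label in zip(customers, labels)
        if first_layer(customer) >= filter_threshold
    ]


@dataclass
class TwoLayerCascade:
    first_layer: Scorer      # cheap model on simple features (assumed)
    second_layer: Scorer     # costly model on complex features (assumed)
    filter_threshold: float  # first-layer score below which a customer is cleared
    decision_threshold: float = 0.5

    def predict(self, customers: Sequence[dict]) -> List[int]:
        predictions = []
        for customer in customers:
            # Layer 1: confidently non-illicit customers stop here, so the
            # second layer's features are never computed for them.
            if self.first_layer(customer) < self.filter_threshold:
                predictions.append(0)
                continue
            # Layer 2: only the remaining customers reach the complex model.
            predictions.append(int(self.second_layer(customer) >= self.decision_threshold))
        return predictions


if __name__ == "__main__":
    # Toy records: a single precomputed score stands in for real features,
    # and both layers simply read it (purely for illustration).
    customers = [{"score": 0.05}, {"score": 0.10}, {"score": 0.60}, {"score": 0.90}]
    labels = [0, 0, 0, 1]
    read_score = itemgetter("score")
    kept = filter_for_second_layer(customers, labels, read_score, filter_threshold=0.2)
    print(len(kept), sum(label for _, label in kept))  # 2 1
    cascade = TwoLayerCascade(read_score, read_score, filter_threshold=0.2)
    print(cascade.predict(customers))  # [0, 0, 1, 1]
```

In the toy run, the filter keeps 2 of the 4 customers, so the share of potentially illicit customers in the second layer's training set rises from 1 in 4 to 1 in 2.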
Notes
1. We also conducted experiments with another gradient boosting implementation, XGBoost (extreme gradient boosting), but these experiments consistently yielded lower accuracy. In the evaluation below, we report only the results obtained with CatBoost.
References
Tsui, E., Gao, S., Xu, D., Wang, H., Green, P.: Knowledge-based anti-money laundering: a software agent bank application. J. Knowl. Manage. (2009)
Breslow, S., Hagstroem, M., Mikkelsen, D., Robu, K.: The new frontier in anti-money laundering. McKinsey Insights, November 2017. https://www.mckinsey.com/business-functions/risk/our-insights/the-new-frontier-in-anti-money-laundering
Kotsiantis, S., Koumanakos, E., Tzelepis, D., Tampakas, V.: Forecasting fraudulent financial statements using data mining. Int. J. Comput. Intell. 3(2), 104–110 (2006)
Jayasree, V., Siva Balan, R.V.: Money laundering regulatory risk evaluation using bitmap index-based decision tree. J. Assoc. Arab Univ. Basic Appl. Sci. 23(1), 96–102 (2017)
Nielsen, D.: Tree boosting with XGBoost - why does XGBoost win “every” machine learning competition? Master’s Thesis, NTNU (2016)
Palshikar, G.K., Apte, M.: Financial security against money laundering: a survey. In: Emerging Trends in ICT Security, pp. 577–590. Morgan Kaufmann (2014)
Senator, T.E., et al.: Financial crimes enforcement network AI system (FAIS) identifying potential money laundering from reports of large cash transactions. AI Mag. 16(4), 21 (1995)
Chen, Z., Teoh, E.N., Nazir, A., Karuppiah, E.K., Lam, K.S.: Machine learning techniques for anti-money laundering (AML) solutions in potentially suspicious transaction detection: a review. Knowl. Inf. Syst. 57(2), 245–285 (2018)
Helmy, T.H., Zaki, M., Salah, T., Badran, K.: Design of a monitor for detecting money laundering and terrorist financing. J. Theoret. Appl. Inf. Technol. 85(3), 425 (2016)
Chen, Y.T., Mathe, J.: Fuzzy computing applications for anti-money laundering and distributed storage system load monitoring (2011)
Cortinas, R., et al.: Secure failure detection and consensus in TrustedPals. IEEE Trans. Dependable Secure Comput. 9(4), 610–625 (2012)
Phua, C., Smith-Miles, K., Lee, V., Gayler, R.: Resilient identity crime detection. IEEE Trans. Knowl. Data Eng. 24(3), 533–546 (2010)
Liou, F.M.: Fraudulent financial reporting detection and business failure prediction models: a comparison. Manage. Audit. J. (2008)
Lopez-Rojas, E.A., Axelsson, S.: Money laundering detection using synthetic data. In: The 27th Annual Workshop of the Swedish Artificial Intelligence Society (SAIS), Örebro, Sweden, 14–15 May 2012, no. 071, pp. 33–40. Linköping University Electronic Press, May 2012
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, pp. 6638–6648 (2018)
Leontjeva, A., Goldszmidt, M., Xie, Y., Yu, F., Abadi, M.: Early security classification of Skype users via machine learning. In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 35–44. ACM, November 2013
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 3, 408–421 (1972)
Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta – a system for feature selection. Fundamenta Informaticae 101(4), 271–285 (2010)
Acknowledgements
This research was partly funded by the European Regional Development Funds via Archimedes Foundation (NUTIKAS programme).
A Appendix
Sequence-based Features Calculation. We assume that the sequences of customers’ transactions are not random but follow some hidden structure. To encode this structure in the model, we use so-called generative log-odds features [17]: we estimate the transition probabilities between transaction states separately for potentially illicit and for non-illicit customers, and then compare them. This approach captures the dynamics of the transaction history for our classification task while introducing less overhead than methods based on neural networks (e.g. Boltzmann machines) or deep auto-encoders. In the log-odds feature extraction method, features are derived from sequence probabilities. The quantity of interest is the probability of observing a given sequence of transactions:
\[ P(x_1, x_2, \ldots, x_n) \]
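where \(x_1, x_2, \ldots, x_n\) are discrete properties of the successive transactions (e.g. their direction, incoming or outgoing). One way to compute this probability is the chain rule:
\[ P(x_1, x_2, \ldots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_1, \ldots, x_{i-1}) \]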
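In practice, estimating the full conditional probabilities \(P(x_i \mid x_1, \ldots, x_{i-1})\) is often infeasible, because the number of possible histories grows exponentially with the sequence length. We therefore simplify the model with the first-order Markov assumption, under which each state depends only on its predecessor:
\[ P(x_1, x_2, \ldots, x_n) \approx P(x_1) \prod_{i=2}^{n} P(x_i \mid x_{i-1}) \]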
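Under the first-order Markov assumption, the probability of a sequence reduces to \(P(x_1)\prod_{i=2}^{n} P(x_i \mid x_{i-1})\). The Python sketch below evaluates this product in log space, which avoids numerical underflow on long transaction histories. The dictionary-based representation of the initial and transition probabilities and the function name `markov_log_prob` are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumed representation: P(x_1 = s) keyed by state s, and
# P(x_i = b | x_{i-1} = a) keyed by the pair (a, b).
Initial = Dict[str, float]
Transitions = Dict[Tuple[str, str], float]


def markov_log_prob(seq: Sequence[str], initial: Initial, trans: Transitions) -> float:
    """Return log P(x_1, ..., x_n) under a first-order Markov chain."""
    if not seq:
        raise ValueError("sequence must contain at least one transaction")
    log_p = math.log(initial[seq[0]])
    # Each consecutive pair (x_{i-1}, x_i) contributes log P(x_i | x_{i-1}).
    for prev_state, next_state in zip(seq, seq[1:]):
        log_p += math.log(trans[(prev_state, next_state)])
    return log_p


if __name__ == "__main__":
    initial = {"in": 0.5, "out": 0.5}
    trans = {("in", "in"): 0.7, ("in", "out"): 0.3,
             ("out", "in"): 0.4, ("out", "out"): 0.6}
    # 0.5 * 0.7 * 0.3 = 0.105
    print(round(math.exp(markov_log_prob(["in", "in", "out"], initial, trans)), 6))
```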
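For our task, however, we are interested in whether a particular sequence of transactions is more likely to be potentially illicit than regular, non-illicit activity. Writing \(X = (x_1, \ldots, x_n)\) for the sequence and \(Y \in \{0, 1\}\) for the class label (1 = potentially illicit), we want to estimate:
\[ P(Y = 1 \mid X) \]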
This probability can be computed with Bayes’ theorem:
\[ P(Y = y \mid X) = \frac{P(X \mid Y = y)\, P(Y = y)}{P(X)} \]
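A worked example with hypothetical numbers (chosen only for illustration, not taken from the paper's data) shows how strongly the prior weighs under class imbalance. Assume \(P(Y=1) = 0.01\), \(P(X \mid Y=1) = 0.02\) and \(P(X \mid Y=0) = 0.001\), so the sequence is 20 times more likely under the potentially illicit model. Expanding the evidence with the law of total probability, \(P(X) = \sum_{y} P(X \mid Y=y)\,P(Y=y)\), Bayes’ theorem gives:

\[
\begin{aligned}
P(X) &= 0.02 \cdot 0.01 + 0.001 \cdot 0.99 = 0.0002 + 0.00099 = 0.00119,\\
P(Y=1 \mid X) &= \frac{P(X \mid Y=1)\,P(Y=1)}{P(X)} = \frac{0.0002}{0.00119} \approx 0.168.
\end{aligned}
\]

Despite the 20-fold likelihood ratio, the posterior probability stays below 0.17, because the low prior dominates.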
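It remains to compute \(P(X \mid Y=y)\) and \(P(Y=y)\). \(P(X \mid Y=y)\) is obtained from the training set by estimating the transition probabilities separately for the potentially illicit class and the non-illicit class and plugging them into the Markov factorization. For example, suppose a transaction sequence has only two states, in and out. To estimate the probability of moving from in to out within class \(y\), it suffices to count transitions:
\[ P(x_i = \mathit{out} \mid x_{i-1} = \mathit{in}, Y = y) = \frac{N_y(\mathit{in} \rightarrow \mathit{out})}{N_y(\mathit{in})} \]
where \(N_y(\mathit{in} \rightarrow \mathit{out})\) is the number of in-to-out transitions in the training sequences of class \(y\), and \(N_y(\mathit{in})\) is the number of transitions in those sequences that start from in.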
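Estimating each transition probability as a ratio of counts (the number of transitions from state a to state b among the class-y training sequences, divided by the number of transitions leaving a) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the additive (Laplace) smoothing parameter `alpha` is an assumption not mentioned in the text. With `alpha=0` the function returns the plain count ratios; with `alpha > 0`, transitions that never occur in the training data do not receive probability zero.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Transitions = Dict[Tuple[str, str], float]


def estimate_transitions(
    sequences: Sequence[Sequence[str]],
    states: Sequence[str],
    alpha: float = 1.0,
) -> Transitions:
    """P(next | prev) = (N(prev -> next) + alpha) / (N(prev) + alpha * |states|)."""
    pair_counts: Counter = Counter()
    from_counts: Counter = Counter()
    for seq in sequences:
        for prev_state, next_state in zip(seq, seq[1:]):
            pair_counts[(prev_state, next_state)] += 1
            from_counts[prev_state] += 1
    k = len(states)
    probs: Transitions = {}
    for prev_state in states:
        denom = from_counts[prev_state] + alpha * k
        for next_state in states:
            # Uniform fallback for a state never observed as a predecessor.
            probs[(prev_state, next_state)] = (
                (pair_counts[(prev_state, next_state)] + alpha) / denom
                if denom > 0
                else 1.0 / k
            )
    return probs


def per_class_transitions(
    sequences: Sequence[Sequence[str]],
    labels: Sequence[int],
    states: Sequence[str],
    alpha: float = 1.0,
) -> Dict[int, Transitions]:
    """Fit one transition table per class (1 = potentially illicit, 0 = non-illicit)."""
    return {
        y: estimate_transitions(
            [s for s, label in zip(sequences, labels) if label == y], states, alpha
        )
        for y in (0, 1)
    }


if __name__ == "__main__":
    seqs = [["in", "out", "out"], ["in", "out"], ["in", "in", "in"]]
    labels = [1, 1, 0]
    tables = per_class_transitions(seqs, labels, states=["in", "out"], alpha=0.0)
    print(tables[1][("in", "out")], tables[0][("in", "in")])  # 1.0 1.0
```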
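The remaining combinations of in and out (in to in, out to in, out to out) are estimated in the same way, and the initial-state probabilities \(P(x_1 \mid Y=y)\) can be estimated analogously, as the share of class-\(y\) sequences that start in each state. \(P(Y=y)\) is the prior probability of class \(y\); for \(y = 1\) it is simply the proportion of potentially illicit customers among all customers in the training data. Finally, instead of outputting a binary label (1 for a potentially illicit sequence, 0 otherwise), we feed this quantity into a classifier as a feature, alongside the other features. Rather than a binary feature, we use the so-called log-odds ratio, defined as:
\[ \log \frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)} = \log \frac{P(X \mid Y = 1)\, P(Y = 1)}{P(X \mid Y = 0)\, P(Y = 0)} \]
in which the evidence \(P(X)\) cancels out.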
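The log-odds feature \(\log [P(Y=1 \mid X) / P(Y=0 \mid X)]\) can be sketched as below, assuming the per-class initial-state and transition probabilities have already been estimated (for instance by counting transitions per class, as described above) and stored in dictionaries; the function names and data layout are hypothetical. In log space the ratio becomes the difference \(\log P(X \mid Y=1) + \log P(Y=1) - \log P(X \mid Y=0) - \log P(Y=0)\), so the evidence \(P(X)\) never has to be computed.

```python
import math
from typing import Dict, Sequence, Tuple

Initial = Dict[str, float]
Transitions = Dict[Tuple[str, str], float]


def sequence_log_likelihood(seq: Sequence[str], initial: Initial, trans: Transitions) -> float:
    """log P(X | Y=y) under a first-order Markov chain fitted on class y."""
    log_p = math.log(initial[seq[0]])
    for prev_state, next_state in zip(seq, seq[1:]):
        log_p += math.log(trans[(prev_state, next_state)])
    return log_p


def log_odds_feature(
    seq: Sequence[str],
    models: Dict[int, Tuple[Initial, Transitions]],
    prior_illicit: float,
) -> float:
    """log [P(Y=1 | X) / P(Y=0 | X)]; the evidence P(X) cancels in the ratio."""
    init_1, trans_1 = models[1]
    init_0, trans_0 = models[0]
    return (
        sequence_log_likelihood(seq, init_1, trans_1) + math.log(prior_illicit)
        - sequence_log_likelihood(seq, init_0, trans_0) - math.log(1.0 - prior_illicit)
    )


if __name__ == "__main__":
    initial = {"in": 0.5, "out": 0.5}
    models = {
        1: (initial, {("in", "in"): 0.2, ("in", "out"): 0.8,
                      ("out", "in"): 0.5, ("out", "out"): 0.5}),
        0: (initial, {("in", "in"): 0.6, ("in", "out"): 0.4,
                      ("out", "in"): 0.5, ("out", "out"): 0.5}),
    }
    labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    prior = sum(labels) / len(labels)  # share of potentially illicit customers
    # Likelihood ratio 0.4 / 0.2 = 2 and prior odds 0.1 / 0.9, i.e. log(2 / 9)
    print(round(log_odds_feature(["in", "out"], models, prior), 4))
```

The real-valued output, rather than a thresholded label, is what would be appended to the customer's feature vector; a zero-probability transition would make `math.log` fail, which is one reason to smooth the estimated tables.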
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tertychnyi, P., Slobozhan, I., Ollikainen, M., Dumas, M. (2020). Scalable and Imbalance-Resistant Machine Learning Models for Anti-money Laundering: A Two-Layered Approach. In: Clapham, B., Koch, JA. (eds) Enterprise Applications, Markets and Services in the Finance Industry. FinanceCom 2020. Lecture Notes in Business Information Processing, vol 401. Springer, Cham. https://doi.org/10.1007/978-3-030-64466-6_3
DOI: https://doi.org/10.1007/978-3-030-64466-6_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64465-9
Online ISBN: 978-3-030-64466-6
eBook Packages: Computer Science, Computer Science (R0)