Abstract
Money laundering has affected the global economy for many years. Large sums of money are laundered every year, posing a threat to the global economy and its security. Money laundering encompasses the illegal activities used to make illegally acquired funds appear legal and legitimate. This paper provides a comprehensive survey of machine learning algorithms and methods applied to detect suspicious transactions. In particular, solutions for anti-money laundering typologies, link analysis, behavioural modelling, risk scoring, anomaly detection, and geographic capability are identified and analysed. Key steps of data preparation, data transformation, and data analytics techniques are discussed; existing machine learning algorithms and methods described in the literature are categorised, summarised, and compared. Finally, techniques that are lacking or under-addressed in existing research are elaborated in order to pinpoint future research directions.
Acknowledgements
This work was supported by a 3rd Called Collaboration with Public Universities and Agencies grant from the University of Nottingham, Malaysia Campus with Project No. UNHT0001.
Cite this article
Chen, Z., Van Khoa, L.D., Teoh, E.N. et al. Machine learning techniques for anti-money laundering (AML) solutions in suspicious transaction detection: a review. Knowl Inf Syst 57, 245–285 (2018). https://doi.org/10.1007/s10115-017-1144-z
DOI: https://doi.org/10.1007/s10115-017-1144-z