Customs fraud detection

Assessing the value of behavioural and high-cardinality data under the imbalanced learning issue

  • Industrial and commercial application

Abstract

In this customs fraud detection application, we analyse a unique data set of 9,624,124 records resulting from a collaboration with the Belgian customs administration, which faces increasing levels of international trade that put regulatory control under pressure. Governments therefore rely on data mining to focus their limited resources on the most likely fraud cases. The literature on data mining for customs fraud detection falls short in two main directions, both of which are addressed in this paper. (1) Behavioural and high-cardinality data types are neglected because a methodology to include them is lacking. We demonstrate that such fine-grained features (e.g. the specific entities, such as consignee, consignor and declarant, and the commodities involved in a declaration) are very predictive. (2) Studies in the tax domain most often apply standard learning algorithms to their fraud detection applications. However, customs data are highly imbalanced, which poses challenges for many inducers. We present a new EasyEnsemble method that integrates a support vector machine base learner into a confidence-rated boosting algorithm. This results in a fast and scalable learner that drastically improves predictive performance over a standalone support vector machine. The results of our proposed framework reveal high AUC and lift values that translate into an immediate impact on customs fraud detection through improved retrieval of tax losses and enhanced deterrence.

Notes

  1. Legitimacy should be interpreted broadly and is not limited to verifying whether the imposed duties are paid and transport documents are filled in correctly. It also means protecting the environment and society against imported harmful/dangerous goods (e.g. counterfeit goods of low quality).

  2. Additionally, national taxes such as value-added tax (VAT) and excises related to international trade transactions can be collected by customs authorities.

  3. The taxable amount is the customs value or the amount on which the tax is levied.

  4. We refer to https://financien.belgium.be/nl/Statistieken_en_analysen/jaarverslag/cijfers/budget-ontvangsten/ontvangsten-aa-douane-en-accijnzen-1 for additional figures (also including VAT and excises).

  5. Trade facilitation means the rapid clearance of goods through customs so that controls have a minimal impact on commerce.

  6. Each member country can stipulate a number of additional national regulations.

  7. There are two types of inspections: (1) physical cargo checks when the goods enter the territory (e.g. inspecting containers) and (2) post-clearance audits, which entail checking the books and verifying trade documents (e.g. invoices, SAD declarations) for irregularities. Regarding the former, Belgian customs impose a 6-second rule for the automated processing of an article involved in a SAD declaration (online environment). Post-clearance audit checks can be conducted up to 4 years after the date of declaration.

  8. There exists a third category of techniques, the semi-supervised approaches, that learn a discriminative boundary around the instances of a single class [5]. However, they do not seem to be applied in the area of customs fraud detection.

  9. Methods at algorithmic level are also called cost-sensitive learning techniques.

  10. TARIC extends the Combined Nomenclature (CN) and contains tariffs for each commodity according to its country of origin. The CN is a tool for the harmonized classification of goods within the EU and is a further development (with special EU-specific subdivisions) of the WCO’s Harmonized System Nomenclature (HSN) [16].

  11. In this study, in accordance with Moeyersoms and Martens [36], an attribute is considered high-cardinality if it has more than 100 distinct categories.

  12. This means that imported goods are released for free circulation and their associated customs duties are levied in one member state (i.e. Belgium), yet payment of VAT (and where applicable excise duties) is suspended because the import is directly followed by an intra-community supply of the goods to another member state (i.e. France). VAT (and excises) are due in the member state of final destination (i.e. France).

  13. A forwarding agent (or freight forwarder) [51] is an entity that organizes the delivery of goods without carrying out the actual transportation. It is responsible for choosing the carriers that deliver the goods most effectively in terms of transportation time and cost. Furthermore, it prepares the necessary customs and insurance documents and transport certificates. The forwarding agent acts as an intermediary in the logistics chain.

  14. Extracted from http://ec.europa.eu/taxation_customs/dds2/taric/taric_consultation.jsp?Lang=en.

  15. The representative and intra-community acquirer occur far less frequently. Also note that the identity of the consignor is unknown in an import declaration.

  16. Each row therefore contains four ones. This time, the entities are considered simultaneously, which allows interaction effects to be revealed; in the case of high-cardinality variables, each attribute is treated separately. The main difference lies in the modelling.
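
As an illustration only (this is not code from the paper), a minimal MATLAB sketch of such a binary encoding; the table T and its column names are hypothetical placeholders for the entity identifiers of a declaration:

```matlab
% Minimal sketch: binary encoding of the entity identifiers of a declaration
% into a sparse 0/1 matrix. T is assumed to be a table whose (hypothetical)
% columns hold the identifiers of the four entities considered.
cols = {'consignee','declarant','forwarder','commodity'};
n = height(T);
blocks = cell(1, numel(cols));
for k = 1:numel(cols)
    [g, names] = grp2idx(T.(cols{k}));              % map identifiers to 1..m
    blocks{k} = sparse(1:n, g, 1, n, numel(names)); % one column per identifier
end
X = [blocks{:}];   % each row contains exactly four ones (one per entity type)
```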

  17. The margin denotes the separation between the two classes (i.e. how far the instances of both classes lie from the learned hyperplane). Maximizing the margin coincides with minimizing the model complexity term \(w^Tw/2\).
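
For reference, the standard soft-margin SVM primal that this complexity term stems from, written with the same symbols \(w\), \(b\), \(\varphi\) and \(C\) as in notes 17–19 (a textbook formulation, not an excerpt from the paper):

\[
\min_{w,\,b,\,\xi}\; \tfrac{1}{2}\, w^T w + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad y_i\bigl(w^T \varphi(x_i) + b\bigr) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1,\dots,N.
\]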

  18. Choosing too large a value for the regularization parameter C results in a learner that is overly sensitive to the training data (overfitting) and fails to generalize to unseen data. Conversely, too small a value for C tolerates large errors on the training data and yields an overly simple model (underfitting) that is unable to distinguish between the two classes.
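
A hedged sketch of how such a value could be selected on a validation split; the paper cites LIBSVM/LIBLINEAR [6, 18], so the fitcsvm/perfcurve calls and the grid below are merely an illustrative MATLAB equivalent, not the procedure used in the paper:

```matlab
% Illustrative sketch only: pick C by validation AUC to balance over-/underfitting.
Cgrid = 2.^(-5:2:15);                 % hypothetical search grid
bestAUC = -Inf; bestC = Cgrid(1);
for C = Cgrid
    mdl = fitcsvm(Xtr, ytr, 'KernelFunction', 'linear', 'BoxConstraint', C);
    [~, score] = predict(mdl, Xval);  % column 2: score for the second class in mdl.ClassNames
    [~, ~, ~, auc] = perfcurve(yval, score(:, 2), 1);
    if auc > bestAUC
        bestAUC = auc; bestC = C;
    end
end
```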

  19. The LR component transforms the real-valued SVM scores \(w^T\varphi (x) + b\) to the range \([-1,+1]\) (as required for a confidence-rated boosting algorithm).
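
One plausible form of such a transform, shown as a MATLAB sketch: a Platt-style logistic calibration of the raw scores, rescaled to \([-1,+1]\). The parameters A and B and the exact parametrization are assumptions, not taken from the paper:

```matlab
% Sketch only: squash real-valued SVM scores s = w'*phi(x) + b into [-1, +1]
% via a logistic (Platt-style) calibration. A and B are assumed to be fitted
% on training scores; the exact form used in the paper may differ.
squash = @(s) 2 ./ (1 + exp(-(A .* s + B))) - 1;
h = squash(s);   % confidence-rated output in [-1, +1]
```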

  20. In Sect. 5.1.2, we will detail which part of the training data is effectively used for calculating the SR values.

  21. The AUC corresponds to the probability that a positive instance (fraud) is ranked higher than a negative instance (compliant). The ranking is obtained by sorting the instances according to the output scores produced by the classifier.
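
This probabilistic interpretation can be computed directly from the ranks of the output scores (the Mann–Whitney statistic); a small MATLAB sketch, where the variable names are ours:

```matlab
% Sketch: AUC as the probability that a random fraud case receives a higher
% score than a random compliant case, computed from the rank sum.
% scores: classifier outputs; labels: 1 = fraud, 0 = compliant.
r = tiedrank(scores);                                 % average ranks (ties handled)
nPos = sum(labels == 1);
nNeg = sum(labels == 0);
auc = (sum(r(labels == 1)) - nPos * (nPos + 1) / 2) / (nPos * nNeg);
```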

  22. Within each fold, the feature selection based on the t statistic is computed on the training data. The set of ‘optimal’ features can therefore differ across folds.
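
As an illustration of such per-fold filtering (the exact ranking criterion and the number of retained features k are assumptions here, not reproduced from the paper), a two-sample t statistic per feature might be computed as follows:

```matlab
% Sketch: rank features by the absolute two-sample t statistic, computed on
% the training data of the current fold only; k (number of retained features)
% is a hypothetical choice.
Xp = Xtr(ytr == 1, :);  Xn = Xtr(ytr == 0, :);        % fraud vs. compliant rows
np = size(Xp, 1);  nn = size(Xn, 1);
t = (mean(Xp, 1) - mean(Xn, 1)) ./ ...
    sqrt(var(Xp, 0, 1) ./ np + var(Xn, 0, 1) ./ nn);  % Welch-style t per feature
[~, order] = sort(abs(t), 'descend');
selected = order(1:k);                                % indices of top-ranked features
```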

  23. We make use of the standard MATLAB function fitctree, see https://nl.mathworks.com/help/stats/fitctree.html, which fits a classification decision tree making binary splits. Default parameter settings are adopted. The split criterion is a hyperparameter that can take on Gini’s diversity index or maximum deviance reduction (cross entropy). The MinLeafSize (minimum number of leaf node observations) is another hyperparameter that controls for overfitting. The following values were imposed: MinLeafSize\(= 2^z\), with \(z = [1;1.5;2;2.5;\ldots ;5.5]\).
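
The corresponding calls, reconstructed from this note as a sketch (only the hyperparameters mentioned are varied; rounding the fractional \(2^z\) values to integers is our assumption, since MinLeafSize requires a positive integer):

```matlab
% Sketch of the grid described above: fitctree with both split criteria and
% MinLeafSize = 2^z for z = 1, 1.5, ..., 5.5; all other settings at defaults.
criteria = {'gdi', 'deviance'};        % Gini's diversity index / cross entropy
zvals = 1:0.5:5.5;
for c = 1:numel(criteria)
    for z = zvals
        tree = fitctree(Xtr, ytr, ...
            'SplitCriterion', criteria{c}, ...
            'MinLeafSize', round(2^z));
        % ... evaluate the tree on validation data and keep the best setting ...
    end
end
```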

  24. The standard MATLAB function patternnet, see https://nl.mathworks.com/help/deeplearning/ref/patternnet.html, is used to construct a classification neural network with one hidden layer (sigmoid transfer function). Default parameter settings are adopted for the optimization algorithm (scaled conjugate gradient) and performance function (cross-entropy). The number of hidden neurons is a hyperparameter taking on values \([5; 10; 15; 20;\ldots ; 100]\). We trained the neural network, with a given number of hidden neurons, 10 times on the training data and selected the one with the best validation set performance (a neural network converges to a local optimum).
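
A sketch of that selection loop; the settings stated in the note are followed, while the data layout (rows = samples, hence the transposes), the variable names and the validation split handling are assumptions:

```matlab
% Sketch: for each hidden-layer size, train patternnet 10 times (random
% restarts) and keep the run with the best validation performance.
% Defaults apply: trainscg optimizer and cross-entropy loss; train() uses its
% default internal train/validation/test division here.
sizes = 5:5:100;
bestPerf = Inf;
for h = sizes
    for rep = 1:10
        net = patternnet(h);                 % one hidden layer with h neurons
        [net, tr] = train(net, Xtr', Ttr');  % patternnet expects columns = samples
        if tr.best_vperf < bestPerf          % validation performance of this run
            bestPerf = tr.best_vperf;  bestNet = net;
        end
    end
end
```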

  25. Based on counting the wins/losses/draws in comparing several methods. For example, in comparing the F with the S version, a pair (F,S) is formed with the same data set (2 types), the same type of pre-processing (3 types) and the same final model (4 types). This leads to checking a total of 24 pairs.

  26. For the OperID data, this always holds. These are precisely the attributes with the highest cardinalities, for which we expect stability issues to occur.

References

  1. Agyemang M, Barker K, Alhajj R (2006) A comprehensive survey of numeric and symbolic outlier mining techniques. Intell Data Anal 10(6):521–538

  2. Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Machine learning: ECML 2004: 15th European conference on machine learning, Pisa, Italy, September 20–24, 2004. Proceedings. Springer, Berlin, pp 39–50. https://doi.org/10.1007/978-3-540-30115-8_7

  3. Baesens B, Gestel TV, Viaene S, Stepanova M, Suykens J, Vanthienen J (2003) Benchmarking state-of-the-art classification algorithms for credit scoring. J Oper Res Soc 54(6):627–635. https://doi.org/10.1057/palgrave.jors.2601545

  4. Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov 30(4):891–927. https://doi.org/10.1007/s10618-015-0444-8

  5. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15:1–15:58. https://doi.org/10.1145/1541880.1541882

  6. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27

  7. Chawla NV (2005) Data mining for imbalanced datasets: an overview. Data mining and knowledge discovery handbook. Springer, Boston, pp 853–867

  8. Closs DJ, McGarrel EF (2004) Enhancing security throughout the supply chain. IBM Center for the Business of Government. http://www.businessofgovernment.org/sites/default/files/Enhancing%20Security.pdf. Retrieved 2 Feb 2018

  9. Cruz R, Fernandes K, Costa JFP, Ortiz MP, Cardoso JS (2018) Binary ranking for ordinal class imbalance. Pattern Anal Appl 21(4):931–939. https://doi.org/10.1007/s10044-018-0705-4

  10. De Cnudde S, Martens D, Evgeniou T, Provost F (2017) A benchmarking study of classification techniques for behavioral data. Working papers, University of Antwerp, Faculty of Applied Economics

  11. Digiampietri LA, Roman NT, Meira LAA, Filho JJ, Ferreira CD, Kondo AA (2008) Uses of artificial intelligence in the Brazilian customs fraud detection system. In: Proceedings of the 2008 international conference on digital government research. Digital Government Society of North America, dg.o ’08, pp 181–187

  12. Eskin E, Arnold A, Prerau M, Portnoy L, Stolfo S (2002) A geometric framework for unsupervised anomaly detection. In: Barbará D, Jajodia S (eds) Applications of data mining in computer security. Springer, Boston, pp 77–101. https://doi.org/10.1007/978-1-4615-0953-0_4

  13. European Commission (2003) Commission regulation (EC) no 2286/2003 of 18 December 2003 amending regulation (EEC) No 2454/93 laying down provisions for the implementation of council regulation (EEC) No 2913/92 establishing the community customs code. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02003R2286-20060101. Retrieved 14 Nov 2018

  14. European Commission (2016) SAD guidance during the UCC transitional period. https://ec.europa.eu/taxation_customs/sites/taxation/files/guidance_transitional_sad_en.pdf. Retrieved 3 Feb 2018

  15. European Commission (2018a) Authorised economic operator. https://ec.europa.eu/taxation_customs/general-information-customs/customs-security/authorised-economic-operator-aeo/authorised-economic-operator-aeo_en#what_is. Retrieved 7 July 2018

  16. European Commission (2018b) The combined nomenclature. https://ec.europa.eu/taxation_customs/business/calculation-customs-duties/what-is-common-customs-tariff/combined-nomenclature_en. Retrieved 3 Feb 2018

  17. European Commission (2018c) The single administrative document (SAD). https://ec.europa.eu/taxation_customs/business/customs-procedures/general-overview/single-administrative-document-sad_en. Retrieved 3 Feb 2018

  18. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  19. Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010

  20. Junqué de Fortuny E, Martens D, Provost F (2014a) Predictive modeling with big data: is bigger really better? Big Data 1(4):215–226. https://doi.org/10.1089/big.2013.0037

  21. Junqué de Fortuny E, Stankova M, Moeyersoms J, Minnaert B, Provost F, Martens D (2014b) Corporate residence fraud detection. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14. ACM, New York, NY, USA, pp 1650–1659. https://doi.org/10.1145/2623330.2623333

  22. García E, Lozano F (2007) Boosting support vector machines. In: 5th international conference machine learning and data mining in pattern recognition, MLDM 2007, Leipzig, Germany, July 18–20, post proceedings. IBaI Publishing, pp 153–167

  23. Goldstein M, Uchida S (2016) A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLOS ONE 11(4):1–31. https://doi.org/10.1371/journal.pone.0152173

  24. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  25. Han CR, Ireland R (2014) Performance measurement of the KCS customs selectivity system. Risk Manag 16(1):25–43. https://doi.org/10.1057/rm.2014.2

  26. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284. https://doi.org/10.1109/TKDE.2008.239

  27. Kumar A, Nagadevara V (2006) Development of hybrid classification methodology for mining skewed data sets—a case study of Indian customs data. IEEE Int Conf Comput Syst Appl 2006:584–591. https://doi.org/10.1109/AICCSA.2006.205149

  28. Kumar S, Biswas SK, Devi D (2018) TLUSBoost algorithm: a boosting solution for class imbalance problem. Soft Comput. https://doi.org/10.1007/s00500-018-3629-4

  29. Li Q, Mao Y (2014) A review of boosting methods for imbalanced data classification. Pattern Anal Appl 17(4):679–693. https://doi.org/10.1007/s10044-014-0392-8

  30. Liu T (2009) Easyensemble and feature selection for imbalance data sets. In: 2009 international joint conference on bioinformatics, systems biology and intelligent computing, pp 517–520. https://doi.org/10.1109/IJCBS.2009.22

  31. Liu W, Chawla S, Cieslak DA, Chawla NV (2010) A robust decision tree algorithm for imbalanced data sets. In: Proceedings of the tenth SIAM international conference on data mining, SIAM, Philadelphia, vol 10, pp 766–777

  32. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 39(2):539–550. https://doi.org/10.1109/TSMCB.2008.2007853

  33. Martens D, Provost F (2014) Explaining data-driven document classifications. MIS Q 38(1):73–100. https://doi.org/10.25300/MISQ/2014/38.1.04

  34. Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2–3):427–436. https://doi.org/10.1016/j.neunet.2007.12.031

  35. Miguéis VL, Camanho AS, Borges J (2017) Predicting direct marketing response in banking: comparison of class imbalance methods. Serv Bus 11(4):831–849. https://doi.org/10.1007/s11628-016-0332-3

  36. Moeyersoms J, Martens D (2015) Including high-cardinality attributes in predictive models: a case study in churn prediction in the energy sector. Decis Support Syst 72:72–81. https://doi.org/10.1016/j.dss.2015.02.007

  37. Ngai E, Hu Y, Wong Y, Chen Y, Sun X (2011) The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 50(3):559–569. https://doi.org/10.1016/j.dss.2010.08.006 (on quantitative methods for detection of financial fraud)

  38. Parvin H, Minaei-Bidgoli B, Alizadeh H (2011) Detection of cancer patients using an innovative method for learning at imbalanced datasets. In: Yao J, Ramanna S, Wang G, Suraj Z (eds) Rough sets and knowledge technology. Springer, Berlin, pp 376–381

  39. Perlich C, Provost F (2006) Distribution-based aggregation for relational learning with identifier attributes. Mach Learn 62(1):65–105. https://doi.org/10.1007/s10994-006-6064-1

  40. Port of Antwerp (2018) 2018 facts and figures. https://www.portofantwerp.com/sites/portofantwerp/files/POA_Facts_and_Figures_2018.pdf. Retrieved 14 Nov 2018

  41. Pozzolo AD, Caelen O, Borgne YAL, Waterschoot S, Bontempi G (2014) Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst Appl 41(10):4915–4928. https://doi.org/10.1016/j.eswa.2014.02.026

  42. Provost F, Fawcett T (2013) Data science for business: what you need to know about data mining and data-analytic thinking. O’Reilly Media Inc, Sebastopol

  43. Rad HA, Arash S, Rahbar F, Rahmani R, Heshmati Z, Fard MM (2015) A novel unsupervised classification method for customs fraud detection. Indian J Sci Technol 8(35):1–7. https://doi.org/10.17485/ijst/2015/v8i35/87306

  44. Ravisankar P, Ravi V, Raghava Rao G, Bose I (2011) Detection of financial statement fraud and feature selection using data mining techniques. Decis Support Syst 50(2):491–500. https://doi.org/10.1016/j.dss.2010.11.006

  45. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336. https://doi.org/10.1023/A:1007614523901

  46. Shao H, Zhao H, Chang GR (2002) Applying data mining to detect fraud behavior in customs declaration. In: Proceedings international conference on machine learning and cybernetics, vol 3, pp 1241–1244. https://doi.org/10.1109/ICMLC.2002.1167400

  47. Shmueli G (2017) Analyzing behavioral big data: methodological, practical, ethical, and moral issues. Qual Eng 29(1):57–74. https://doi.org/10.1080/08982112.2016.1210979

  48. Singh AK, Sahu R, Ujjwal K (2003) Decision support system in customs assessment to detect valuation frauds. In: Engineering management conference, 2003. IEMC ’03. Managing technologically driven organizations: the human side of innovation and change, pp 546–550. https://doi.org/10.1109/IEMC.2003.1252333

  49. Stankova M, Martens D, Provost F (2015) Classification over bipartite graphs through projection. Working papers 2015001, University of Antwerp, Faculty of Applied Economics

  50. Suykens JA, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J, Suykens J, Van Gestel T (2002) Least squares support vector machines. World Scientific, Singapore

  51. Universal Cargo (2018) What does a freight forwarder do and do you need one? https://www.universalcargo.com/what-does-a-freight-forwarder-do-do-you-need-one/. Retrieved 14 Nov 2018

  52. Vanhoeyveld J, Martens D (2018) Imbalanced classification in sparse and large behaviour datasets. Data Min Knowl Discov 32(1):25–82. https://doi.org/10.1007/s10618-017-0517-y

  53. West J, Bhattacharya M (2016) Intelligent financial fraud detection: a comprehensive review. Comput Secur 57(Supplement C):47–66. https://doi.org/10.1016/j.cose.2015.09.005

  54. Wickramaratna J, Holden SB, Buxton BF (2001) Performance degradation in boosting. In: Proceedings of the second international workshop on multiple classifier systems, MCS ’01. Springer, London, UK, pp 11–21

  55. Yaqin W, Yuming S (2010) Classification model based on association rules in customs risk management application. In: 2010 international conference on intelligent system design and engineering application, vol 1, pp 436–439. https://doi.org/10.1109/ISDEA.2010.276

  56. Yuan B, Ma X (2012) Sampling + reweighting: boosting the performance of AdaBoost on imbalanced datasets. In: The 2012 international joint conference on neural networks (IJCNN), pp 1–6

  57. Zdravevski E, Lameski P, Kulakov A (2011) Weight of evidence as a tool for attribute transformation in the preprocessing stage of supervised learning algorithms. In: The 2011 international joint conference on neural networks, pp 181–188. https://doi.org/10.1109/IJCNN.2011.6033219

  58. Zhang L, Zhang Q, Zhang L, Tao D, Huang X, Du B (2015) Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding. Pattern Recogn 48(10):3102–3112. https://doi.org/10.1016/j.patcog.2014.12.016 (discriminative Feature Learning from Big Data for Visual Recognition)

  59. Zheng M, Zhou C, Wu J, Pan S, Shi J, Guo L (2018) Fraudne: a joint embedding approach for fraud detection. In: 2018 international joint conference on neural networks (IJCNN). IEEE, pp 1–8

Acknowledgements

The authors would like to thank the Belgian Federal Public Service Finance division Customs and Excise for the provision of the data sets and their involvement throughout the project. The models described in this paper are not necessarily the ones used by the Belgian customs administration. Funding was provided by University of Antwerp (Grant No. DOCPRO4/Antigoon PS-IDnr. 29648).

Author information

Corresponding author

Correspondence to Jellis Vanhoeyveld.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Single administrative document form

A blank SAD declaration form [13] is provided in Fig. 6. In Belgium, customs declarations are filed electronically by means of the PaperLess Douane en Accijnzen (PLDA) application.

Fig. 6 SAD declaration form, retrieved from [13]

Appendix 2: AdaBoost

Algorithm 1 shows the underlying AB boosting process for the EE technique introduced in Sect. 4.2.3; a generic sketch of the confidence-rated boosting step is given below.

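As a hedged illustration of the core boosting step only, a MATLAB sketch of generic confidence-rated AdaBoost in the sense of Schapire and Singer [45]; the EasyEnsemble undersampling and the SVM + LR base learner used in the paper are not reproduced here, and trainWeak/applyWeak are hypothetical helper functions:

```matlab
% Sketch of generic confidence-rated AdaBoost: labels y in {-1,+1}, weak
% learners return confidence-rated outputs h(x) in [-1,+1].
D = ones(N, 1) / N;                        % initial instance weights
alpha = zeros(T, 1);  H = cell(T, 1);
for t = 1:T
    H{t} = trainWeak(X, y, D);             % weak learner fitted on weighted data
    h = applyWeak(H{t}, X);                % confidence-rated outputs in [-1, +1]
    r = sum(D .* y .* h);                  % weighted correlation with the labels
    alpha(t) = 0.5 * log((1 + r) / (1 - r));
    D = D .* exp(-alpha(t) * y .* h);      % emphasise poorly predicted instances
    D = D / sum(D);                        % renormalise to a distribution
end
% Ensemble score: F(x) = sum_t alpha(t) * h_t(x); classify by sign(F(x)).
```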

Appendix 3: Data ensembles

Table 7 presents the results shown in Fig. 5 in a tabular format.

Table 7 Predictive performances for each data source individually and their combination

About this article

Cite this article

Vanhoeyveld, J., Martens, D. & Peeters, B. Customs fraud detection. Pattern Anal Applic 23, 1457–1477 (2020). https://doi.org/10.1007/s10044-019-00852-w

