Role Recognition of Illegal Online Gambling Participants Using Monetary Transaction Data

Han, Xiaohui; Wang, Lianhai; Xu, Shujiang; Zhao, Dawei; Liu, Guangqi

doi:10.1007/978-3-030-01950-1_34

Xiaohui Han²¹,
Lianhai Wang²¹,
Shujiang Xu²¹,
Dawei Zhao²¹ &
…
Guangqi Liu²¹

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 11149))

Included in the following conference series:

International Conference on Information and Communications Security

1613 Accesses

Abstract

Online gambling has become a substantial global industry during the past two decades. However, it is explicitly prohibited or restricted by most countries in the world due to social problems caused by it. This results in rapid expansion of the illegal online gambling (IOG) market where players profits are under little protection. To fight against IOG, this paper addresses the IOG participant-role recognition (PRR) problem by learning a supervised classifier with monetary transaction data. We propose two sets of features, i.e., transaction statistical features and network structural features, to effectively represent participants. Based on the feature representation, we adopt an ensemble learning strategy in the training phase of a PRR classifier to reduce the impact of unbalanced data. Results of experiments performed on real-world IOG case data demonstrate the feasibility and validity of the proposed approach. The proposed approach could help investigators in a law enforcement agency find the key members of an IOG organization quickly and destroy the ecosystem efficiently.

You have full access to this open access chapter, Download conference paper PDF

Anti-money Laundering (AML) Research: A System for Identification and Multi-classification

Predicting self-exclusion among online gamblers: An empirical real-world study

Article Open access 10 August 2022

An individual-group-merchant relation model for identifying fake online reviews: an empirical study on a Chinese e-commerce platform

Article 14 May 2018

Keywords

1 Introduction

Online gambling, which was initiated in the mid-1990s, has exploded from a minor sideshow on the Internet into a substantial global industry over the past two decades. Due to factors such as convenience, accessibility, affordability, anonymity, and interactivity, online gambling could be potentially more tempting and addictive to consumers than traditional offline gambling. However, several recent studies suggest that it would exacerbate gambling problems in society, such as problem gambling, pathological gambling, and underage gambling [6, 8]. Consequently, a number of countries, including Mainland China, the United States, and Russia, explicitly prohibit most or all forms of online gambling. In the most remaining countries, online gambling is under strict regulations.

The prohibition and restriction of online gambling result in rapid expansion of the illegal online gambling (IOG) market where players’ profits are under little protection. Several studies allege that IOG sites victimize participants rather than benefit them [1,2,3,4]. More seriously, it is reported that the IOG business is correlated with money laundering and other complex fraudulent and extortionist activities that, in turn, can fund yet other criminal activities [3]. Although states have adopted legal measures targeted at both consumers and operators to limit citizens’ access to IOG sites. Yet the effectiveness of such measures may well be limited.

To fight against IOG, it is necessary to gain sufficient knowledge regarding how these businesses operate in an environment characterized by extreme uncertainty and high risk. This would require a systematic analysis of sophisticated organizational structures formed by IOG participants. However, to the best of our knowledge, fairly few researches have been carried out on automatic techniques for analyzing IOG ecosystems. This study addresses this absence by investigating the problem of automatically recognizing the roles that IOG participants played in their ecosystem. The proposed approach could help investigators in a law enforcement agency (LEA) find the key members of an IOG organization quickly and destroy the ecosystem efficiently. To be specific, this study makes the following contributions:

We present a novel participant-role recognition (PRR) approach which learns a supervised classifier based on monetary transaction data to predict the roles of IOG participants.
We propose two sets of features, i.e. transaction statistical features and network structural features, to effectively represent participants.
We adopt an ensemble learning strategy in the training phase of the PRR classifier to reduce the impact of unbalanced training data.
We evaluate the performance of the proposed approach using real-world IOG monetary transaction data. Experimental results demonstrate the feasibility and validity of the proposed approach.

The remainder of this paper is organized as follows: Sect. 2 reviews related work. Section 3 provides preliminaries related to this paper. Section 4 describes the details of the proposed approach. Section 5 gives the experimental results and analysis. Section 6 concludes the content of this paper.

2 Related Work

Prosperity on Internet gambling has drawn much attention from academics [11]. A number of studies have investigated possible precipitating factors for the expansion of online gambling. For example, Gainsbury et al. [6] claimed that technological innovations, including the availability of cheap, fast broadband connections in essentially any location, the emergence of mobile technology, and the use of trustable online payment systems, have played an important role in the growth of online gambling. They also found that the primary reasons people gave for preferring to gambling were: ease of access, convenience, comfort, greater privacy, and anonymity. Similar motivations for online gambling have been found in [14]. Several pieces of research studied demographic profiles of online gamblers and found that Internet gamblers were significantly more likely to be male gender, younger age, from higher socio-economic strata, employed full time, more technologically savvy, having more positive attitudes toward gambling, and better educated [6, 8, 9, 12, 13]. However, it was reported more women and young people are engaging in the activity [7].

Many recent studies reported online gambling is more harmful to gamblers compared to terrestrial gambling [6, 8, 9]. Wood and William [13] found online gamblers are under a higher probability of using drugs and alcohol than non-online gamblers. Due to the fragmented nature of governance, online gambling also presents a significant opportunity for crime and victimization. McMullan and Rege [10] investigated the types, techniques, and organizational dynamics of crime at portals of online gambling sites using document analysis based on data retrieved with the Google search engine.

In jurisdictions, such as the United States and Mainland China, for example, the prohibition of gambling has given rise to illegal online gambling business. Banks [3] summarized three principal forms of the illegal gambling business, i.e., accepting bets from a resident in a country where gambling is illegal, operating without appropriate licenses, and accepting bets from underage gamblers. There are numerous documented cases in which providers of illegal online gambling have been found to cheat customers on payouts, have apparently not paid winnings, have cheated players with unfair games, or have absconded with player deposits [1,2,3,4]. As claimed by McMullan and Rege [10], these cases identified by existing studies are likely to represent “the tip of the iceberg”, with many more crimes going unreported and unrecorded.

Based on our review of the related work, we have found that although IOG has been studied for years, most existing researches only focused on types of crime related to IOG. There continues to be a paucity of research on the systematic analysis of sophisticated organizational structures IOG ecosystems. Especially, few automatic techniques have been proposed to analyze IOG ecosystems. In this study, we address this absence by presenting an approach to automatically recognize the roles that IOG participants played in their ecosystem.

3 Preliminaries

3.1 The IOG Ecosystem and Participant Roles

Figure 1 shows the typical structure of an IOG ecosystem, which is like a pyramid. A small number of investors provide funds for building online gambling platforms in countries/regions where online gambling is legal. Then these investors seek local organizers in a country/region, where online gambling is forbidden, to run the IOG business in that country/region. Local organizers employ gambling agents to attract gamblers to participate in IOG games. Gambling agents collect bets from gamblers and distribute IOG proceeds among participants. Gamblers can only participate in IOG games by transferring their bets to gambling agents. Participant roles in a higher level of the pyramid structure will keep interest at a definite ratio from the proceeds they collected from the direct lower level role. Based on the understanding of the IOG ecosystem, we aim to identify IOG participants into one of the four roles, i.e., investor, local organizer, gambling agent, and gambler.

3.2 Problem Statement

The participant-role recognition (PRR) problem can be considered as a supervised classification problem. We give a formal definition of PRR as follows.

Definition 1

Given a set of K pre-defined roles $\mathcal {R}=\{r_{1}, r_{2}, \cdots , r_{K}\}$ and a set of IOG participants $\mathcal {P}=\{p_{1}, p_{2}, \dots , p_{M}\}$, the task of participant-role recognition is to find a function $f(\cdot )$ to decide if a participant $p_{i}$’s role is $r_{j}$, i.e., $f:\mathcal {P}\times \mathcal {R}\mapsto \{0,1\}$ such that for any pair $(p_{i}, r_{j})\in \mathcal {P}\times \mathcal {R}$, we have

$$\begin{aligned} f(p_{i}, r_{j})= {\left\{ \begin{array}{ll} 1, &{} \textit{if } p_{i}{\text {'}}{\textit{s role is }}r_{j}\\ 0, &{} otherwise \end{array}\right. }. \end{aligned}$$

(1)

For the i-th participant, we represent them as a n-dimensional feature vector $p_{i}=[p_{i1},p_{i2},\cdots ,p_{in}]^{\mathrm T}$, where $p_{i}\in \mathbb {R}^{n}$. By this definition, solving the PRR problem involves extracting discriminative features as well as finding an appropriate classification scheme. We present our solution in the following section.

4 The Proposed Approach

Figure 2 provides an illustration of our PRR approach. We first compute transaction statistics and build a money flow network based on monetary transaction data obtained from an IOG ecosystem. Then we extract transaction statistical features and network structural features to form a vector representation for each participant sample. Based on the extracted features, we further learn a classifier to predict the role of new participant samples. This section describes each component of the approach in detail.

4.1 Monetary Transaction Data

Monetary transaction data (MTD), which records the monetary transactions between IOG participants, could be obtained from bank accounts of IOG participants or derived from IOG servers. Each transaction record typically consists of information including the sender, the recipient, the amount of money transferred, and a time stamp. MTD contains very useful information for understating an IOG ecosystem. However, it is usually difficult to access MTD due to privacy reasons. The MTD used in this study was provided by a law enforcement agency. They obtained the data with warrants during the investigation of an IOG case.

4.2 Feature Extraction

We propose two sets of features to represent IOG participants in the learning process of a PRR classifier.

Transaction Statistical Features. We assume that participants in different roles have discriminative money transfer patterns which can be captured by statistics extracted from MTD. Here, we first introduce some notations used in the computation of transaction statistical features. Let M be the number of participants in an IOG ecosystem, and let $N_{D}$ be the number of days covered by the MTD. We denote $\mathbf {A}^{(in)}$, $\mathbf {A}^{(out)}$, $\mathbf {F}^{(in)}$, $\mathbf {F}^{(out)}$, $\mathbf {C}^{(in)}$, and $\mathbf {C}^{(out)}$ as $M\times N_{D}$ dimensional matrices, where $\mathbf {A}^{(in)}_{ij}$ ($\mathbf {A}^{(out)}_{ij}$) is the amount of money participant $p_{i}$ received (sent out) on the j-th day, $\mathbf {F}^{(in)}_{ij}$ ($\mathbf {F}^{(out)}_{ij}$) is the number of transfers $p_{i}$ received (sent out) on the j-th day, and $\mathbf {C}^{(in)}_{ij}$ ($\mathbf {C}^{(out)}_{ij}$) is the number of incoming (outgoing) counterparties of $p_{i}$ on the j-th day, respectively. To capture money transfer patterns of IOG participants, we compute the following features.

Means of Daily Income ($\hat{\mu }^{(ain)}_{i}$) and Expenditure ($\hat{\mu }^{(aout)}_{i}$) measure the average amounts of money participant $p_{i}$ receives from and sends to other participants daily, which are computed by the following equations:
$$\begin{aligned} \hat{\mu }^{(ain)}_{i}=\frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {A}^{(in)}_{ij}, \end{aligned}$$
(2)

$$\begin{aligned} \hat{\mu }^{(aout)}_{i}=\frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {A}^{(out)}_{ij}. \end{aligned}$$
(3)
Variances of Daily Income ($\hat{\sigma }^{(ain)}_{i}$) and Expenditure ($\hat{\sigma }^{(aout)}_{i}$) measure how far the daily incomes and expenditures of participant $p_{i}$ are spread out from the mean values, which are computed by the following equations:
$$\begin{aligned} \hat{\sigma }^{(ain)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {A}^{(in)}_{ij}-\hat{\mu }^{(ain)}_{i}]^{2}, \end{aligned}$$
(4)

$$\begin{aligned} \hat{\sigma }^{(aout)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {A}^{(out)}_{ij}-\hat{\mu }^{(aout)}_{i}]^{2}. \end{aligned}$$
(5)
Means of Daily Incoming ($\hat{\mu }^{(fin)}_{i}$) and Outgoing Transfer Numbers ($\hat{\mu }^{(fout)}_{i}$) measure the average numbers of daily transfers participant $p_{i}$ receives and sends out, which are computed by the following equations:
$$\begin{aligned} \hat{\mu }^{(fin)}_{i}= & {} \frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {F}^{(in)}_{ij}, \end{aligned}$$
(6)

$$\begin{aligned} \hat{\mu }^{(fout)}_{i}= & {} \frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {F}^{(out)}_{ij}. \end{aligned}$$
(7)
Variances of Daily Incoming ($\hat{\sigma }^{(fin)}_{i}$) and Outgoing Transfer Numbers ($\hat{\sigma }^{(fout)}_{i}$) measure how far $p_{i}$’s daily numbers of incoming and outgoing transfers are spread out from their mean values, which are computed by the following equations:
$$\begin{aligned} \hat{\sigma }^{(fin)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {F}^{(in)}_{ij}-\hat{\mu }^{(fin)}_{i}]^{2}, \end{aligned}$$
(8)

$$\begin{aligned} \hat{\sigma }^{(fout)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {F}^{(out)}_{ij}-\hat{\mu }^{(fout)}_{i}]^{2}. \end{aligned}$$
(9)
Means of Daily Incoming ($\hat{\mu }^{(cin)}_{i}$) and Outgoing Counterparty Numbers ($\hat{\mu }^{(cout)}_{i}$) measure $p_{i}$’s average numbers of daily incoming and outgoing counterparties, which are computed by the following equations:
$$\begin{aligned} \hat{\mu }^{(cin)}_{i}= & {} \frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {C}^{(in)}_{ij}, \end{aligned}$$
(10)

$$\begin{aligned} \hat{\mu }^{(cout)}_{i}= & {} \frac{1}{N_{D}}\sum ^{N_{D}}_{j=1}\mathbf {C}^{(out)}_{ij}. \end{aligned}$$
(11)
Variances of Daily Incoming ($\hat{\sigma }^{(cin)}_{i}$) and Outgoing Counterparty Numbers ($\hat{\sigma }^{(cout)}_{i}$) measure how far $p_{i}$’s daily numbers of incoming and outgoing counterparties are spread out from the mean values, which are computed by the following equations:
$$\begin{aligned} \hat{\sigma }^{(cin)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {C}^{(in)}_{ij}-\hat{\mu }^{(cin)}_{i}]^{2}, \end{aligned}$$
(12)

$$\begin{aligned} \hat{\sigma }^{(cout)}_{i}= & {} \frac{1}{N_{D}-1}\sum ^{N_{D}}_{j=1}[\mathbf {C}^{(out)}_{ij}-\hat{\mu }^{(cout)}_{i}]^{2}. \end{aligned}$$
(13)

Network Structural Features. All IOG participants are in pursuit of money. Gamblers want to win money by playing various kinds of games provided by an IOG platform. Investors, local organizers, and gambling agents aim to earn interest from gamblers. Therefore, money flow can be seen as the “blood” of an IOG ecosystem. We use a money flow network (MFN) to describe the “flow” of “blood”. Let $f_{ij}$ be the total amount of money transferred from participant $p_{i}$ to participant $p_{j}$, and let $f_{ji}$ be the total amount of money transferred from $p_{j}$ to $p_{i}$. We first define the money flow between $p_{i}$ and $p_{j}$ as follows.

Definition 2

The money flow between two participants $p_{i}$ and $p_{j}$ is the absolute value of the difference between $f_{ij}$ and $f_{ji}$, which is computed by:

$$\begin{aligned} flow(p_{i},p_{j})=\arrowvert f_{ij}-f_{ji}\arrowvert . \end{aligned}$$

(14)

The direction of the flow is from $p_{i}$ to $p_{j}$ if $f_{ij}>f_{ji}$, and is from $p_{j}$ to $p_{i}$ if $f_{ji}>f_{ij}$.

Based on the definition of money flow, we further give the definition of an MFN.

Definition 3

A money flow network is a directed graph $\mathcal {G}=(\mathcal {V},\mathcal {E})$, where $\mathcal {V}$ is a finite set of vertices representing participants, and $\mathcal {E}$ is a set of directed edges representing money flows between participants.

In $\mathcal {G}$, there will be an edge $e_{ij}\in \mathcal {E}$ between two vertices $v_{i}$ and $v_{j}$ in $\mathcal {V}$ if $flow(p_{i},p_{j})\ne 0$, and the weight and direction of the edge are the same with $flow(p_{i},p_{j})$ and the direction of the flow, respectively. An MFN can further be represented by an $M\times M$ dimensional adjacency matrix $\mathbf {W}$, where

$$\begin{aligned} \mathbf {W}_{ij}= {\left\{ \begin{array}{ll} flow(p_{i},p_{j}), &{} \text {if }e_{ij}\in \mathcal {E}\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

(15)

We assume that the importance of different roles in an MFN should be different. Thus, we compute the following centrality measures as network structural features to identify the importance of participants in an MFN.

In-degree. This measure counts the number of incoming ties of a vertex. As edges in an MFN are weighted, values of incoming edge weights are summed up. The in-degree for vertex $v_{i}$ is represented by the equation:
$$\begin{aligned} D_{in}(v_{i})=\sum ^{M}_{j=1} \mathbf {W}_{ji} \end{aligned}$$
(16)
Out-degree. This measure counts the number of out-going ties of a vertex. The weighted out-degree for vertex $v_{i}$ is represented by the equation:
$$\begin{aligned} D_{out}(v_{i})=\sum ^{M}_{j=1}\mathbf {W}_{ij} \end{aligned}$$
(17)
All-degree. This measure counts the number of ties connecting a vertex to the others, regardless of their directionality. In the weighted version, the all-degree for vertex $v_{i}$ is represented by the equation:
$$\begin{aligned} D_{all}(v_{i})=\sum ^{M}_{j=1}(\mathbf {W}_{ij}+\mathbf {W}_{ji}) \end{aligned}$$
(18)
Betweenness. Degrees are local measures, i.e., they do not take into account the whole network, but only the local neighborhood of a vertex. As a global measure, betweenness quantifies how frequently a vertex acts as a bridge along the shortest paths that connect every other couple of vertices. More formally, the betweenness centrality of node $v_{k}$ is computed by the following equation:
$$\begin{aligned} C_{b}(v_{k})=\frac{1}{(M-1)(M-2)}\sum _{i,j\ne k}\frac{N^{s}_{ij}(v_k)}{N^{s}_{ij}}, \end{aligned}$$
(19)
where $N^{s}_{ij}$ is the number of shortest paths linking the couple of vertices $v_{i}$ and $v_{j}$, and $N^{s}_{ij}(v_k)$ is the number of that paths which contain $v_{k}$. $[(M-1)(M-2)]$ is the total number of pairs of vertices not including $v_{k}$.
Closeness. This feature measures the inverse of the distance of a vertex from all the others in the network, considering the shortest paths that connect each couple of vertices. That is, it denotes how close a vertex is to others. Let $dist(v_{i},v_{j})$ be the number of edges in the shortest path linking vertices $v_{i}$ and $v_{j}$, the closeness centrality of the vertex $v_{i}$ is computed by the following equation:
$$\begin{aligned} C_{c}(v_{i})=\frac{1}{\sum ^{M}_{j=1}dist(v_{i},v_{j})}, \end{aligned}$$
(20)
where $\sum ^{M}_{j=1}dist(v_{i},v_{j})$ is the distance of vertex $v_{i}$ from all the other vertices in the graph.

4.3 Participant Role Recognition

After we determine the features, we use them to represent IOG participants as vectors. Based on the vector representation, we train a classifier using labeled data for role recognition. One thing should be noted is that there may be an imbalance between the number of samples in each role category. For example, in an IOG ecosystem, the number of gamblers is much larger than that of investors. This imbalance could affect the performance of the trained classifier.

To reduce the negative impact of unbalanced training data, we adopt an ensemble learning strategy based on the AdaBoost algorithm [5]. We use Naive Bayesian as the base classifier. Each training tuple is assigned with a weight. A series of t classifiers are iteratively learned. In each learning iteration, the samples from the original training set are re-sampled to form a new training set. The samples with higher weights are selected with a higher probability. After a classifier $H_{i}$ is learned, the samples misclassified by $H_{i}$ are assigned higher weights. In the following learning iteration, the classifier $H_{i+1}$ will pay more attention to the misclassified samples. In each round, we restrict the number of re-sampled samples in each role category to be the same (i.e., the size of the smallest category). The final class prediction is based on the weighted votes of the classifiers learned in each iteration. We named the overall ensemble classifier as EC4PRR (Ensemble Classifier for Participant Role Recognition).

5 Experiments and Analysis

In this section, we present a set of experiments to evaluate the performance of our PRR approach using real-world IOG monetary transaction data. We implemented all experiments in Java and MATLAB.

5.1 Data

As aforementioned, the MTD used in experiments were provided by an LEA in Shandong Province of China. The data contains two years of monetary transaction records between participants of an IOG platform. The LEA obtained this data during the investigation of the IOG case with warrants, and some of the key members of the IOG organization have been arrested. However, to protect the privacy, the MTD has been pre-processed by the LEA. Each participant was represented by an ID, and only a few fields of each transaction record were kept, including source ID, target ID, amount of transferred money, and time stamp. All the other information was removed. There are totally 4690 participants and 3.9 million transactions in the MTD. Each participant has been assigned a role manually by LEA investigators during the investigation. In Tables 1 and 2, we summarize some essential characteristics of the MTD. From Table 1, we can see the imbalance between the number of samples in each role category. In the training phase, we randomly select 85% of the samples as the training set to build a classifier and 15% as testing samples to validate the performance of the classifier.

Table 1. Participants distribution among roles.

Full size table

Table 2. Experimental data statistics.

Full size table

5.2 Evaluation Measures

We adopt precision, recall, and $F_{1}$ score as performance measures. Let $\widetilde{\mathcal {R}}=\{\widetilde{R}_{1}, \widetilde{R}_{2}, ..., \widetilde{R}_{K}\}$ refer to the role assignment output by a PRR classifier, and $\mathcal {R}=\{R_{1}, R_{2}, ..., R_{K}\}$ be the ground-truth. Let $TP_{i}$ be the number of participants assigned to $\widetilde{R}_{i} $ correctly according to $R_{i}$, $FP_{i}$ refer to the number of participants assigned to $\widetilde{R}_{i}$ by mistake, and $FN_{i}$ refer to the number of participants should be assigned to $\widetilde{R}_{i}$ but are assigned to other roles, the precision, recall, and $F_{1}$ for the whole experimental results can be briefed as follows:

$$\begin{aligned} precision= & {} \frac{\sum _{i\in [1,K]}TP_{i}}{\sum _{i\in [1,K]}TP_{i}+\sum _{i\in [1,K]}FP_{i}} \end{aligned}$$

(21)

$$\begin{aligned} recall= & {} \frac{\sum _{i\in [1,K]}TP_{i}}{\sum _{i\in [1,K]}TP_{i}+\sum _{i\in [1,K]}FN_{i}} \end{aligned}$$

(22)

$$\begin{aligned} F_{1}= & {} \frac{2\times recall\times precision}{recall+precision} \end{aligned}$$

(23)

5.3 Effectiveness of Features

To tested the effectiveness of the proposed features, we trained EC4PRR with different feature configurations. Three classifiers, named EC4PRR-TSF, EC4PRR-NSF, and EC4PRR-All, were learned using only transaction statistical features, only network structural features, and a combination of all features, respectively. For each classifier, we ran it using a 10-fold cross-validation.

The prediction results are shown in Fig. 3. From the results, we can see both EC4PRR-TSF and EC4PRR-NSF achieved relatively satisfactory performances. Precision scores of EC4PRR-TSF and EC4PRR-NSF achieved 0.76 and 0.71, respectively. The overall performances of the two classifiers are 0.69 and 0.65 respectively in terms of $F_{1}$. This indicates that both transaction statistical features and network structural features are discriminative for role recognition. EC4PRR-All, which was trained with all features, obtained the best performance in terms of all evaluation measures. The precision, recall, and $F_{1}$ of EC4PRR-All were 0.8759, 0.7898, and 0.8356, respectively. This reveals that a combination of the two sets of features can improve the performance of the EC4PRR classifier.

As the efficiency of transaction statistical features can be impacted by the time coverage of MTD, we also examined the sensitivity of EC4PRR-TSF and EC4PRR-All to the size of a dataset. We divided the two years of MTD into eight units with a time coverage of three months for each unit. We then derived subsets of the MTD by varying the time coverage from one to eight units and tested $F_{1}$ performances of the two classifiers using these eight subsets. Figure 4 shows the performance variations of EC4PRR-TSF and EC4PRR-All with data size enlarging. We can see that $F_{1}$ scores of both EC4PRR-TSF and EC4PRR-All become stable when the data size is larger than five units.

Table 3. Performance comparison of different classifiers.

Full size table

5.4 Role Recognition Results

We compared the proposed EC4PRR with three common-used classifiers, which include Naive Bayesian, SVM, and Random Forest, to validate its predictive performance. These classifiers have been implemented in WEKA^{Footnote 1} and LIBSVM^{Footnote 2}. We also ran each of the classifiers using a 10-fold cross-validation.

Table 3 gives the results of this experiment. From Table 3, we can see that EC4PRR outperformed all the other three classifiers in terms of precision, recall, and $F_{1}$. This demonstrates the effectiveness of EC4PRR for recognizing the roles of IOG participants. The performance of Naive Bayesian, which we used as the base classifier in our ensemble learning process, was the worst among all classifiers. The ensemble learning strategy made the final classifier much better. Note that the Random Forest is also a kind of classifier based on ensemble learning. However, our modification of the re-sampling strategy made EC4PRR more suitable for the unbalanced training data and achieve better performance.

6 Conclusion

In this study, we propose an automatic approach to address the IOG participant-role recognition problem. The proposed approach extracts two sets of features, i.e. transaction statistical features and network structural features, from monetary transaction data to effectively represent participants. A classifier named EC4PRR is trained based on the feature representation to predict the roles of participants. To reduce the impact of unbalanced training data, EC4PRR adopts an ensemble learning strategy in the training phase. Experiments were carried out on real-world IOG case data. The results indicate that both transaction statistical features and network structural features are discriminative for role recognition, and a combination of the two sets of features can improve the performance of EC4PRR. In comparison with other common-used classifiers, EC4PRR achieved the best performance. The result also reveals that our modification of the ensemble learning process makes EC4PRR more suitable for unbalanced training data.

There are primarily two limitations of this study we should pay attention to in our future work. First, although we used real-world data in our experiments, the data was only extracted from a specific IOG case, more case data should be collected in the future. Second, we did not consider the situation that an IOG participant plays multiple roles. New techniques should be proposed to address this problem.

Notes

References

Albanese, J.S.: Illegal gambling businesses & organized crime: an analysis of federal convictions. Trends Organ. Crime 21(3), 262–277 (2018)
Article Google Scholar
Banks, J.: Edging your bets: advantage play, gambling, crime and victimisation. Crime Media Cult. 9(2), 171–187 (2013)
Article Google Scholar
Banks, J.: Internet gambling, crime and the regulation of virtual environments. In: Banks, J. (ed.) Gambling, Crime and Society, pp. 183–223. Palgrave Macmillan UK, London (2017). https://doi.org/10.1057/978-1-137-57994-2_6
Chapter Google Scholar
Blaszczynski, A.: Online gambling and crime: causes, controls and controversies. Int. Gambl. Stud. 15(2), 340–341 (2016)
Article Google Scholar
Freund, Y., Schapire, R.E.: A desicion-theoretic generalization of on-line learning and an application to boosting. In: European Conference on Computational Learning Theory, pp. 23–37 (1995)
Google Scholar
Gainsbury, S., Wood, R., Russell, A., Hing, N., Blaszczynski, A.: A digital revolution: comparison of demographic profiles, attitudes and gambling behavior of internet and non-internet gamblers. Comput. Hum. Behav. 28(4), 1388–1398 (2012)
Article Google Scholar
Gainsbury, S.M., Russell, A., Hing, N., Wood, R., Blaszczynski, A.: The impact of internet gambling on gambling problems: a comparison of moderate-risk and problem internet and non-internet gamblers. Psychol. Addict. Behav. 27(4), 1092–101 (2013)
Article Google Scholar
Gainsbury, S.M., Russell, A., Hing, N., Wood, R., Dan, L., Blaszczynski, A.: How the internet is changing gambling: findings from an australian prevalence survey. J. Gambl. Stud. 31(1), 1–15 (2013)
Article Google Scholar
Gainsbury, S.M., Russell, A., Wood, R., Hing, N., Blaszczynski, A.: How risky is internet gambling? A comparison of subgroups of internet gamblers based on problem gambling status. New Media Soc. 17(6), 861–879 (2015)
Article Google Scholar
Mcmullan, J.L., Rege, A.: Online crime and internet gambling. J. Gambl. Issues 24(5), 115–116 (2010)
Google Scholar
Tong, S., Zhang, H., Shen, B., Zhong, H., Wang, Y., Jin, B.: Detecting gambling sites from post behaviors. In: 2016 IEEE 11th Conference on Industrial Electronics and Applications, pp. 2495–2500 (2016)
Google Scholar
Wardle, H., Moody, A., Griffiths, M., Orford, J., Volberg, R.: Defining the online gambler and patterns of behaviour integration: evidence from the british gambling prevalence survey 2010. Int. Gambl. Stud. 11(3), 339–356 (2011)
Article Google Scholar
Wood, R.T., Williams, R.J.: A comparative profile of the internet gambler: demographic characteristics, game-play patterns, and problem gambling status. New Media Soc. 13(13), 1123–1141 (2011)
Article Google Scholar
Wood, R.T., Williams, R.J., Lawton, P.K.: Why do internet gamblers prefer online versus land-based venues? Some preliminary findings and implications. J. Gambl. Issues 20, 235–252 (2007)
Google Scholar

Download references

Acknowledgment

This work was supported in part by National Natural Science Foundation of China (61602281, 61702309), Natural Science Foundation of Shandong Province of China (ZR2015YL018, ZR2016YL011, and ZR2016YL014), Shandong Provincial Key Research and Development Program (2018CXGC0701, 2018GGX106005, 2017CXGC0701, and 2017CXGC0706).

Author information

Authors and Affiliations

Shandong Computer Science Center (National Supercomputer Center in Jinan), Shandong Provincial Key Laboratory of Computer Networks, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Xiaohui Han, Lianhai Wang, Shujiang Xu, Dawei Zhao & Guangqi Liu

Authors

Xiaohui Han
View author publications
You can also search for this author in PubMed Google Scholar
Lianhai Wang
View author publications
You can also search for this author in PubMed Google Scholar
Shujiang Xu
View author publications
You can also search for this author in PubMed Google Scholar
Dawei Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Guangqi Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaohui Han .

Editor information

Editors and Affiliations

Départment d'Informatique, Ecole Normale Supérieure, Paris, France
David Naccache
University of Texas, San Antonio, TX, USA
Shouhuai Xu
Chinese Academy of Sciences, Beijing, China
Sihan Qing
Dipto di Informatica, Univ degli Studi di Milano, Crema, Italy
Pierangela Samarati
Télécom SudParis, Evry, France
Gregory Blanc
University of New Brunswick, Fredericton, NB, Canada
Rongxing Lu
IMT Lille Douai, Villeneuve-d'Ascq, France
Zonghua Zhang
IMT Lille Douai, Villeneuve-d'Ascq, France
Ahmed Meddahi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Han, X., Wang, L., Xu, S., Zhao, D., Liu, G. (2018). Role Recognition of Illegal Online Gambling Participants Using Monetary Transaction Data. In: Naccache, D., et al. Information and Communications Security. ICICS 2018. Lecture Notes in Computer Science(), vol 11149. Springer, Cham. https://doi.org/10.1007/978-3-030-01950-1_34

Download citation

DOI: https://doi.org/10.1007/978-3-030-01950-1_34
Published: 26 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01949-5
Online ISBN: 978-3-030-01950-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Role Recognition of Illegal Online Gambling Participants Using Monetary Transaction Data

Abstract

Similar content being viewed by others

Anti-money Laundering (AML) Research: A System for Identification and Multi-classification

Predicting self-exclusion among online gamblers: An empirical real-world study

An individual-group-merchant relation model for identifying fake online reviews: an empirical study on a Chinese e-commerce platform

Keywords

1 Introduction

2 Related Work

3 Preliminaries