1 Introduction

User-based systems, such as web-services like Amazon, Twitter or corporate IT networks, have become popular targets of fraud or attacks. A popular research problem is to detect the spammers/fraudsters/attackers that are trying to attack a given system [3, 11, 13, 21]. Similarly, in the social networks setting, there are multiple websites where anyone can buy fake Facebook page-likes or Twitter followers. In all these cases, such fraudulent activities take the form of “lockstep” or highly synchronized behavior: such as, multiple users liking the same set of pages on Facebook, or multiple users following the same users almost at the same time on Twitter [3]. Such behavior results in dense blocks in matrices/tensors. The reason behind these blocks is intuitive, as most of the fraudsters have constrained resources (accounts, IP addresses, time, etc.) and they reuse their resources to add as many fraudulent activities as possible to maximize their profits.

Various methods have been proposed to identify users exhibiting such behavior, which involve finding dense blocks in tensors [11, 21] or clustering in subgraphs [3, 23]. However, for security experts monitoring the systems, it is imperative to know which users are more suspicious than other users, since it directs their attention to such users for further analysis or actions. In this paper we propose a method that ranks entities effectively (see Fig. 1) for a security analyst to view. Consider Fig. 2; all three users, A, B and C are participating in dense blocks (as they are part of the 2 rectangles), however their contribution towards the suspiciousness of each block is different. A core question we answer in our paper is as follows:

Informal Problem 1

(Individual Suspiciousness Metric). Given multimodal temporal data in the form of (userId, productId, \(\dots \), timestamp), how can we find and score suspicious entities (e.g. users, activities, products, days,etc.)?

Fig. 1.
figure 1

Effectiveness of zooRank on real world datasets. (Top Left) Perfect precision-recall on software marketplace dataset. (Top Right) zooRank obtains good precision recall on Reddit dataset. (Bottom Left) Top 100 suspicious users found by zooRank show high synchronicity (formed groups) in rating and reviewing top suspicious products. (Bottom Right) The suspicious users (bottom; red) detected by zooRank for Reddit dataset show irregular spikes in inter-arrival time distribution, as compared to all the users (top; blue). (Color figure online)

In addition, almost all the social networking websites and services have timestamps associated with every user activity. However, few approaches in the literature consider temporal features [3]. These timestamps can be useful for detecting fraudsters. However, it is not clear in dense block detection literature, in what ways we can incorporate the temporal information available to us. In this paper we answer the following question:

Informal Problem 2

(Temporal data handling). Given data in the form of (cat 1, cat 2, \(\dots \), timestamp), how can we generate features from timestamps useful for detecting fraudsters? Here cat 1,cat 2 are any categorical features (generally userId, productId, activityId, ratings, etc.).

Fig. 2.
figure 2

How to rank users based on their suspiciousness, matching human intuition (A > B > C)?

We propose zooRank, a novel approach for successfully scoring entities based on their participation in suspicious dense blocks. We introduce a set of axioms that any ideal individual scoring metric should satisfy. We show theoretically, that our proposed scoring function satisfies the proposed axioms. Additionally, zooRank also provides a framework to make good use of temporal information that generally exists in all the real-world datasets. As shown in Fig. 1, zooRank successfully finds suspicious users in multiple real-world datasets (Software Marketplace data and Reddit data) with high accuracy. Additionally, the suspicious users found by our method showed clearly anomalous patterns. In Fig. 1 (Bottom Left), we see that multiple users are working in groups to target certain products. Similarly, in Fig. 1 (Bottom Right), the suspicious users detected by our method show extremely regular and bot-like behavior resulting in spikes in the inter-arrival time distribution (difference in seconds between consecutive posts).

Our main contributions are as follows:

  • Theory

    • Axioms: We propose a set of axioms that an individual scoring metric for measuring contribution of a user towards a suspicious block should follow.

    • Metric: We propose an individual suspiciousness scoring metric.

    • Proofs: We further prove that our proposed individual metric follows all the proposed axioms.

  • Temporal Features: We provide a way of creating temporal features from the timestamp information present in the data.

  • Multimodality and Effectiveness: The proposed approach zooRank can take into account various features, including temporal features. The approach detects suspicious entities in all modes of the data. We tested zooRank on various real-world datasets and were able to find suspicious entities with high accuracy, revealing interesting fraud patterns.

Reproducability: Our code and link to the datasets used is available at https://goo.gl/rrvDTx.

2 Background and Related Work

A lot of work exists in the literature which aims at finding dense blocks, but none of the methods present a way of scoring the individual entities in dense blocks.

Detecting dense blocks: Densest-subgraph identification (i.e., the problem of finding a subgraph with maximum average degree) has been broadly studied in theory [5, 8]. These theoretical results have been extended and applied to anomaly and fraud detection [11, 20] since dense subgraphs (dense blocks) in real-world graph data tend to indicate fraudulent lock-step behavior, such as follower-buying services in Twitter. Spectral methods, which make use of eigen and singular value decomposition, also have been used for detecting dense subgraphs corresponding to ‘cut-and-paste’ bibliography in patent graphs [18], lock-step followers [13] and small-scale stealthy attacks [19] in social networks. Other approaches for dense-subgraph detection include co-clustering [3] and belief propagation [17]. Recently, dense-block detection in multi-aspect data also has been researched [12, 21] for spotting groups synchronized in multiple aspects, such as IPs, review scores and review keywords. For our experiments, we use the best performing dense subgraph detection method M-Zoom [21]. The existing methods, however aim at only finding blocks, and do not provide a rank-list of users to inspect according to their suspiciousness.

Scoring Anomalies: Evaluating the anomalousness or suspiciousness of individuals is complementary to detecting dense blocks, which correspond to group activities. A widely-used approach is to detect outliers. Outlier detection methods are divided into parametric methods assuming underlying data distribution [10] and non-parametric methods using local features, such as distances to neighbors [14] and local density [4, 15]. For graph data, on the other hand, various approaches, based on minimum description length [6, 9], neighborhood information [22], egonet features [2] have been proposed for scoring nodes. Many methods do exist in the literature, which use temporal information such as inter-arrival time [7, 16]. These features have been used to successfully detect bot-like behavior [7].

Our proposed method zooRank scores each entity (individual-scoring) in any of the dimensions (multimodal) of the tensor based on the entity’s participation in the suspicious dense blocks (dense-blocks). It provides ways of transforming temporal data into useful features and thus handles both numerical and categorical features. A comparison between zooRank and other algorithms is summarized in Table 1. Our proposed method zooRank is the only one that matches all specifications.

Table 1. Comparison of other methods and their features

3 Preliminaries and Problem Definition

3.1 Problem Definition

Definition 1

(K-way timed tensor). A K-way timed tensor is a higher-order matrix containing entries of the form (category 1, category 2, \(\ldots \), category K, timestamp).

Many types of data including “like” data from Facebook (UserId, PageId, Timestamp), “follow” data from Twitter (UserId, FolloweeId, Timestamp), activity log from an organization (UserId, OperationId, Timestamp) or network data (Source IP, Source Port, Destination IP, Destination Port, Timestamp) all can be formulated as a K-way timed tensor. We now give a precise definition of the problem statements.

Problem 1

(Temporal Features Handling). Given a K-way timed tensor \(\mathcal {A}\), how can we effectively transform the temporal features associated with \(\mathcal {A}\) to generate a categorical tensor \(\mathcal {X}\)?

Problem 2

(Individual-Suspiciousness). Given a L-way categorical tensor \(\mathcal {X}\) of size \(N_{1} \times N_{2} \times \dots \times N_{L}\) with non-negative entries, compute a score function \(f_{\mathcal {X}}(i)\), which defines the suspiciousness of entity i in the \(m(i)^{th}\) mode of \(\mathcal {X}\) with respect to the overall tensor \(\mathcal {X}\).

3.2 Block Level Suspiciousness Metrics

In this paper, we consider three block-level suspiciousness metrics although our proposed method is not restricted to them. The metrics are Arithmetic (\(g_{ari}\)), Geometric (\(g_{geom}\)) and Density (\(g_{susp}\)). Arithmetic computes the arithmetic average mass of a sub-block \(\mathcal {Y}\) of a tensor \(\mathcal {X}\). Similarly, Geometric metric is the geometric average mass of the block. The Density metric is the KL-divergence (Kullback Leibler) between the distribution of the mass in the sub-block with respect to the distribution of the mass in the tensor. These metrics are explained in the following sections.

3.3 Axioms

In this sub-section, we establish axioms that a good score function \(f = f_{\mathcal {X}}(i)\) should satisfy. The suspiciousness of an entity should be based on its participation in dense blocks \(\mathcal {B}\). Hence, our first two axioms govern the scores with respect to a single block \(\mathcal {Y} \in \mathcal {B}\): our third axiom then governs how the single-block scores are combined to form \(f_{\mathcal {X}}(i)\).

Table 2. Symbols and definitions

Let \(\rho _\mathcal {Y}\) be the density (i.e. mass divided by volume) of \(\mathcal {Y}\), and \(\rho _{\mathcal {Y}}(i)\) be the density of the slice of \(\mathcal {Y}\) defined by entity i. Similarly, let \(\mathcal {C}_{\mathcal {Y}}(i)\) denote the mass of that same slice. The entire list of symbols is shown in Table 2.

Axiom 1

(Mass). If an entity a has more mass than entity b in a block and given the fixed size of block in both the modes m(a) and m(b), then entity a is more suspicious. Formally

$$\begin{aligned} \text {IF}\ \mathcal {C}_{\mathcal {Y}}(a)> \mathcal {C}_{\mathcal {Y}}(b),\quad&\text {AND}\quad \mathcal {N}_{\mathcal {Y}}^{m(a)} = \mathcal {N}_{\mathcal {Y}}^{m(b)}, \text {THEN}\ \delta _{\mathcal {Y}}(a) > \delta _{\mathcal {Y}}(b) \end{aligned}$$

This is represented in Fig. 2. See how entities are ranked by suspiciousness in the top right block (User A > User B > Activity D).

Axiom 2

(Concentration). Given two entities ab in different modes m(a), m(b), where number of entities in one mode (\(N_{\mathcal {Y}}^{(m(a))}\)) is less than the number of entities in the second mode (\(N_{\mathcal {Y}}^{(m(b))}\)), then for fixed density, entity a is more suspicious than entity b.

Formally,

$$\begin{aligned} \text {IF}\ N_{\mathcal {Y}}^{m(a)} < N_{\mathcal {Y}}^{m(b)} \quad&\text {AND}\quad \rho _{\mathcal {Y}} = \rho _{\mathcal {Y}}(a) = \rho _{\mathcal {Y}}(b), \\ \text {THEN}\quad \delta _{\mathcal {Y}}(a)&> \delta _{\mathcal {Y}}(b) \end{aligned}$$

This is represented in Fig. 2. See how entities are ranked by suspiciousness in the lower left block (User C > Activity E).

Axiom 3

(Monotonocity). If for every block, entity a has higher suspiciousness than entity b, then entity a has higher overall suspiciousness.

Formally,

$$\begin{aligned} \text {IF}\ \delta _{\mathcal {Y}}(a)&> \delta _{\mathcal {Y}}(b)\quad \forall \mathcal {Y} \in \mathcal {B}, \text {THEN}\ f_{\mathcal {X}}(a) > f_{\mathcal {X}}(b) \end{aligned}$$

3.4 Shortcomings of Other Metrics

While these axioms are simple and intuitive, many other candidate metrics are not able to satisfy them. We consider some of them, and show why they fail.

Block Score: One simple metric to consider is the block suspiciousness score itself. The metric is to assign each individual the maximum block suspiciousness score out of all the blocks it is part of. The metric doesn’t change if the two entities have different contributions to the block, and hence fails Axiom 1 (Mass) and Axiom 2 (Concentration).

SVD-score: Any matrix \(\mathbf {A}\) can be decomposed using SVD decomposition as follows: \(\mathbf {A=U\Sigma V^{T}}\). Each of the singular values in \(\mathbf {\Sigma }\) represents the singular value related to a dense block that exists in the dataset. The metric here is the score of the maximum component for each user. This metric would again fail Axiom 1 (Mass) and Axiom 2 (Concentration).

Average \(\mathbf {\delta }\) -Block Score: Another proposed metric could be the average of all the contributions by the given entity to all the suspicious blocks. The contribution to a block is computed as the difference in the suspiciousness between the block and the block after removing the specified entity. This statistic fails to satisfy Axiom 3 (Monotonocity) as if entity 1 has higher suspiciousness in 2 blocks than entity 2, but entity 2 exists only in one of the blocks, then the mean statistic is ambiguous.

As we show above, the metrics based on aggregation of block statistics do not work. In the following section, we propose \(\textsc {zooRank} \), a scalable and effective method for finding and scoring suspicious individual entities in multimodal temporal data.

4 Proposed Approach: zooRank

4.1 Temporal Feature Handling

As mentioned, any data from a social networking website or a web service can be represented as a K-way timed tensor. We propose a way to handle such tensors by converting the numerical timestamp mode into interpretable categorical features. We propose to generate \(0^{th}\)-order, \(1^{st}\)-order, and temporal folding features.

  • \(0^{th}\) -order features: The \(0^{th}\) order features bucketize the timestamp into number of days, hours, minutes, etc. passed since the first observation was made. The temporal resolution can be chosen by practitioners based on the typical level of temporal variation present in their dataset.

  • \(1^{st}\) -order features: Inter-arrival time is defined as the time interval between 2 consecutive timestamps of the same user. [7] found that bots tend to display regular inter-arrival time behavior such as performing an activity every exactly 5 min, due to automated scripts. To capture this pattern, we propose \(1^{st}\)-order features, which is the log-bucketized inter-arrival time between 2 consecutive operations of a user (generalizable to any entity).

  • Temporal folding features: We propose another way to detect fraudsters showing periodic behavior, which are common in bot-like behavior. For instance, a group of anomalous users might try to perform multiple login activities only from Wednesday 10 PM to 11 PM, or only on a specific day of the week. We work with 3 such features: (1) day of the week, (2) hour of the day and (3) hour of the week. We call these features temporal folding features.

4.2 Proposed Metric

Our metric is based on the \(\delta \)-contribution of each entity towards the block suspiciousness score. We first define the \(\delta \)-contribution for a given entity i in mode m(i) of a specific block \(\mathcal {Y} \in \mathcal {B}\), where \(\mathcal {B}\) is a list of blocks. We denote this by \(\delta _{\mathcal {Y}}(i)\).

Definition 2

(Entity’s Block-level Suspiciousness ( \({\delta _{\mathcal {Y}}(\mathbf{i}))}\) ). We define \(\delta _{\mathcal {Y}}(i)\) as the difference between the suspiciousness score of block \(\mathcal {Y}\) and block \(\mathcal {Y}\) after removing entity i from the block i.e., \(\delta _{\mathcal {Y}}(i) = g(\mathcal {Y})-g(\mathcal {Y} \setminus i)\).

We need to aggregate the \(\delta \)-metric over the entire list of blocks \(\mathcal {B}\), in such a way that the given axioms are satisfied. We propose two metrics both of which satisfy the given axioms. The first metric is the sum of the \(\delta \)-contributions, and the second is the maximum of the \(\delta \)-contributions. We define the maximum metric as follows:

$$ f_{\mathcal {X}}(i) = \max _{\mathcal {Y} \in \mathcal {B}}(\delta _{\mathcal {Y}}(i)) $$

We empirically found that the maximum metric performs the best on the real-world datasets, and hence for the rest of the paper, all references to the proposed metric is for the maximum version of the metric.

4.3 Algorithm

After handling the temporal features, we produce a categorical tensor \(\mathcal {X}\). Algorithm 1 defines the outline of zooRank. The first step is to compute suspicious blocks for the given tensor \(\mathcal {X}\). To compute suspicious blocks, any existing method for block detection can be used.

We first find the M top suspicious dense blocks as determined by g (Line 1), where g is one of the metrics defined in Sect. 3.2. These top M suspicious blocks are stored in the list \(\mathcal {B}\). For every entity i that has occurred at least once in any of the blocks in \(\mathcal {B}\), we compute the individual suspiciousness score function f. This score function captures the contribution of a particular entity towards making the block suspicious. To do this, we compute the marginal contribution of each entity towards that block. This is equivalent to removing the entity i from the block, and re-computing the suspiciousness score (Lines 6–7). The difference between the new suspiciousness score and the original suspiciousness score is the marginal contribution of entity i. We compute the marginal contribution of each entity i over all the blocks (Lines 4–8). We define the individual suspiciousness score of the entity i as the maximum of the marginal contributions of entity i (Line 9). Another potential metric is to replace the maximization in Line 9 by the sum function. We conduct experiments with that metric as well.

figure a

This formulation of the scores \(f_{\mathcal {X}}(i)\) satisfies intuitively reasonable properties, namely our axioms defined in Sect. 3.3:

Theorem 1

The scores \(f_{\mathcal {X}}(i)\) computed by Algorithm 1, using any of the metrics \(g_{ari}\), \(g_{geo},\) or \(g_{susp}\), satisfies Axioms 1 to 3.

Proof

We first start by defining some of the standard block suspiciousness methods as follows:

$$\begin{aligned} g_{ari}(\mathcal {Y},\mathcal {X})&= C_{\mathcal {Y}}/(\sum _j N_{\mathcal {Y}}^{j}/L) \\ g_{geo}(\mathcal {Y},\mathcal {X})&= C_{\mathcal {Y}}/(V_{\mathcal {Y}}^{1/L}) \\ g_{susp}(\mathcal {Y},\mathcal {X})&= V_{\mathcal {Y}} \cdot \text {D}(\rho _\mathcal {Y} || \rho _\mathcal {X}) \end{aligned}$$

where \(\text {D}(\rho _\mathcal {Y} || \rho _\mathcal {X}) = \rho _\mathcal {X} - \rho _\mathcal {Y} + \rho _\mathcal {Y} \log \frac{\rho _\mathcal {Y}}{\rho _\mathcal {X}}\).

zooRank satisfies Axiom 1 (Mass)

If we fix the block’s dimensions \(N_{\mathcal {Y}}^{1},\dots ,N_{\mathcal {Y}}^{L}\), all three metrics above strictly increase the mass of the block (i.e. \(C_{\mathcal {Y}}\)); this can be inferred directly from the form of \(g_{ari}\) and \(g_{geo}\), and for \(g_{susp}\).

As \(\mathcal {C}_{\mathcal {Y}}(a) > \mathcal {C}_{\mathcal {Y}}(b)\), thus \(\mathcal {Y}\setminus a\) has lower mass than \(\mathcal {Y}\setminus b\), and since g is strictly increasing in mass (for fixed block dimensions), we get \(g(\mathcal {Y}\setminus a) < g(\mathcal {Y}\setminus b)\). Therefore:

$$\begin{aligned} \delta _{\mathcal {Y}}(a)&= g(\mathcal {Y}) - g(\mathcal {Y}\setminus a) > g(\mathcal {Y})-g(\mathcal {Y}\setminus b) \\&=\delta _{\mathcal {Y}}(b) \end{aligned}$$

zooRank satisfies Axiom 2 (Concentration)

Using the same reasoning as above, it suffices to show \(g(\mathcal {Y} \setminus a) < g(\mathcal {Y} \setminus b)\). Note that \(N_\mathcal {Y}^{m(a)}< N_\mathcal {Y}^{m(b)} \Rightarrow V_{\mathcal {Y} \setminus a} < V_{\mathcal {Y} \setminus b}\) (since removing from a smaller mode decreases the volume more). Consider each metric \(g_{ari}\), \(g_{geo}\), and \(g_{susp}\) separately:

  • case 1: \(g_{ari}\).

    Here \(\mathcal {Y} \setminus a\) and \(\mathcal {Y} \setminus b\) have the same sum of block dimensions, and \(\mathcal {C}_{\mathcal {Y} \setminus a} = \rho _\mathcal {Y} \cdot \mathcal {V}_{\mathcal {Y} \setminus a} < \rho _\mathcal {Y} \cdot \mathcal {V}_{\mathcal {Y} \setminus b} = \mathcal {C}_{\mathcal {Y} \setminus b}\) so that \(g_{ari}(\mathcal {Y} \setminus a) < g_{ari}(\mathcal {Y} \setminus b)\).

  • case 2: \(g_{geo}\).

    Note that \(g_{geo}(\mathcal {Y}) = C_{\mathcal {Y}}/(V_{\mathcal {Y}}^{1/L}) = \rho _\mathcal {Y} \cdot V_{\mathcal {Y}} /(V_{\mathcal {Y}}^{1/L}) = \rho _\mathcal {Y} \cdot V_{\mathcal {Y}}^{\frac{L-1}{L}}\). Thus:

    $$ g_{geo}(\mathcal {Y} \setminus a) = \rho _\mathcal {Y} \cdot (\mathcal {V}_{\mathcal {Y} \setminus a})^{\frac{L-1}{L}} < \rho _\mathcal {Y} \cdot (\mathcal {V}_{\mathcal {Y} \setminus b})^{\frac{L-1}{L}} = g_{geo}(\mathcal {Y} \setminus b). $$
  • case 3: \(g_{susp}\).

    \( g_{susp}(\mathcal {Y} \setminus a) = V_{\mathcal {Y} \setminus a} \cdot \text {D}(\rho _\mathcal {Y} || \rho _\mathcal {X}) < V_{\mathcal {Y} \setminus b} \cdot \text {D}(\rho _\mathcal {Y} || \rho _\mathcal {X}) = g_{susp}(\mathcal {Y} \setminus b)\).

zooRank satisfies Axiom 3 (Monotonocity)

$$ f_{\mathcal {X}}(a) = \max _{\mathcal {Y} \in \mathcal {B}} \delta _{\mathcal {Y}}(a) > \max _{\mathcal {Y} \in \mathcal {B}} \delta _{\mathcal {Y}}(b) = f_{\mathcal {X}}(b). $$

5 Experiments

In this section, we conducted experiments to answer the following questions:

  • Q1: How effectively does zooRank find suspicious entities across all modes?

  • Q2: How generalizable is zooRank over different datasets?

  • Q3: Does zooRank scale linearly with size of the data?

5.1 Datasets

We used various real-world datasets including a software marketplace dataset, a dataset from a popular social news aggregation website (Reddit), a dataset about Indian elections from Twitter, and a research lab’s intrusion detection dataset.

  • Software Marketplace Dataset (SWM): We used the SWM dataset that was used previously by [1]. The dataset contains the reviews for all the products (software) under the entertainment category of the marketplace. The dataset contains 1,132,373 reviews from 966,839 unique users for 15,094 products. Each review has a rating from 1 to 5, and the timestamp on which the review was posted. The dataset, thus is in the format (UserId, ProductId, Rating, Timestamp). Previous studies [1, 23] manually annotated ground truth labels for suspicious users, which we considered as our ground truth.

  • Reddit Dataset: Reddit is a social news aggregator website, which allows users to post, comment on, upvote and downvote stories. The dataset was collected and analyzed by [7]. The dataset contains 1,020,834 user comments for 1,036 users. The Reddit dataset is in the form (UserId, #Upvotes, #Downvotes, Length, Timestamp). The dataset has information about ground truth suspicious user accounts.

  • DARPA Intrusion Detection: The DARPA intrusion detection dataset contains a sample of network data for the US Air Force laboratory. The dataset contains records in the format (Source IP, Destination IP, Timestamp). Further, it also contains labels for anomalous connections. For ground truth, we considered any source IP address that participates in at least 10 such anomalous connections, and any destination IP address that participates in at least 400 such connections. We altered this definition for ground truth thresholds and still achieved similar results as mentioned in the paper.

  • Indian Elections 2014 Dataset: We collected tweets from 2014 Indian Elections. We crawled all the tweets from the 10% Sample API (Decahose). All the tweets contain the top 5 hashtags on Indian Elections per week. We further considered only those users who have at least 2 tweets in our dataset. This led us to a dataset of tweets from March, 2014 to May, 2014 consisting of 10,786 users.

  • Simulated Dataset: We also tested our approach on a simulated dataset. For simulation, we used a realistic way of generating user-timestamps [7], then for each of the timestamp, we added activities based on a Poisson distribution. We simulated 3 blocks, comprising of 300, 400 and 200 genuine users respectively, where each block has different parameters for the activity Poisson distribution. For the suspicious blocks, we simulated three blocks for 50, 25 and 25 users respectively. The first block does the most popular activity over the entire duration of the simulation and with random inter-arrival times. The second and third block do the second most and third most popular activities, respectively, at a steady inter-arrival time of 1 min on a single day.

Experimental Settings: All our experiments were conducted on a machine on Intel(R) Xeon(R) CPU W3530 @ 2.80 GHz and 24 GB RAM. For all our experiments, we chose \(M=30\) and used M-Zoom [21] for dense block detection. We created multiple tensors based on different resolutions of time features (such as day of week, hour of the day, Inter-arrival time (in seconds, bucketized), etc.). However, we reported only the best accuracy obtained. The choice of what tensor to use, what block-level metric to use, and what value of M is appropriate, is for the practitioner to decide and depends on the type of data, on which the method is being applied.

5.2 Q1. Effectiveness of zooRank

To test the effectiveness of zooRank, we compare our ranking of the suspicious entities with the ground truth suspicious users in our datasets. We further test the accuracy of our method on the SWM dataset. For software marketplace, we experimented with different versions of temporal features. Note that our algorithm achieves 100% accuracy in identifying suspicious users in the SWM dataset. From Fig. 3a, we observed that adding the inter-arrival time feature increased the accuracy of the method. Our algorithm can rank entities in multiple modes; hence, we also tried to rank the products on basis of their suspiciousness. Though we do not have ground truth for which products were suspicious, we analyzed the top 5 suspicious products in Table 3b. We used the number of reviews by ground truth fraudsters as an indicator for suspiciousness. It can be observed that all the suspicious products are popular (high number of total reviews) and have also been targeted significantly from fraudsters (high number of fraud users). We also noticed that most of the reviews by fraudsters were highly synchronized and a large majority came on a single day (Fig. 3c).

Fig. 3.
figure 3

zooRank is effective. (a) It gives nearly 100% accuracy while identifying suspicious users in the SWM dataset. (b) zooRank marks products reviewed by known fraudsters as suspicious. (c) Product #2 received nearly all of it’s reviews by fraud users on one single day.

5.3 Q2. Generalizability of zooRank

We tested our method on multiple real-world datasets. In Table 3, we present our accuracy on each dataset. We observed that using maximum of the marginal contributions is better than using sum for all of the cases. Further, we also compared our method with a baseline approach. We define the following baseline: Block Score: defined as the maximum of all block suspiciousness scores a block is part of. From Fig. 4, it can be observed that our approach clearly is better than the mentioned approach.

Table 3. zooRank is generalizable over multiple datasets, and multiple modes that exist in the datasets.
Fig. 4.
figure 4

zooRank is generalizable. zooRank outperforms the baseline across different modes (see (a) and (b)) and across multiple datasets (see (c) and (d))

For Indian elections data, we did not have any ground truth. We extracted the top 100 suspicious users and evaluated them manually. The results for top 100 suspicious users are shown in Fig. 5. The user ids are sorted by their suspiciousness score, and plotted on the scatter plot along with top suspicious hashtags. Figure 5 clearly shows groups of suspicious users. It is evident that the first two users are “hashtag hijackers”. These two users tweeted spam messages with other hashtags but also focussed on generic hashtags related to the Indian elections. Both of these users have an identical behavior, which imply they do follow “lock-step” behavior. The second group of users were tweeting hashtags related to themselves and also generic hashtags related to the elections (“self-promoters”). We also spotted the user who tweets out all the trending topics at regular intervals, possibly through automated scripts (“trending topic aggregator”). We believe that the remaining users are users who were discussing indian elections a lot and were influencers in the political discussion. On further analysis, 20 users out of the 100 users were already suspended by Twitter. Thus, our algorithm was able to identify users that were considered spam by Twitter but also users that were missed by Twitter algorithm (“self-promoters”) but were clearly malicious.

Fig. 5.
figure 5

zooRank identifies fraudulent suspicious behavior in Twitter: Top 100 suspicious users, and top hashtags as identified by zooRank.

Fig. 6.
figure 6

Scalability of zooRank (a) zooRank scales linearly with the number of records. (b) zooRank scales linearly with the number of blocks we want to find.

5.4 Q3. Scalability of zooRank

In this section, we evaluate the scalability of the zooRank. We measure the effects of the number of blocks and the number of records on the runtime of zooRank. To study the effect of the number of records, we generated the dataset with given number of entries in 3 dimensions, where cardinality of each dimension is \(10^{6}\). For all our results, we used arithmetic metric and operated on the most suspicious 30 blocks. The results are shown in Fig. 6a, showing that our method scales linearly both in the data size and the number of blocks searched for. For the effect of the number of blocks, we generated a dataset with \(10^{4}\) records with a similar number of entries in each dimension.

6 Conclusions

In this paper, we proposed a set of axioms that a given individual suspiciousness scoring metric should follow. We presented such a metric that satisfies all the proposed axioms. Specifically, our contributions are as follows:

  • Individual-Suspiciousness Metric: We propose a suspiciousness metric which scores each entity participating in dense blocks. The proposed criteria \(f_{\mathcal {X}}(i)\) satisfies intuitive axioms.

  • Temporal Features: The proposed method provides ways to transform the numerical timestamp mode to information rich categorical temporal features.

  • Effectiveness: The proposed method zooRank was successfully tested on various real-world datasets. It scored the suspicious entities with high accuracy, and also uncovered interesting fraud patterns.

  • Scalability: The method is linearly scalable with the size of the data and can be used for big-data problems (see Fig. 6).