Community-based anomaly detection in evolutionary networks

Journal of Intelligent Information Systems

Abstract

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done in terms of detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these anomaly detection approaches have considered an important property of evolutionary networks—their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm using a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the community-based anomaly detection algorithm by comparison to a non–representative-based algorithm on synthetic networks, and our experiments on synthetic datasets show that our algorithm achieves a runtime speedup of 11–46 over the baseline algorithm. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email. Significant and informative community-based anomaly dynamics have been detected in both cases.

Acknowledgements

The authors would like to thank Matthew C. Schmidt for his maximal clique enumeration program code, and we would like to thank Kevin A. Wilson and Ye Jin for valuable discussions.

Author information

Corresponding author

Correspondence to Nagiza F. Samatova.

Additional information

This work was supported in part by the U.S. Department of Energy, Office of Science, the Office of Advanced Scientific Computing Research (ASCR) and the Office of Biological and Environmental Research (BER), and by the U.S. National Science Foundation (Expeditions in Computing). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DE-AC05-00OR22725.

Appendix

Proofs for Theorems and Lemmas of Section 4.1

Lemma 1

If community \(C^i_t\) has more than one predecessor (or successor), the sizes of its predecessors (or successors) are either all larger than \(\left| C^i_t \right|\) or all smaller than \(\left| C^i_t \right|\).

Proof

Suppose otherwise, that \(C^i_t\) has a predecessor with smaller size as well as one with larger size. Let \(C^1_{t-1}\), \(C^2_{t-1}\), ..., \(C^n_{t-1}\) (where n ≥ 2) be all predecessors of \(C^i_t\), and suppose that \(\left| C^j_{t-1} \right| < \left| C^i_t \right|\) and \(\left| C^k_{t-1} \right| > \left| C^i_t \right|\) for some 1 ≤ j, k ≤ n, j ≠ k. From Definition 5 and the sizes of the three communities, we know that \(C^j_{t-1} \subset C^i_t\) and \(C^i_t \subset C^k_{t-1}\), so \(C^j_{t-1} \subset C^k_{t-1}\). However, \(C^j_{t-1}\) and \(C^k_{t-1}\) are both maximal cliques in the same graph, and \(C^j_{t-1} \subset C^k_{t-1}\) contradicts the definition of a maximal clique. Therefore, it is impossible for one predecessor to be larger than the community while another predecessor is smaller than it. The same argument applies to successors.□
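The key step in this proof, that no maximal clique can properly contain another, can be checked on a toy graph. The following is a minimal sketch, not the authors' implementation: the brute-force `maximal_cliques` helper, the `predecessors` helper, and the example graph are all hypothetical, and the predecessor relation is assumed to be proper containment of node sets in either direction, which is how the proof uses Definition 5.

```python
from itertools import combinations

def maximal_cliques(nodes, edges):
    """Brute-force maximal clique enumeration; only suitable for tiny graphs."""
    edge_set = {frozenset(e) for e in edges}

    def is_clique(subset):
        return all(frozenset(pair) in edge_set for pair in combinations(subset, 2))

    cliques = [frozenset(s)
               for size in range(1, len(nodes) + 1)
               for s in combinations(sorted(nodes), size)
               if is_clique(s)]
    # Keep only cliques that are not properly contained in another clique.
    return {c for c in cliques if not any(c < other for other in cliques)}

def predecessors(community, earlier):
    """Assumed reading of Definition 5: one node set properly contains the other."""
    return [p for p in earlier if p < community or community < p]

def lemma1_holds(community, earlier):
    """True if all predecessors are smaller, or all are larger, than the community."""
    preds = predecessors(community, earlier)
    return (all(len(p) < len(community) for p in preds)
            or all(len(p) > len(community) for p in preds))

# Hypothetical snapshots: the later graph adds edges (1,4) and (2,4).
nodes = {1, 2, 3, 4, 5}
edges_t = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
edges_t1 = edges_t + [(1, 4), (2, 4)]

communities_t = maximal_cliques(nodes, edges_t)
communities_t1 = maximal_cliques(nodes, edges_t1)
lemma_ok = all(lemma1_holds(c, communities_t) for c in communities_t1)
```

In this example the community {1, 2, 3, 4} has two predecessors, {1, 2, 3} and {3, 4}, and both are smaller, as the lemma requires.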

Theorem 1

Let \(G_t\) and \(G_{t+1}\) both be simple, undirected graphs, where communities are defined as maximal cliques. If \(G_{t+1}\) is the perturbed graph formed by either adding edges/nodes to or removing edges/nodes from the baseline graph \(G_t\), then there are only six possible types of community-based anomalies between \(G_t\) and \(G_{t+1}\): grown communities, shrunken communities, merged communities, split communities, born communities, and vanished communities, as defined in Definition 7.

Proof

Assume that \(C_t^1, C_t^2, \ldots, C_t^m\) are all the communities in \(G_t\) and that \(V_t^1, V_t^2, \ldots, V_t^m\) are their respective node sets. Also assume that \(C_{t+1}^1, C_{t+1}^2, \ldots, C_{t+1}^n\) are all the communities in \(G_{t+1}\) and that \(V_{t+1}^1, V_{t+1}^2, \ldots, V_{t+1}^n\) are their respective node sets. Here, we define \(V_t^i = V_{t+1}^j\) to mean that \(V_t^i\) contains exactly the nodes in \(V_{t+1}^j\).

To determine the type of a specific community, we only need to compare the node sets of communities in \(G_{t+1}\) with the node sets of communities in \(G_t\). If \(V_{t+1}^j = V_t^i\), where 1 ≤ i ≤ m and 1 ≤ j ≤ n, then community \(C_{t+1}^j\) contains exactly those nodes in community \(C_t^i\), which means that \(C_{t+1}^j\) is a conserved community and not an anomaly.

In the following, we consider all possible anomalies by analyzing all possible mappings between predecessors and successors. In particular, when deciding if community \(C_{t}^i\) is an anomaly, we do not need to consider the situation where \(C_{t}^i\) has a single successor as long as we have covered all cases for the predecessors of \(C_{t+1}^j\). If the community \(C_{t}^i\) has only one successor \(C_{t+1}^j\), then the community \(C_{t+1}^j\) has either exactly one predecessor or more than one predecessor, and both situations are covered by the predecessor conditions. The same reasoning applies for not considering the case where a community has more than one successor of larger size. In other words, we need to consider all cases for predecessors, but only two cases for successors: when a community has no successor and when a community has more than one successor of smaller size.

  1. For a specific j (where 1 ≤ j ≤ n), there is at least one i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^j \subset V_t^i\). Then, by Definition 5, community \(C_{t+1}^j\) has at least one predecessor, including \(C_{t}^i\), with larger size than \(C_{t+1}^j\). Let \(I =\{i \mid V_{t+1}^j \subset V_t^i\}\). There are two non-exclusive sub-cases here:

    (a) For ℓ ∈ I, if there is some k (where 1 ≤ k ≤ n) other than j that satisfies \(V_{t+1}^k \subset V_t^\ell\), then \(C_t^\ell\) has more than one smaller-size successor (\(C_{t+1}^j\) and \(C_{t+1}^k\)). Additionally, by Lemma 1, we know that \(C_{t}^\ell\) cannot have a successor with larger size than \(C_{t}^\ell\). Thus, \(C_{t}^\ell\) is a split community, and \(C_{t+1}^j\) is one of its products.

    (b) For ℓ ∈ I, if there is no k (where 1 ≤ k ≤ n) other than j that satisfies \(V_{t+1}^k \subset V_t^\ell\), then \(C_t^\ell\) has only one smaller-size successor \(C_{t+1}^j\), and \(C_{t+1}^j\) has at least one predecessor, including \(C_t^\ell\), with larger size. Also, by Lemma 1, we know that \(C_{t+1}^j\) cannot have a predecessor with smaller size than \(C_{t+1}^j\). Thus, \(C_{t+1}^j\) is a shrunken community.

  2. For a specific j (where 1 ≤ j ≤ n), there is only one i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^j \supset V_t^i\). Then, community \(C_{t+1}^j\) has one predecessor \(C_{t}^i\) with smaller size than \(C_{t+1}^j\). Additionally, by Lemma 1, we know that \(C_{t+1}^j\) cannot have a predecessor with larger size than \(C_{t+1}^j\). Thus, community \(C_{t+1}^j\) is a grown community.

  3. For a specific j (where 1 ≤ j ≤ n), there is more than one i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^j \supset V_t^i\). Then, community \(C_{t+1}^j\) has more than one predecessor with smaller size. Also, by Lemma 1, we know that \(C_{t+1}^j\) cannot have a predecessor with larger size than \(C_{t+1}^j\). Thus, community \(C_{t+1}^j\) is a merged community.

  4. For a specific j (where 1 ≤ j ≤ n), there is no i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^j \supset V_t^i\) or \(V_{t+1}^j \subset V_t^i\), which means that community \(C_{t+1}^j\) has no predecessor. Thus, \(C_{t+1}^j\) is a born community.

  5. For a specific i (where 1 ≤ i ≤ m), there is at least one j (where 1 ≤ j ≤ n) that satisfies \(V_{t+1}^j \subset V_t^i\). Let \(J = \{j \mid V_{t+1}^{j} \subset V_t^i\}\). Then, for each k ∈ J, there is at least one i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^{k} \subset V_t^i\), which is case 1. Thus, this case can be converted to case 1.

  6. For a specific i (where 1 ≤ i ≤ m), there is at least one j (where 1 ≤ j ≤ n) that satisfies \(V_{t+1}^j \supset V_t^i\). Let \(J = \{j \mid V_{t+1}^{j} \supset V_t^i\}\). Then, for each k ∈ J, there is at least one i (where 1 ≤ i ≤ m) that satisfies \(V_{t+1}^{k} \supset V_t^i\), which is case 2 or 3. Thus, this case can be converted to case 2 or 3.

  7. For a specific i (where 1 ≤ i ≤ m), there is no j (where 1 ≤ j ≤ n) that satisfies \(V_{t+1}^j \supset V_t^i\) or \(V_{t+1}^j \subset V_t^i\), which means that community \(C_{t}^i\) has no successor. Thus, \(C_{t}^i\) is a vanished community.

Since all relationships between \(V_{t+1}^j\) (where 1 ≤ j ≤ n) and \(V_t^i\) (where 1 ≤ i ≤ m) have been covered, there are only six possible different types of community-based anomalies.□
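The case analysis in this proof can be read as a decision procedure over the node sets of the two snapshots. The sketch below follows the cases literally and is only an illustration: the function name `classify`, the label dictionary, and the example communities are hypothetical, Definition 7 is not reproduced here, and this is the direct pairwise comparison rather than the representative-based algorithm of the paper.

```python
def classify(before, after):
    """Label communities by comparing node sets, following proof cases 1-4 and 7.

    `before` and `after` are iterables of node collections (communities at t and t+1).
    Returns a dict mapping each anomaly type to a set of frozensets.
    """
    before = {frozenset(c) for c in before}
    after = {frozenset(c) for c in after}
    labels = {name: set() for name in
              ("grown", "shrunken", "merged", "split", "born", "vanished")}

    for cj in after:
        if cj in before:
            continue  # conserved community, not an anomaly
        smaller = [ci for ci in before if ci < cj]  # predecessors with smaller size
        larger = [ci for ci in before if cj < ci]   # predecessors with larger size
        if larger:  # case 1
            for cl in larger:
                smaller_successors = sum(1 for ck in after if ck < cl)
                if smaller_successors > 1:
                    labels["split"].add(cl)      # case 1(a): cl split, cj is a product
                else:
                    labels["shrunken"].add(cj)   # case 1(b)
        elif len(smaller) == 1:
            labels["grown"].add(cj)              # case 2
        elif len(smaller) > 1:
            labels["merged"].add(cj)             # case 3
        else:
            labels["born"].add(cj)               # case 4

    for ci in before:
        if ci in after:
            continue  # conserved
        if not any(ci < cj or cj < ci for cj in after):
            labels["vanished"].add(ci)           # case 7
    return labels

# Hypothetical communities; nodes 7, 8 and 22 are assumed to be removed at t+1.
before = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {7, 8}, {11, 12}, {20, 21, 22}]
after = [{1, 2, 3, 4}, {4, 5}, {5, 6}, {9, 10}, {11, 12, 13}, {20, 21}]
result = classify(before, after)
```

Cases 5 and 6 need no branch of their own, because the proof reduces them to cases 1 to 3 on the successor side.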

Theorem 2

If community \(C_t^i\) is represented by vertex \(v_i \in C_t^i\), and community \(C_{t+1}^j\) is represented by vertex \(v_j \in C_{t+1}^j\), where \(C_t^i \to C_{t+1}^j\), then \(v_i \in C_{t+1}^j\) or \(v_j \in C_t^i\).

Proof

By Definition 5, \(C_t^i \to C_{t+1}^j\) implies that \(C_t^i \subseteq C_{t+1}^j\) or \(C_{t+1}^j \subseteq C_t^i\). If \(C_t^i \subseteq C_{t+1}^j\), then \(v_i \in C_t^i \subseteq C_{t+1}^j\), so \(v_i \in C_{t+1}^j\). If \(C_{t+1}^j \subseteq C_t^i\), then \(v_j \in C_{t+1}^j \subseteq C_t^i\), so \(v_j \in C_t^i\).□
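Theorem 2 suggests a way to avoid comparing every pair of communities: a community at time t only needs to be tested against communities at time t + 1 that contain its representative, or whose representative it contains. The following is a minimal sketch of that filtering idea under stated assumptions, not the paper's algorithm: the function names are hypothetical, and choosing the smallest node id as the representative (`rep=min`) is an arbitrary stand-in, since the selection rule is not given in this appendix.

```python
from collections import defaultdict

def related_pairs_bruteforce(before, after):
    """All (C_t, C_t+1) pairs where one node set contains the other."""
    return {(ci, cj) for ci in before for cj in after if ci <= cj or cj <= ci}

def related_pairs_by_representative(before, after, rep=min):
    """Same pairs, but only testing the candidates that Theorem 2 allows."""
    containing = defaultdict(list)    # node -> communities at t+1 that contain it
    represented = defaultdict(list)   # node -> communities at t+1 it represents
    for cj in after:
        represented[rep(cj)].append(cj)
        for v in cj:
            containing[v].append(cj)

    pairs = set()
    for ci in before:
        # Candidates: communities containing v_i, or whose v_j lies in C_t^i.
        candidates = set(containing.get(rep(ci), ()))
        for v in ci:
            candidates.update(represented.get(v, ()))
        for cj in candidates:
            if ci <= cj or cj <= ci:
                pairs.add((ci, cj))
    return pairs

# Hypothetical communities for illustration.
before = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({4, 5, 6})]
after = [frozenset({1, 2, 3, 4}), frozenset({4, 5}), frozenset({5, 6})]

fast = related_pairs_by_representative(before, after)
slow = related_pairs_bruteforce(before, after)
```

By Theorem 2 the candidate set cannot miss a related pair for any choice of representative, so both functions return the same result, while the filtered version performs containment tests only on the candidates.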

About this article

Cite this article

Chen, Z., Hendrix, W. & Samatova, N.F. Community-based anomaly detection in evolutionary networks. J Intell Inf Syst 39, 59–85 (2012). https://doi.org/10.1007/s10844-011-0183-2

  • DOI: https://doi.org/10.1007/s10844-011-0183-2
