Community-based anomaly detection in evolutionary networks

Chen, Zhengzhang; Hendrix, William; Samatova, Nagiza F.

doi:10.1007/s10844-011-0183-2

Published: 19 October 2011

Volume 39, pages 59–85 (2012)

Journal of Intelligent Information Systems

Zhengzhang Chen^1,2,
William Hendrix^1 &
Nagiza F. Samatova^1,2

Abstract

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done on detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these anomaly detection approaches has considered an important property of evolutionary networks: their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm, based on a proposed representative-based technique, that detects all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We evaluate the community-based anomaly detection algorithm against a non-representative-based baseline on synthetic networks; on these datasets, our algorithm achieves a runtime speedup of 11 to 46 times over the baseline. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email, and detected significant and informative community-based anomaly dynamics in both.

Acknowledgements

The authors would like to thank Matthew C. Schmidt for his maximal clique enumeration program code, and Kevin A. Wilson and Ye Jin for valuable discussions.

Author information

Authors and Affiliations

Department of Computer Science, North Carolina State University, Raleigh, NC, 27695, USA
Zhengzhang Chen, William Hendrix & Nagiza F. Samatova
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
Zhengzhang Chen & Nagiza F. Samatova

Corresponding author

Correspondence to Nagiza F. Samatova.

Additional information

This work was supported in part by the U.S. Department of Energy, Office of Science, the Office of Advanced Scientific Computing Research (ASCR) and the Office of Biological and Environmental Research (BER), and by the U.S. National Science Foundation (Expeditions in Computing). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DE-AC05-00OR22725.

Appendices

Appendix

Proofs for Theorems and Lemmas of Section 4.1

Lemma 1

If community $C^i_t$ has more than one predecessor (or successor), the sizes of its predecessors (or successors) are either all larger than $\left| C^i_t \right|$ or all smaller than $\left| C^i_t \right|$.

Proof

Suppose otherwise, that $C^i_t$ has a predecessor with smaller size, as well as one with larger size. Let $C^1_{t-1}, C^2_{t-1}, \ldots, C^n_{t-1}$ (where n ≥ 2) be all predecessors of $C^i_t$, and suppose that $\left| C^j_{t-1} \right| < \left| C^i_t \right|$ and $\left| C^k_{t-1} \right| > \left| C^i_t \right|$ for some 1 ≤ j, k ≤ n, j ≠ k. From Definition 5 and the sizes of the three communities, we know that $C^j_{t-1} \subset C^i_t$ and $C^i_t \subset C^k_{t-1}$, so $C^j_{t-1} \subset C^k_{t-1}$. However, $C^j_{t-1}$ and $C^k_{t-1}$ are both maximal cliques in the same graph $G_{t-1}$, and a maximal clique cannot be a proper subset of another clique. This contradiction shows that a community cannot have one predecessor larger than itself and another smaller than itself. The argument for successors is identical, with two successors, both maximal cliques of $G_{t+1}$, in place of the two predecessors.□
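Lemma 1 can be spot-checked on small examples. The sketch below assumes, as the proof of Theorem 2 suggests, that a predecessor is a community at time $t-1$ whose node set properly contains, or is properly contained in, that of $C^i_t$ (exact equality being treated as a conserved community); Definition 5 itself is not reproduced in this appendix and may differ in detail. The helper names `predecessors` and `satisfies_lemma1` are hypothetical.

```python
def predecessors(community, previous):
    """Communities in `previous` (snapshot t-1) related to `community` by proper
    containment in either direction (an assumed reading of Definition 5).

    All communities are frozensets of node labels; `<` and `>` on frozensets
    test for proper subset and proper superset.
    """
    return [p for p in previous if p < community or p > community]


def satisfies_lemma1(community, previous):
    """True if all predecessors are larger than `community`, or all are smaller.

    With zero or one predecessor the condition holds trivially.
    """
    size = len(community)
    sizes = [len(p) for p in predecessors(community, previous)]
    return all(s > size for s in sizes) or all(s < size for s in sizes)
```

With maximal cliques of one graph as `previous`, such as {1, 2, 3} and {3, 4}, the check passes for {1, 2, 3, 4}. It fails for {1, 2, 3} against {1, 2} and {1, 2, 3, 4}, but those two sets cannot both be maximal cliques of the same graph, which is exactly the contradiction the proof relies on.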

Theorem 1

Let $G_t$ and $G_{t+1}$ both be simple, undirected graphs, where communities are defined as maximal cliques. If $G_{t+1}$ is the perturbed graph formed by adding edges/nodes to or removing edges/nodes from the baseline graph $G_t$, then there are only six possible types of community-based anomalies between $G_t$ and $G_{t+1}$: grown communities, shrunken communities, merged communities, split communities, born communities, and vanished communities, as defined in Definition 7.
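Theorem 1 treats the communities of each snapshot as its maximal cliques, so experimenting with the case analysis requires those cliques for both $G_t$ and $G_{t+1}$. The sketch below uses the textbook Bron-Kerbosch recursion with pivoting; it is an illustrative stand-in, not the maximal clique enumeration code mentioned in the acknowledgements, and the names `adjacency` and `maximal_cliques` are hypothetical.

```python
def adjacency(edges):
    """Build a node -> neighbour-set map for a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:  # simple graph: self-loops are ignored
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques of `adj` as frozensets.

    Bron-Kerbosch with pivoting: r is the clique being grown, p holds the
    vertices that could still extend r, and x holds vertices already explored.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # nothing can extend r: it is maximal
            return
        # Pick a pivot with many neighbours in p; those neighbours need no branch.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    if adj:  # an empty graph has no communities
        expand(set(), set(adj), set())
    return cliques
```

For example, adding the edges 1-4 and 2-4 to a triangle {1, 2, 3} with a pendant edge 3-4 turns the two maximal cliques {1, 2, 3} and {3, 4} into the single clique {1, 2, 3, 4}, an instance of a merged community.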

Proof

Assume that $C_t^1, C_t^2, \ldots, C_t^m$ are all communities in $G_t$ and that $V_t^1, V_t^2, \ldots, V_t^m$ are their respective node sets. Also assume that $C_{t+1}^1, C_{t+1}^2, \ldots, C_{t+1}^n$ are all communities in $G_{t+1}$ and that $V_{t+1}^1, V_{t+1}^2, \ldots, V_{t+1}^n$ are their respective node sets. Here, $V_t^i = V_{t+1}^j$ means that $V_t^i$ contains exactly the nodes in $V_{t+1}^j$.

To determine the type of a specific community, we only need to compare the node sets of communities in $G_{t+1}$ with the node sets of communities in $G_t$. If $V_{t+1}^j = V_t^i$, where 1 ≤ i ≤ m and 1 ≤ j ≤ n, then community $C_{t+1}^j$ contains exactly the nodes in community $C_t^i$, which means that $C_{t+1}^j$ is a conserved community and not an anomaly.

In the following, we consider all possible anomalies by analyzing all possible mappings between predecessors and successors. In particular, when deciding whether community $C_t^i$ is an anomaly, we do not need to consider the situation where $C_t^i$ has a single successor $C_{t+1}^j$, as long as we have covered all cases for the predecessors of $C_{t+1}^j$: $C_{t+1}^j$ then has either one predecessor or more than one predecessor, and both situations are covered by the predecessor conditions. The same reasoning applies to a community that has more than one successor of larger size, since each such successor is examined through its own predecessors. In other words, we need to consider all cases for predecessors, but only two cases for successors: when a community has no successor and when a community has more than one successor of smaller size.

1. For a specific j (where 1 ≤ j ≤ n), there is at least one i (where 1 ≤ i ≤ m) that satisfies $V_{t+1}^j \subset V_t^i$. Then, by Definition 5, community $C_{t+1}^j$ has at least one predecessor, including $C_t^i$, with larger size than $C_{t+1}^j$. Let $I = \{i \mid V_{t+1}^j \subset V_t^i\}$. There are two non-exclusive sub-cases:
   (a) For ℓ ∈ I, if there is some k (where 1 ≤ k ≤ n) other than j that satisfies $V_{t+1}^k \subset V_t^\ell$, then $C_t^\ell$ has more than one smaller-size successor ($C_{t+1}^j$ and $C_{t+1}^k$). Additionally, by Lemma 1, $C_t^\ell$ cannot have a successor with larger size than $C_t^\ell$. Thus, $C_t^\ell$ is a split community, and $C_{t+1}^j$ is one of its products.
   (b) For ℓ ∈ I, if there is no k (where 1 ≤ k ≤ n) other than j that satisfies $V_{t+1}^k \subset V_t^\ell$, then $C_t^\ell$ has only one smaller-size successor, $C_{t+1}^j$, and $C_{t+1}^j$ has at least one predecessor, including $C_t^\ell$, with larger size. Also, by Lemma 1, $C_{t+1}^j$ cannot have a predecessor with smaller size than $C_{t+1}^j$. Thus, $C_{t+1}^j$ is a shrunken community.
2. For a specific j (where 1 ≤ j ≤ n), there is exactly one i (where 1 ≤ i ≤ m) that satisfies $V_{t+1}^j \supset V_t^i$. Then community $C_{t+1}^j$ has one predecessor, $C_t^i$, with smaller size than $C_{t+1}^j$. Additionally, by Lemma 1, $C_{t+1}^j$ cannot have a predecessor with larger size than $C_{t+1}^j$. Thus, community $C_{t+1}^j$ is a grown community.
3. For a specific j (where 1 ≤ j ≤ n), there is more than one i (where 1 ≤ i ≤ m) that satisfies $V_{t+1}^j \supset V_t^i$. Then community $C_{t+1}^j$ has more than one predecessor with smaller size. Also, by Lemma 1, $C_{t+1}^j$ cannot have a predecessor with larger size than $C_{t+1}^j$. Thus, community $C_{t+1}^j$ is a merged community.
4. For a specific j (where 1 ≤ j ≤ n), there is no i (where 1 ≤ i ≤ m) that satisfies $V_{t+1}^j \supset V_t^i$ or $V_{t+1}^j \subset V_t^i$, which means that community $C_{t+1}^j$ has no predecessor. Thus, $C_{t+1}^j$ is a born community.
5. For a specific i (where 1 ≤ i ≤ m), there is at least one j (where 1 ≤ j ≤ n) that satisfies $V_{t+1}^j \subset V_t^i$. Let $J = \{j \mid V_{t+1}^j \subset V_t^i\}$. Then every k ∈ J satisfies $V_{t+1}^k \subset V_t^i$ for this i, so each $C_{t+1}^k$ falls under case 1. Thus, this case reduces to case 1.
6. For a specific i (where 1 ≤ i ≤ m), there is at least one j (where 1 ≤ j ≤ n) that satisfies $V_{t+1}^j \supset V_t^i$. Let $J = \{j \mid V_{t+1}^j \supset V_t^i\}$. Then every k ∈ J satisfies $V_{t+1}^k \supset V_t^i$ for this i, so each $C_{t+1}^k$ falls under case 2 or 3. Thus, this case reduces to case 2 or 3.
7. For a specific i (where 1 ≤ i ≤ m), there is no j (where 1 ≤ j ≤ n) that satisfies $V_{t+1}^j \supset V_t^i$ or $V_{t+1}^j \subset V_t^i$, which means that community $C_t^i$ has no successor. Thus, $C_t^i$ is a vanished community.

Since all relationships between $V_{t+1}^j$ (where 1 ≤ j ≤ n) and $V_t^i$ (where 1 ≤ i ≤ m) have been covered, there are only six possible different types of community-based anomalies.□
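The case analysis translates directly into a brute-force classifier. The sketch below assumes both snapshots are given as collections of frozensets, each the node set of a maximal clique, and reads ⊂ and ⊃ as proper containment. Because it compares every pair of communities, it is closer in spirit to a non-representative baseline than to the representative-based algorithm described in the abstract; `classify_anomalies` is a hypothetical name.

```python
ANOMALY_TYPES = ("grown", "shrunken", "merged", "split", "born", "vanished")


def classify_anomalies(old, new):
    """Label community-based anomalies between snapshot t (`old`) and t+1 (`new`).

    Follows cases 1-7 of the proof of Theorem 1. Split communities are reported
    as the t-communities that split; all other types as t+1-communities, except
    vanished ones, which are t-communities.
    """
    old, new = set(old), set(new)
    result = {kind: set() for kind in ANOMALY_TYPES}
    for c in new:
        if c in old:
            continue  # conserved community, not an anomaly
        larger = [p for p in old if c < p]   # predecessors of larger size
        smaller = [p for p in old if p < c]  # predecessors of smaller size
        if larger:                                # case 1
            for p in larger:
                if any(q != c and q < p for q in new):
                    result["split"].add(p)        # case 1(a): >1 smaller successor
                else:
                    result["shrunken"].add(c)     # case 1(b)
        elif len(smaller) == 1:
            result["grown"].add(c)                # case 2
        elif len(smaller) > 1:
            result["merged"].add(c)               # case 3
        else:
            result["born"].add(c)                 # case 4: no predecessor
    for p in old:
        # Case 7: no successor of any size (and no conserved copy).
        if not any(q <= p or p <= q for q in new):
            result["vanished"].add(p)
    return result
```

For instance, the cliques {1, 2, 3} and {3, 4} at time $t$ followed by the single clique {1, 2, 3, 4} at $t+1$ yield one merged community; reversing the two snapshots yields one split community instead.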

Theorem 2

If community $C_t^i$ is represented by vertex $v_i \in C_t^i$, and community $C_{t+1}^j$ is represented by vertex $v_j \in C_{t+1}^j$, where $C_t^i \to C_{t+1}^j$, then $v_i \in C_{t+1}^j$ or $v_j \in C_t^i$.

Proof

By Definition 5, $C_t^i \to C_{t+1}^j$ implies that $C_t^i \subseteq C_{t+1}^j$ or $C_{t+1}^j \subseteq C_t^i$. If $C_t^i \subseteq C_{t+1}^j$, then $v_i \in C_{t+1}^j$, and if $C_{t+1}^j \subseteq C_t^i$, then $v_j \in C_t^i$.□
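Theorem 2 is what limits the comparisons: to find every community at $t+1$ related to $C_t^i$, it suffices to examine the communities that contain $v_i$ and the communities whose representative lies in $C_t^i$. The sketch below shows that pruning with two inverted indexes. Choosing the smallest node label as the representative is an assumption made for illustration, the function names are hypothetical, and this is not presented as the authors' algorithm.

```python
from collections import defaultdict


def representative(community):
    """Hypothetical representative choice: the smallest node label."""
    return min(community)


def related_pairs(old, new):
    """All (p, q) with p in `old` (time t), q in `new` (time t+1), p <= q or q <= p.

    Containment is tested only for candidate pairs permitted by Theorem 2:
    q contains the representative of p, or p contains the representative of q.
    """
    members = defaultdict(list)  # node -> communities in `new` containing it
    reps = defaultdict(list)     # node -> communities in `new` it represents
    for q in new:
        for v in q:
            members[v].append(q)
        reps[representative(q)].append(q)

    pairs = set()
    for p in old:
        candidates = set(members.get(representative(p), ()))  # v_i in C_{t+1}^j
        for v in p:
            candidates.update(reps.get(v, ()))                 # v_j in C_t^i
        for q in candidates:
            if p <= q or q <= p:
                pairs.add((p, q))
    return pairs
```

The result matches an exhaustive comparison of all pairs, while containment tests are skipped for pairs that share neither representative; the surviving pairs can then be fed to the same case analysis as in Theorem 1.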

About this article

Cite this article

Chen, Z., Hendrix, W. & Samatova, N.F. Community-based anomaly detection in evolutionary networks. J Intell Inf Syst 39, 59–85 (2012). https://doi.org/10.1007/s10844-011-0183-2

Received: 07 March 2011
Revised: 02 August 2011
Accepted: 06 October 2011
Published: 19 October 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10844-011-0183-2
