Abstract
Given a heatmap of millions of points, what patterns exist in the distributions of point characteristics, and how can we detect them and separate anomalies in a way similar to human vision? In this paper, we propose a vision-guided algorithm, EagleMine, to recognize and summarize point groups in the feature spaces. EagleMine uses a water-level tree to capture group structures at multiple resolutions according to vision-based intuition, and adopts statistical hypothesis tests to determine the optimal groups along the tree. Moreover, EagleMine can identify anomalous micro-clusters (i.e., micro-size groups), whose members exhibit very similar behavior but deviate from the majority. Extensive experiments on large graphs show that our method recognizes intuitive node groups as human vision does and achieves the best summarization performance compared to baselines. In terms of anomaly detection, EagleMine also outperforms state-of-the-art graph-based methods, significantly improving accuracy on synthetic and microblog datasets.
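To make the water-level idea concrete, here is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a 2-D histogram stored as a list of lists, treats an "island" as a 4-connected set of cells whose count is at or above the current water level, and links each island to the island that contains it at the level below. The names `islands` and `water_level_tree`, the toy histogram, and the chosen levels are all illustrative assumptions.

```python
from collections import deque


def islands(hist, level):
    """Return the 4-connected components of cells whose count is >= level."""
    rows, cols = len(hist), len(hist[0])
    seen = set()
    comps = []
    for r in range(rows):
        for c in range(cols):
            if hist[r][c] < level or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited cell above water.
            comp = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and hist[ny][nx] >= level
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            comps.append(frozenset(comp))
    return comps


def water_level_tree(hist, levels):
    """Map each (level, island) node to its parent node at the previous level.

    `levels` must be increasing, so every island at a higher level is a
    subset of exactly one island at the level below it.
    """
    parent = {}
    prev = []
    for level in levels:
        cur = [(level, isl) for isl in islands(hist, level)]
        for node in cur:
            parent[node] = next((p for p in prev if node[1] <= p[1]), None)
        prev = cur
    return parent


# Toy histogram (hypothetical counts): two dense regions joined by sparse cells.
hist = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 5, 9, 2, 1, 0, 0],
    [0, 6, 8, 1, 4, 7, 0],
    [0, 0, 0, 0, 3, 6, 0],
]
tree = water_level_tree(hist, levels=[1, 3, 5])
for (level, isl), par in sorted(tree.items(), key=lambda kv: (kv[0][0], min(kv[0][1]))):
    par_level = None if par is None else par[0]
    print(f"level {level}: island of {len(isl)} cells, parent level {par_level}")
```

On this toy input the single island at the lowest level splits into two islands once the water rises, which is the kind of multi-resolution structure the abstract describes. How the levels are chosen and how the tree is pruned with hypothesis tests is not specified in the abstract and is left out here.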
Notes
1. One of the largest microblog websites in China.
2. Binary opening is a basic workhorse of morphological noise removal in computer vision and image processing. Here we use a \(\underbrace{2\times \cdots \times 2}_{\dim(\mathcal{H})}\) square-shaped “probe”.
3. This measures the goodness of fit of the left-truncated Gaussian distribution.
4. The public datasets are available at: Amazon: http://konect.uni-koblenz.de/networks/amazon-ratings, Yelp: https://www.yelp.com/dataset_challenge, Flickr: https://www.aminer.cn/data-sna#Flickr-large, Youtube: http://networkrepository.com/soc-youtube.php, Tagged: https://linqs-data.soe.ucsc.edu/public/social_spammer/.
5. The status was checked three years later (May 2017) using the API provided by the Sina Weibo service.
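As an illustration of the binary opening mentioned in note 2, the following is a minimal pure-Python sketch for the two-dimensional case only, where the probe is a 2×2 square. It relies on the standard characterization of opening as the union of all probe placements that fit entirely inside the foreground. The function name `binary_opening_2x2` and the example mask are assumptions made for illustration and do not come from the paper.

```python
def binary_opening_2x2(mask):
    """Binary opening of a 0/1 grid with a 2x2 square probe.

    A foreground cell survives only if it is covered by at least one
    2x2 block that lies entirely inside the foreground.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            if (mask[r][c] and mask[r][c + 1]
                    and mask[r + 1][c] and mask[r + 1][c + 1]):
                # The probe fits here, so keep all four cells it covers.
                out[r][c] = out[r][c + 1] = 1
                out[r + 1][c] = out[r + 1][c + 1] = 1
    return out


# Hypothetical mask: one isolated cell, one solid block, one single-cell spur.
mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
opened = binary_opening_2x2(mask)
for row in opened:
    print(row)
```

The isolated cell and the spur are removed while the solid 2×2 block is preserved, which is the noise-removal effect the note refers to. Extending this to the general \(\dim(\mathcal{H})\)-dimensional probe would require iterating over hypercube placements instead of 2×2 blocks.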
Acknowledgments
This material is based upon work supported by the Strategic Priority Research Program of CAS (XDA19020400), NSF of China (61772498, 61425016, 91746301, 61872206), and the Beijing NSF (4172059).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, W., Liu, S., Faloutsos, C., Hooi, B., Shen, H., Cheng, X. (2019). Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol. 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_42
DOI: https://doi.org/10.1007/978-3-030-16148-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)