Detection of cross-channel anomalies

Pham, Duc-Son; Saha, Budhaditya; Phung, Dinh Q.; Venkatesh, Svetha

doi:10.1007/s10115-012-0509-6


Regular Paper
Published: 12 June 2012

Volume 35, pages 33–59, (2013)

Knowledge and Information Systems



Abstract

The data deluge has created a great challenge for data mining applications in which rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are those common to the anomalies of the individual channels, and they are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also extend the proposed detection method to an online setting, in which it automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis establishes theoretical results on the reduction of an impurity index, which indicate that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
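
The two-stage idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no implementation details, so the sketch assumes a PCA residual-subspace test for stage one and a simple vote across channels for stage two. The function names, the threshold rule (median plus a multiple of the median absolute deviation), and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def residual_scores(X, k):
    """Squared norm of each row of X outside the top-k principal subspace."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                 # (features, k) basis
    R = Xc - Xc @ V @ V.T                        # part not explained by V
    return np.sum(R ** 2, axis=1)


def single_channel_flags(X, k=2, n_mads=10.0):
    """Stage 1 (assumed rule): flag rows with unusually large residual energy."""
    s = residual_scores(X, k)
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    return s > med + n_mads * mad


def cross_channel_flags(channels, min_channels=2, k=2, n_mads=10.0):
    """Stage 2 (simplified to a vote): rows flagged in >= min_channels channels."""
    flags = np.stack([single_channel_flags(X, k, n_mads) for X in channels])
    return flags.sum(axis=0) >= min_channels


# Synthetic data (hypothetical): three channels, each rank-2 signal plus noise.
rng = np.random.default_rng(0)
T, d = 200, 10
channels = []
for _ in range(3):
    X = rng.normal(size=(T, 2)) @ rng.normal(size=(2, d))
    X += 0.1 * rng.normal(size=(T, d))
    X[50] += 1.0                                 # anomaly shared by all channels
    channels.append(X)
channels[0][120] += 1.0                          # anomaly in channel 0 only

cross = cross_channel_flags(channels)
print(np.flatnonzero(cross))                     # time steps flagged across channels
```

In this sketch the anomaly at time step 120 is visible only in one channel, so the vote in stage two discards it, which mirrors the abstract's point that combining single-channel detections can reduce false alarms.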



Author information

Authors and Affiliations

Department of Computing, Curtin University, Perth, WA, Australia
Duc-Son Pham
Center for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, VIC, Australia
Budhaditya Saha, Dinh Q. Phung & Svetha Venkatesh


Corresponding author

Correspondence to Duc-Son Pham.


About this article

Cite this article

Pham, DS., Saha, B., Phung, D.Q. et al. Detection of cross-channel anomalies. Knowl Inf Syst 35, 33–59 (2013). https://doi.org/10.1007/s10115-012-0509-6


Received: 19 December 2011
Revised: 12 February 2012
Accepted: 17 May 2012
Published: 12 June 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s10115-012-0509-6
