Detection of cross-channel anomalies

  • Regular Paper
  • Knowledge and Information Systems

Abstract

The data deluge has created a great challenge for data mining applications, in which rare topics of interest are often buried in a flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are those shared among the anomalies of individual channels, and they are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also extend the proposed detection method to an online setting, in which it automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. By establishing theoretical results on the reduction of an impurity index, our mathematical analysis shows that the method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
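The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It makes three assumptions that the abstract does not confirm. Stage one scores each sample by its energy outside the top-k principal subspace of its channel. The threshold is a median-plus-MAD rule. Stage two is a plain vote across channels, a simplified stand-in for the paper's spectral treatment of the amalgamated single-channel anomalies. All function names, parameters, and the synthetic data are hypothetical, and the channels are assumed to be aligned by sample index.

```python
import numpy as np


def residual_scores(X, k):
    """Squared residual norm of each column of X (features x samples)
    after removing the top-k principal subspace of the channel."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each feature
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                                 # basis of the principal subspace
    R = Xc - Uk @ (Uk.T @ Xc)                     # part of the data outside it
    return np.sum(R ** 2, axis=0)


def single_channel_flags(X, k, c=10.0):
    """Stage 1 (assumed rule): flag samples whose residual energy exceeds
    median + c * MAD of the channel's scores."""
    s = residual_scores(X, k)
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    return s > med + c * mad


def cross_channel_flags(channels, k, min_channels, c=10.0):
    """Stage 2 (simplified stand-in): keep sample indices flagged in at
    least `min_channels` channels."""
    flags = np.stack([single_channel_flags(X, k, c) for X in channels])
    return flags.sum(axis=0) >= min_channels


def make_channel(rng, spikes, n_features=20, n_samples=200):
    """Synthetic channel: rank-2 signal, small noise, spikes at given indices."""
    X = rng.normal(size=(n_features, 2)) @ rng.normal(size=(2, n_samples))
    X += 0.1 * rng.normal(size=X.shape)
    for t in spikes:
        d = rng.normal(size=n_features)
        X[:, t] += 5.0 * d / np.linalg.norm(d)    # spike in a random direction
    return X


rng = np.random.default_rng(0)
# Index 50 is anomalous in every channel; index 120 only in the first one.
channels = [
    make_channel(rng, [50, 120]),
    make_channel(rng, [50]),
    make_channel(rng, [50]),
]
cross = cross_channel_flags(channels, k=2, min_channels=2)
print(np.flatnonzero(cross))
```

In this toy setup, the vote is what separates an anomaly shared by several channels from one confined to a single channel, which is the intuition behind the claimed reduction in false alarms.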

Author information

Corresponding author

Correspondence to Duc-Son Pham.

About this article

Cite this article

Pham, DS., Saha, B., Phung, D.Q. et al. Detection of cross-channel anomalies. Knowl Inf Syst 35, 33–59 (2013). https://doi.org/10.1007/s10115-012-0509-6

  • DOI: https://doi.org/10.1007/s10115-012-0509-6
