
Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Published: 20 July 2019 Publication History

Abstract

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks of sequence mining research. Existing FEM approaches scale poorly on complex sequences, because testing whether an episode occurs in a sequence is an NP-complete problem. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. Here, “big” means that an event sequence is either very long or contains many simultaneous events. In addition, the events in this article are arranged in a predefined hierarchy. The hierarchy yields abstract events, which can form episodes that may not appear directly in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy to local processes. We then present an efficient special-purpose algorithm to improve local mining performance. We also extend our framework to support maximal and closed episode mining in the context of an event hierarchy; to the best of our knowledge, this is the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that taking event hierarchies into account yields practical patterns.
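To make the idea of a hierarchy-aware episode concrete, the following is a minimal single-machine sketch, not the distributed algorithm of the article. It assumes a serial episode (an ordered list of event types), a child-to-parent map as the hierarchy, and a simple illustrative support measure: the number of start positions from which the episode can be matched within a maximum time span. The names `ancestors`, `generalize`, `earliest_end`, `support`, and the toy data are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# An event sequence: (timestamp, set of simultaneous events), sorted by timestamp.
Sequence = List[Tuple[int, Set[str]]]


def ancestors(event: str, parent: Dict[str, str]) -> Set[str]:
    """Return the event together with all of its ancestors in the hierarchy."""
    out = {event}
    while event in parent:
        event = parent[event]
        out.add(event)
    return out


def generalize(seq: Sequence, parent: Dict[str, str]) -> Sequence:
    """Extend every event set with the ancestors of its events."""
    return [
        (t, set().union(*(ancestors(e, parent) for e in events)))
        for t, events in seq
    ]


def earliest_end(seq: Sequence, start: int, episode: List[str]) -> Optional[int]:
    """Greedily match a serial episode whose first event sits at index `start`.

    Returns the timestamp of the earliest possible last event, or None.
    """
    if episode[0] not in seq[start][1]:
        return None
    i = start
    for ev in episode[1:]:
        i += 1
        while i < len(seq) and ev not in seq[i][1]:
            i += 1
        if i == len(seq):
            return None
    return seq[i][0]


def support(seq: Sequence, episode: List[str],
            parent: Dict[str, str], max_span: int) -> int:
    """Count start positions from which the episode occurs within max_span."""
    g = generalize(seq, parent)
    count = 0
    for s in range(len(g)):
        end = earliest_end(g, s, episode)
        if end is not None and end - g[s][0] <= max_span:
            count += 1
    return count


# Toy hierarchy (assumed): a1 and a2 are kinds of A, b1 is a kind of B.
parent = {"a1": "A", "a2": "A", "b1": "B"}
seq = [(1, {"a1"}), (2, {"b1"}), (3, {"a2", "c"}), (5, {"b1"})]

print(support(seq, ["A", "B"], parent, max_span=2))    # abstract episode
print(support(seq, ["a1", "b1"], parent, max_span=2))  # concrete episode
```

In this toy sequence the abstract episode A followed by B is supported by two start positions, while the concrete episode a1 followed by b1 is supported by only one, even though neither A nor B appears literally in the input. This is the sense in which a hierarchy can surface episodes that are not directly visible in the raw sequence.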




    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 10, Issue 4
    Survey Papers and Regular Papers
    July 2019
    327 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/3344873

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2019
    Accepted: 01 April 2019
    Revised: 01 January 2019
    Received: 01 July 2018
    Published in TIST Volume 10, Issue 4


    Author Tags

    1. Frequent episode mining
    2. hierarchy-aware maximal/closed episode
    3. large-scale sequence mining
    4. peak episode miner

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
    • CCF-Tencent Rhino-Bird Young Faculty Open Research
    • Ant Financial through the Ant Financial Science Funds for Security Research
    • Youth Innovation Promotion Association CAS
