DOI: 10.1145/2389656.2389660

Efficient mining of correlated sequential patterns based on null hypothesis

Published: 29 October 2012

Abstract

Frequent pattern mining has been a widely studied topic in data mining research for more than a decade. However, pattern mining on real data sets is complicated: a huge number of co-occurrence patterns are usually generated, most of which are either redundant or uninformative. The true correlation relationships among data objects are buried deep in a large pile of useless information. To overcome this difficulty, mining correlations has been recognized as an important data mining task because of its many advantages over mining frequent patterns.
In this paper, we formally propose and define the task of mining frequent correlated sequential patterns from a sequential database. With this aim in mind, we re-examine various interestingness measures to select the appropriate one(s), which can disclose succinct relationships among sequential patterns. We then propose PSBSpan, an efficient algorithm built on the pattern-growth framework that mines frequent correlated sequential patterns. Our experimental study on real datasets shows that our algorithm has outstanding performance in terms of both efficiency and effectiveness.
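
The abstract does not describe PSBSpan or the chosen interestingness measure in detail, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes a plain PrefixSpan-style pattern-growth miner over sequences of single items, followed by a filter that compares each pattern's observed support with the support expected if its items occurred independently. The function names (`prefixspan`, `independence_ratio`, `correlated_patterns`), the ratio measure, and the toy database are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import prod


def prefixspan(db, min_sup):
    """Return {pattern tuple: support} for all frequent sequential patterns.

    db is a list of sequences; each sequence is a list of single items.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds one (sequence index, start offset) pair per
        # sequence that still contains the current prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in db[sid][start:]:
                counts[item].add(sid)
        for item in sorted(counts):
            support = len(counts[item])
            if support < min_sup:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project each sequence just past the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue
                new_projected.append((sid, pos + 1))
            grow(pattern, new_projected)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


def independence_ratio(pattern, support, item_support, n):
    """Observed support divided by the support expected under independence.

    Assumption: this lift-like ratio is an illustrative stand-in, not the
    measure selected in the paper.
    """
    expected = n * prod(item_support[item] / n for item in pattern)
    return support / expected


def correlated_patterns(db, min_sup, min_score):
    """Keep frequent patterns of length >= 2 whose ratio reaches min_score."""
    frequent = prefixspan(db, min_sup)
    n = len(db)
    item_support = {p[0]: s for p, s in frequent.items() if len(p) == 1}
    kept = {}
    for pattern, support in frequent.items():
        if len(pattern) < 2:
            continue
        score = independence_ratio(pattern, support, item_support, n)
        if score >= min_score:
            kept[pattern] = (support, score)
    return kept


if __name__ == "__main__":
    toy_db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"], ["b", "d"]]
    for pattern, (support, score) in correlated_patterns(toy_db, 2, 1.0).items():
        print(pattern, support, round(score, 3))
```

In this sketch the frequency threshold prunes the search during pattern growth, while the correlation filter runs afterwards. An efficient algorithm would presumably push the correlation check into the growth step, but how PSBSpan does that is not stated in the abstract.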




    Published In

    Web-KR '12: Proceedings of the 2012 international workshop on Web-scale knowledge representation, retrieval and reasoning
    October 2012
    32 pages
    ISBN:9781450317115
    DOI:10.1145/2389656

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. correlated pattern mining
    2. frequent pattern mining

    Qualifiers

    • Research-article

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 4 of 4 submissions, 100%


    Cited By

    • (2019) Order Dependency in Sequential Correlation. 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), pp. 49-52. DOI: 10.1109/ICECTE48615.2019.9303557. Online publication date: 26-Dec-2019
    • (2018) Mining Sequential Correlation with a New Measure. Advances in Data Mining. Applications and Theoretical Aspects, pp. 29-43. DOI: 10.1007/978-3-319-95786-9_3. Online publication date: 4-Jul-2018
    • (2017) Extracting user behavior-related words and phrases using temporal patterns of sequential pattern evaluation indices. Vietnam Journal of Computer Science, 4:3, pp. 147-160. DOI: 10.1007/s40595-016-0084-y. Online publication date: 1-Aug-2017
    • (2015) Analyzing User Behaviors Based on Temporal Patterns of Sequential Pattern Evaluation Indices on Twitter. Revised Selected Papers of the PAKDD 2015 Workshops on Trends and Applications in Knowledge Discovery and Data Mining - Volume 9441, pp. 177-188. DOI: 10.1007/978-3-319-25660-3_15. Online publication date: 19-May-2015
    • (2012) The 2012 international workshop on web-scale knowledge representation, retrieval, and reasoning. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 2760-2761. DOI: 10.1145/2396761.2398755. Online publication date: 29-Oct-2012
