DOI: 10.1145/1150402.1150441
Article

Generating semantic annotations for frequent patterns with context analysis

Published: 20 August 2006

Abstract

As a fundamental data mining task, frequent pattern mining has widespread applications in many different domains. Research in frequent pattern mining has so far mostly focused on developing efficient algorithms to discover various kinds of frequent patterns, but little attention has been paid to the important next step: interpreting the discovered frequent patterns. Although some recent work has studied the compression and summarization of frequent patterns, the proposed techniques can only annotate a frequent pattern with non-semantic information (e.g., support), which provides only limited help for a user trying to understand the patterns.

In this paper, we propose the novel problem of generating semantic annotations for frequent patterns. The goal is to annotate a frequent pattern with in-depth, concise, and structured information that can better indicate the hidden meanings of the pattern. We propose a general approach to generate such an annotation for a frequent pattern by constructing its context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns. This general approach has many potential applications, such as generating a dictionary-like description for a pattern, finding synonym patterns, discovering semantic relations, and summarizing semantic classes of a set of frequent patterns. Experiments on different datasets show that our approach is effective in generating semantic pattern annotations.
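
The three steps named in the abstract (context model, context indicators, representative transactions and similar patterns) can be illustrated with a minimal sketch. The abstract does not specify how contexts are weighted or compared, so everything here is an assumption made for illustration: context units are single items, weights are raw co-occurrence counts, and similarity is cosine similarity. The function names `context_vector`, `cosine`, and `annotate` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def context_vector(pattern, transactions):
    """Assumed context model: count the items that co-occur with `pattern`."""
    pattern = frozenset(pattern)
    counts = Counter()
    for t in transactions:
        if pattern <= t:                 # transaction contains the pattern
            counts.update(t - pattern)   # count the other items in it
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def annotate(pattern, candidates, transactions, k=2):
    """Build a small annotation: indicators, similar patterns, representative transactions."""
    pattern = frozenset(pattern)
    ctx = context_vector(pattern, transactions)

    # Strongest context indicators: the k most frequent co-occurring items.
    indicators = [item for item, _ in ctx.most_common(k)]

    # Semantically similar patterns: candidates whose context vector is closest.
    others = [frozenset(c) for c in candidates if frozenset(c) != pattern]
    similar = sorted(
        others,
        key=lambda c: cosine(ctx, context_vector(c, transactions)),
        reverse=True,
    )[:k]

    # Representative transactions: supporting transactions closest to the context.
    supporting = [t for t in transactions if pattern <= t]
    representative = sorted(
        supporting,
        key=lambda t: cosine(ctx, Counter(t - pattern)),
        reverse=True,
    )[:k]

    return {
        "indicators": indicators,
        "similar": similar,
        "representative": representative,
    }
```

Raw counts favor items that are frequent everywhere, so a practical version would likely replace them with a weighting that discounts globally common items. That choice is left open here because the abstract does not state one.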

Cited By

  • (2024) Data heterogeneity's impact on the performance of frequent itemset mining algorithms. Information Sciences: an International Journal, 678:C. DOI: 10.1016/j.ins.2024.120981. Online publication date: 1-Sep-2024.
  • (2018) A Novel Method on Information Recommendation via Hybrid Similarity. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48:3 (448-459). DOI: 10.1109/TSMC.2016.2633573. Online publication date: Mar-2018.
  • (2018) Interpretation of text patterns. Data Mining and Knowledge Discovery, 32:4 (849-884). DOI: 10.1007/s10618-018-0556-z. Online publication date: 1-Jul-2018.

    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN:1595933395
    DOI:10.1145/1150402
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. frequent pattern
    2. pattern annotation
    3. pattern context
    4. pattern semantic analysis

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 02 Mar 2025.

    Citations

    Cited By

    • (2017) Conceptual annotation of text patterns. Computational Intelligence, 33:4 (948-979). DOI: 10.1111/coin.12133. Online publication date: 26-Jul-2017.
    • (2017) Ranking and tagging bursty features in text streams with context language models. Frontiers of Computer Science: Selected Publications from Chinese Universities, 11:5 (852-862). DOI: 10.1007/s11704-016-5144-z. Online publication date: 1-Oct-2017.
    • (2016) References. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. DOI: 10.1145/2915031.2915054. Online publication date: 23-Jun-2016.
    • (2016) Appendixes. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. DOI: 10.1145/2915031.2915053. Online publication date: 23-Jun-2016.
    • (2016) Toward A Unified System for Text Management and Analysis. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. DOI: 10.1145/2915031.2915052. Online publication date: 23-Jun-2016.
    • (2016) Joint Analysis of Text and Structured Data. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. DOI: 10.1145/2915031.2915051. Online publication date: 23-Jun-2016.
    • (2016) Opinion Mining and Sentiment Analysis. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. DOI: 10.1145/2915031.2915050. Online publication date: 23-Jun-2016.
