Semantic annotation of frequent patterns

Published: 01 December 2007

Abstract

Using frequent patterns to analyze data has been one of the fundamental approaches in many data mining applications. Research in frequent pattern mining has so far focused mostly on developing efficient algorithms to discover various kinds of frequent patterns, but little attention has been paid to the important next step: interpreting the discovered patterns. Although the compression and summarization of frequent patterns have been studied in some recent work, the proposed techniques can only annotate a frequent pattern with non-semantic information (e.g., support), which provides only limited help for a user trying to understand the patterns.
In this article, we study the novel problem of generating semantic annotations for frequent patterns. The goal is to discover the hidden meanings of a frequent pattern by annotating it with in-depth, concise, and structured information. We propose a general approach that generates such an annotation by constructing the pattern's context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns. This approach readily incorporates the user's prior knowledge and has many potential applications, such as generating a dictionary-like description for a pattern, finding synonym patterns, discovering semantic relations, and summarizing semantic classes of a set of frequent patterns. Experiments on different datasets show that our approach is effective in generating semantic pattern annotations.
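
The three steps named in the abstract (context model, context indicators, representative transactions and similar patterns) can be illustrated with a minimal Python sketch. The abstract does not specify the weighting or similarity functions, so everything concrete here is an assumption made for illustration: context units are single items, a unit's weight is its mutual information with the pattern, and similarity is the cosine between vectors. The function names (`mutual_information`, `context_vector`, `cosine`, `annotate`) and the toy transactions are hypothetical and not taken from the article.

```python
import math


def mutual_information(x, y, transactions):
    """Mutual information (in bits) between the presence of itemsets x and y."""
    n = len(transactions)
    mi = 0.0
    for x_in in (True, False):
        for y_in in (True, False):
            joint = sum((x <= t) == x_in and (y <= t) == y_in for t in transactions) / n
            p_x = sum((x <= t) == x_in for t in transactions) / n
            p_y = sum((y <= t) == y_in for t in transactions) / n
            if joint > 0:  # a zero joint probability contributes nothing
                mi += joint * math.log2(joint / (p_x * p_y))
    return mi


def context_vector(pattern, units, transactions):
    """Assumed context model: one mutual-information weight per context unit."""
    return [mutual_information(pattern, u, transactions) for u in units]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def annotate(pattern, units, candidates, transactions, k=2):
    """Build a small annotation: top-k indicators, transactions, and similar patterns."""
    vec = context_vector(pattern, units, transactions)
    # Context indicators: the units with the largest weights, excluding the pattern itself.
    weighted = [(w, u) for w, u in zip(vec, units) if u != pattern]
    indicators = [u for w, u in sorted(weighted, key=lambda wu: wu[0], reverse=True)[:k]]

    # Representative transactions: closest to the context vector when each
    # transaction is encoded as a 0/1 vector over the same units.
    def transaction_score(t):
        return cosine(vec, [1.0 if u <= t else 0.0 for u in units])

    representative = sorted(transactions, key=transaction_score, reverse=True)[:k]

    # Semantically similar patterns: candidates whose context vectors are closest.
    def pattern_score(c):
        return cosine(vec, context_vector(c, units, transactions))

    similar = sorted((c for c in candidates if c != pattern), key=pattern_score, reverse=True)[:k]
    return {"indicators": indicators, "transactions": representative, "similar": similar}


# Toy data (hypothetical): each transaction is a set of single-character items.
transactions = [frozenset(s) for s in ("abc", "abc", "ab", "de", "de", "cd", "c")]
units = [frozenset(i) for i in "abcde"]
annotation = annotate(frozenset("a"), units, units, transactions, k=2)
print(annotation)
```

In this toy data, items `a` and `b` always occur together, so they receive identical context vectors and `b` ranks first among the patterns similar to `a`. Mutual information does not distinguish positive from negative association, so an item that never co-occurs with the pattern can still score as a strong indicator; a practical system would need to account for that.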

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 1, Issue 3
December 2007
145 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/1297332

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. frequent pattern
2. pattern annotation
3. pattern context
4. pattern semantic analysis
