Semantic annotation of frequent patterns

Authors:

Chengxiang Zhai

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 1, Issue 3

Pages 11 - es

https://doi.org/10.1145/1297332.1297335

Published: 01 December 2007

Abstract

Using frequent patterns to analyze data has been one of the fundamental approaches in many data mining applications. Research in frequent pattern mining has so far focused mostly on developing efficient algorithms to discover various kinds of frequent patterns, but little attention has been paid to the important next step: interpreting the discovered frequent patterns. Although the compression and summarization of frequent patterns have been studied in some recent work, the techniques proposed there can annotate a frequent pattern only with nonsemantic information (e.g., support), which provides limited help for a user trying to understand the patterns.

In this article, we study the novel problem of generating semantic annotations for frequent patterns. The goal is to discover the hidden meanings of a frequent pattern by annotating it with in-depth, concise, and structured information. We propose a general approach that generates such an annotation for a frequent pattern by constructing its context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns. This general approach can readily incorporate the user's prior knowledge and has many potential applications, such as generating a dictionary-like description for a pattern, finding synonym patterns, discovering semantic relations, and summarizing semantic classes of a set of frequent patterns. Experiments on different datasets show that our approach is effective in generating semantic pattern annotations.
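The three steps named in the abstract (context model, context indicators, representative transactions and similar patterns) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the method of the article: patterns are itemsets stored as `frozenset`s, a pattern's context is modeled by raw co-occurrence counts with other patterns, and cosine similarity is used for all comparisons. The names `context_vector`, `cosine`, and `annotate` are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(pattern, transactions, indicators):
    """Count how often each indicator co-occurs with `pattern` in a transaction.

    Assumption: raw co-occurrence counts stand in for the context weights.
    """
    vec = Counter()
    for t in transactions:
        if pattern <= t:  # the transaction contains the pattern
            for ind in indicators:
                if ind != pattern and ind <= t:
                    vec[ind] += 1
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def annotate(pattern, transactions, patterns, top_k=2):
    """Build a dictionary-like annotation for one pattern."""
    ctx = context_vector(pattern, transactions, patterns)

    # Strongest context indicators: the highest co-occurrence counts.
    top_indicators = [ind for ind, _ in ctx.most_common(top_k)]

    # Representative transactions: supporting transactions whose own
    # indicator profile is closest to the pattern's context vector.
    def profile(t):
        return Counter({p: 1 for p in patterns if p != pattern and p <= t})

    supporting = [t for t in transactions if pattern <= t]
    representatives = sorted(
        supporting, key=lambda t: cosine(ctx, profile(t)), reverse=True
    )[:top_k]

    # Semantically similar patterns: other patterns with similar contexts.
    others = [q for q in patterns if q != pattern]
    similar = sorted(
        others,
        key=lambda q: cosine(ctx, context_vector(q, transactions, patterns)),
        reverse=True,
    )[:top_k]

    return {
        "pattern": pattern,
        "context_indicators": top_indicators,
        "representative_transactions": representatives,
        "similar_patterns": similar,
    }


# Toy data, invented for this example only.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"d"}),
]
patterns = [frozenset({x}) for x in ("a", "b", "c", "d")]

annotation = annotate(frozenset({"b"}), transactions, patterns, top_k=1)
```

In this toy data, `{"b"}` and `{"c"}` never need to co-occur often to be judged similar; they are ranked close because both appear mostly alongside `{"a"}`. A weighting such as mutual information could replace the raw counts without changing the structure of the sketch.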

Index Terms

Semantic annotation of frequent patterns
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Transactions on Knowledge Discovery from Data Volume 1, Issue 3

December 2007

145 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/1297332

Copyright © 2007 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States
