Separating Structure from Interestingness

Mielikäinen, Taneli

doi:10.1007/978-3-540-24775-3_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3056)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Condensed representations of pattern collections are recognized as important building blocks of inductive databases, a promising theoretical framework for data mining, and they have recently been studied actively. However, there has been little research on how condensed representations should actually be represented.

In this paper we propose a general approach to building condensed representations of pattern collections. The approach is based on separating the structure of the pattern collection from the interestingness values of the patterns. We also study the concrete case of representing the frequent sets and their (approximate) frequencies following this approach. We discuss the trade-offs in representing the frequent sets by the maximal frequent sets, the minimal infrequent sets, and their combinations. We also investigate the problem of approximating the frequencies from samples, giving new upper bounds on sample complexity based on frequent closed sets and describing how convex optimization can be used to improve and score the obtained samples.
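
As an illustration of the terms used in the abstract, the following is a minimal Python sketch, not the chapter's own algorithms. It assumes brute-force enumeration over a small transaction database; every function name is hypothetical. It shows that the collection of frequent sets (the structure) can be recovered either from the maximal frequent sets or from the minimal infrequent sets, while the frequencies (the interestingness values) are stored or estimated separately. The sample-size function uses the standard Hoeffding inequality with a union bound, which is a textbook bound and not the closed-set-based bound the paper derives.

```python
from itertools import combinations
from math import ceil, log


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_sets(transactions, min_freq):
    """Map each frequent set to its frequency (brute force, toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            f = frequency(s, transactions)
            if f >= min_freq:
                result[s] = f
    return result


def maximal_frequent(frequent):
    """Frequent sets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def minimal_infrequent(frequent, items):
    """Infrequent sets whose immediate subsets are all frequent."""
    border = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            s = frozenset(combo)
            if s not in frequent and all(s - {i} in frequent for i in s):
                border.add(s)
    return border


def is_frequent_by_maximal(s, maximal):
    """A set is frequent iff it is a subset of some maximal frequent set."""
    return any(s <= m for m in maximal)


def is_frequent_by_minimal(s, minimal):
    """A set is frequent iff it contains no minimal infrequent set."""
    return not any(m <= s for m in minimal)


def hoeffding_sample_size(num_sets, eps, delta):
    """Sample size sufficient to estimate `num_sets` frequencies within
    `eps` with probability at least 1 - delta (Hoeffding plus union bound;
    an assumption for illustration, not the paper's bound)."""
    return ceil(log(2 * num_sets / delta) / (2 * eps ** 2))
```

Either border alone determines which sets are frequent, so the choice between them (or a combination) is a question of representation size, which is the trade-off the abstract refers to. The frequencies themselves are not recoverable from the borders and have to be kept or approximated on the side.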

Author information

Authors and Affiliations

HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland
Taneli Mielikäinen

Editor information

Editors and Affiliations

School of Engineering and Information Technology, Deakin University, VIC 3125, Australia
Honghua Dai
University of Illinois at Urbana-Champaign, 61801, Urbana, IL, USA
Ramakrishnan Srikant
Faculty of Engineering and Information Technology, Centre for Quantum Computation and Intelligent Systems, and Australian ACS National Committee for Artificial Intelligence, University of Technology, Sydney, Australia
Chengqi Zhang

Cite this paper

Mielikäinen, T. (2004). Separating Structure from Interestingness. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_58

DOI: https://doi.org/10.1007/978-3-540-24775-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive
