Skip to main content

Separating Structure from Interestingness

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2004)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3056))

Included in the following conference series:

Abstract

Condensed representations of pattern collections have been recognized to be important building blocks of inductive databases, a promising theoretical framework for data mining, and recently they have been studied actively. However, there has not been much research on how condensed representations should actually be represented.

In this paper we propose a general approach to build condensed representations of pattern collections. The approach is based on separating the structure of the pattern collection from the interestingness values of the patterns. We study also the concrete case of representing the frequent sets and their (approximate) frequencies following this approach: we discuss the trade-offs in representing the frequent sets by the maximal frequent sets, the minimal infrequent sets and their combinations, and investigate the problem approximating the frequencies from samples by giving new upper bounds on sample complexity based on frequent closed sets and describing how convex optimization can be used to improve and score the obtained samples.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Hand, D.J.: Pattern detection and discovery. In: Hand, D.J., Adams, N.M., Bolton, R.J. (eds.) Pattern Detection and Discovery. LNCS (LNAI), vol. 2447, pp. 1–12. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  2. Mannila, H.: Local and global methods in data mining: Basic techniques and open problems. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 57–68. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  3. : In: Goethals, B., Zaki, M.J. (eds.) Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI 2003), Melbourne Florida, USA, November 19. CEUR Workshop Proceedings, vol. 90 (2003), http://CEUR-WS.org/Vol-90/

  4. Mannila, H., Toivonen, H.: Multiple uses of frequent sets and condensed representations. In: Simoudis, E., Han, J., Fayyad, U.M. (eds.) Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 189–194. AAAI Press, Menlo Park (1996)

    Google Scholar 

  5. De Raedt, L.: A perspective on inductive databases. SIGKDD Explorations 4, 69–77 (2003)

    Article  Google Scholar 

  6. Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Communications of The ACM 39, 58–64 (1996)

    Article  Google Scholar 

  7. Mannila, H.: Inductive databases and condensed representations for data mining. In: Maluszynski, J. (ed.) Logic Programming, pp. 21–30. MIT Press, Cambridge (1997)

    Google Scholar 

  8. Mannila, H.: Theoretical frameworks for data mining. SIGKDD Explorations 1, 30–32 (2000)

    Article  Google Scholar 

  9. Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Transactions on Database Systems 28, 140–174 (2003)

    Article  Google Scholar 

  10. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  11. Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: a condensed representation of Boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery 7, 5–22 (2003)

    Article  MathSciNet  Google Scholar 

  12. Bykowski, A., Rigotti, C.: A condensed representation to find frequent patterns. In: Proceedings of the Twenteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM, New York (2001)

    Google Scholar 

  13. Kryszkiewicz, M.: Concise representation of frequent patterns based on disjunctionfree generators. In: Cercone, N., Lin, T.Y., Wu, X. (eds.) Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 305–312. IEEE Computer Society, Los Alamitos (2001)

    Chapter  Google Scholar 

  14. Calders, T., Goethals, B.: Mining all non-derivable frequent itemsets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 74–865. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  15. Pei, J., Dong, G., Zou, W., Han, J.: On computing condensed pattern bases. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), pp. 378–385. IEEE Computer Society, Los Alamitos (2002)

    Google Scholar 

  16. Mielikäinen, T., Mannila, H.: The pattern ordering problem. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 327–338. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  17. Mielikäinen, T.: Chaining patterns. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 232–243. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  18. Toivonen, H.: Sampling large databases for association rules. In: Vijayaraman, T., Buchmann, A.P., Mohan, C., Sarda, N.L. (eds.) VLDB 1996, Proceedings of 22nd International Conference on Very Large Data Bases, pp. 134–145. Morgan Kaufmann, San Francisco (1996)

    Google Scholar 

  19. Boros, E., Gurvich, V., Khachiyan, L., Makino, K.: On the complexity of generating maximal frequent and minimal infrequent sets. In: Alt, H., Ferreira, A. (eds.) STACS 2002. LNCS, vol. 2285, pp. 133–141. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  20. Ausiello, G., Crescenzi, P., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (1999)

    MATH  Google Scholar 

  21. Alon, N., Awerbuch, B., Azar, Y., Buchbinder, N., Naor, J.S.: The online set cover problem. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing, pp. 100–105. ACM, New York (2003)

    Google Scholar 

  22. Ioannidis, Y.: Approximations in database systems. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 16–30. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  23. Mielikäinen, T.: Finding all occurring sets of interest. In: Boulicaut, J.F., Džeroski, S. (eds.) 2nd International Workshop on Knowledge Discovery in Inductive Databases, pp. 97–106 (2003)

    Google Scholar 

  24. Matoušek, J.: Geometric Discrepancy: An Illustrated Guide. Algorithms and Combinatorics, vol. 18. Springer, Heidelberg (1999)

    MATH  Google Scholar 

  25. Chazelle, B.: The Discrepancy Method: Randomness and Complexity. Paperback edn. Cambridge University Press, Cambridge (2001)

    Google Scholar 

  26. Ben-Tal, A., Nemirovksi, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, vol. 2. SIAM, Philadelphia (2001)

    Book  MATH  Google Scholar 

  27. de Farias Jr., I.R., Nemhauser, G.L.: A polyhedral study of the cardinality constrained knapsack problem. Mathematical Programming 96, 439–467 (2003)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mielikäinen, T. (2004). Separating Structure from Interestingness. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science(), vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_58

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-24775-3_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22064-0

  • Online ISBN: 978-3-540-24775-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics