Abstract
Condensed representations of pattern collections are recognized as important building blocks of inductive databases, a promising theoretical framework for data mining, and they have recently been studied actively. However, there has been little research on how condensed representations should actually be represented.
In this paper we propose a general approach to building condensed representations of pattern collections. The approach is based on separating the structure of the pattern collection from the interestingness values of the patterns. We also study the concrete case of representing the frequent sets and their (approximate) frequencies following this approach. We discuss the trade-offs in representing the frequent sets by the maximal frequent sets, the minimal infrequent sets, and their combinations. We also investigate the problem of approximating the frequencies from samples, giving new upper bounds on sample complexity based on frequent closed sets and describing how convex optimization can be used to improve and score the obtained samples.
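To make the two structural representations named in the abstract concrete, the following is a minimal brute-force sketch, not the paper's method. It enumerates all itemsets of a toy transaction database and extracts the maximal frequent sets and the minimal infrequent sets. The function names `frequency` and `borders`, the threshold parameter `min_freq`, and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def borders(transactions, min_freq):
    """Return (frequent, maximal_frequent, minimal_infrequent).

    Brute force over all subsets of the item universe, so this is only
    suitable for tiny examples (hypothetical helper, exponential time).
    """
    items = sorted(set().union(*transactions))
    frequent = {}      # frozenset -> frequency (structure plus values)
    infrequent = []    # itemsets below the threshold
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            f = frequency(combo, transactions)
            if f >= min_freq:
                frequent[frozenset(combo)] = f
            else:
                infrequent.append(frozenset(combo))

    # Maximal frequent sets: frequent sets with no frequent proper superset.
    maximal = [s for s in frequent if not any(s < t for t in frequent)]

    # Minimal infrequent sets: infrequent sets whose subsets obtained by
    # dropping one item are all frequent. Because frequency never increases
    # when items are added, checking these immediate subsets is enough.
    minimal_infrequent = [
        s for s in infrequent if all(s - {x} in frequent for x in s)
    ]
    return frequent, maximal, minimal_infrequent


# Toy database (assumed data): five transactions over items a, b, c, d.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "d"}]
frequent, maximal, minimal_infrequent = borders(toy, min_freq=0.3)
print("maximal frequent:", sorted(sorted(s) for s in maximal))
print("minimal infrequent:", sorted(sorted(s) for s in minimal_infrequent))
```

Either collection alone determines which itemsets are frequent, but neither stores the frequencies themselves, which is the separation of structure from interestingness values that the abstract describes.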
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mielikäinen, T. (2004). Separating Structure from Interestingness. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_58
DOI: https://doi.org/10.1007/978-3-540-24775-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3