Abstract
This paper introduces an approach for incremental maximal frequent pattern (MFP) mining in sparse binary data, where instances are observed one by one. For this purpose, we propose the Augmented Itemset Tree (AIST), a data structure that incorporates features of the FP-tree into the itemset tree. In this setting, we assume that only the data structure is maintained in main memory and that each instance is observed only once. The AIST not only stores observed frequent patterns, but also allows for quick frequency updates of relevant subpatterns. To quickly identify the current set of exact MFPs, potential candidates are extracted from former MFPs and from patterns that occur in the new instance. The approach is evaluated with respect to runtime and memory requirements as a function of the number of instances, the minimum support, and different settings of pattern properties. The results suggest that AISTs are useful for mining maximal frequent itemsets in an online setting whenever larger patterns can be expected.
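To make the problem setting concrete, the following minimal Python sketch shows what "maximal frequent patterns in an online setting" means. It is a brute-force baseline, not the AIST: it counts every subset of each incoming instance, which is exponential in the instance size and only workable for very small, sparse instances. The class name `NaiveOnlineMFP`, its methods, and the use of an absolute count as minimum support are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


class NaiveOnlineMFP:
    """Brute-force baseline for illustration only (NOT the AIST)."""

    def __init__(self, min_support):
        # Assumption: minimum support is an absolute instance count.
        self.min_support = min_support
        self.counts = Counter()

    def observe(self, instance):
        """Process one instance exactly once; the instance itself is not stored."""
        items = sorted(set(instance))
        # Increment the frequency of every non-empty subset of the instance.
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                self.counts[frozenset(subset)] += 1

    def maximal_frequent(self):
        """Return frequent patterns that have no frequent proper superset."""
        frequent = [p for p, c in self.counts.items() if c >= self.min_support]
        return {p for p in frequent if not any(p < q for q in frequent)}


miner = NaiveOnlineMFP(min_support=2)
for instance in [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"d"}]:
    miner.observe(instance)

# Sorted for a stable display order.
print(sorted(sorted(p) for p in miner.maximal_frequent()))
# → [['a', 'b'], ['b', 'c']]
```

Here `{a, b, c}` occurs only once and is therefore not frequent, while `{a, b}` and `{b, c}` each occur twice and have no frequent superset. The cost of enumerating all subsets per instance is the kind of overhead that a dedicated structure such as the AIST is meant to avoid.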
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schmidt, J., Kramer, S. (2011). The Augmented Itemset Tree: A Data Structure for Online Maximum Frequent Pattern Mining. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_23
DOI: https://doi.org/10.1007/978-3-642-24477-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science (R0)