Abstract
When we mine a data stream for knowledge, we have to cope with uncertainty, because only a part of the stream is available at any time. We introduce a statistical technique, independent of the mining algorithm used, for estimating the frequent itemsets on a stream. This statistical support makes it possible to maximize either the precision or the recall, as chosen by the user, without damaging the other. Experiments with various association rule databases demonstrate the potential of this technique.
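The abstract does not give the formula behind the statistical support, so the following is only a rough sketch of the general idea and not the authors' method. It assumes, for illustration, that the observed part of the stream is an i.i.d. sample and that a two-sided Hoeffding bound for a single itemset is an acceptable margin. The user threshold `theta` is shifted up by the margin to favour precision, or down to favour recall. All names here (`hoeffding_margin`, `shifted_supports`, `frequent_itemsets`, `theta`, `delta`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def hoeffding_margin(n_transactions, delta):
    """Margin eps such that, for one itemset, the observed frequency is
    within eps of the true frequency with probability >= 1 - delta
    (two-sided Hoeffding bound; an assumption, not the paper's bound)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_transactions))


def shifted_supports(theta, n_transactions, delta):
    """Return two thresholds derived from the user threshold theta:
    a raised one favouring precision and a lowered one favouring recall."""
    eps = hoeffding_margin(n_transactions, delta)
    return {
        "precision": min(1.0, theta + eps),
        "recall": max(0.0, theta - eps),
    }


def frequent_itemsets(transactions, min_support, max_size=2):
    """Naive counting of all itemsets up to max_size items; any real
    mining algorithm could be substituted for this step."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


# Toy sample standing in for the observed part of a stream.
sample = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]] * 50
thresholds = shifted_supports(theta=0.5, n_transactions=len(sample), delta=0.05)
high_precision = frequent_itemsets(sample, thresholds["precision"])
high_recall = frequent_itemsets(sample, thresholds["recall"])
```

In this sketch the raised threshold keeps only itemsets whose observed frequency clears `theta` by the margin, while the lowered threshold also admits borderline itemsets. The counting step is deliberately separate from the threshold computation, which mirrors the abstract's claim that the technique does not depend on the mining algorithm.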
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laur, PA., Symphor, JE., Nock, R., Poncelet, P. (2005). Statistical Supports for Frequent Itemsets on Data Streams. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_39
DOI: https://doi.org/10.1007/11510888_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26923-6
Online ISBN: 978-3-540-31891-0
eBook Packages: Computer Science (R0)