Statistical Supports for Frequent Itemsets on Data Streams

Laur, Pierre-Alain; Symphor, Jean-Emile; Nock, Richard; Poncelet, Pascal

doi:10.1007/11510888_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3587)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

When mining a whole data stream for knowledge, we must cope with uncertainty, because only a part of the stream is available. We introduce a statistical technique, independent of the mining algorithm used, for estimating the frequent itemsets on a stream. This statistical support lets the user choose to maximize either precision or recall without damaging the other. Experiments on various association rule databases demonstrate the potential of the technique.
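
The idea of a support threshold that favors precision or recall can be illustrated with a minimal sketch. The abstract does not give the paper's actual bound, so everything below is an assumption: the sketch uses a one-sided Hoeffding inequality as a stand-in, and the names `hoeffding_margin`, `statistical_support`, and `frequent_itemsets`, along with the parameters `theta` (user support threshold), `delta` (risk), and `max_size`, are hypothetical. The brute-force counter only stands in for whichever mining algorithm is actually used.

```python
import math
from collections import Counter
from itertools import combinations


def hoeffding_margin(n, delta):
    """One-sided Hoeffding deviation for a frequency observed on n transactions.

    Assumption: this bound is a stand-in, not the bound used in the paper.
    """
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def statistical_support(theta, n, delta, favor):
    """Shift the user threshold theta by the margin, depending on the goal.

    favor="precision": raise the threshold, so a reported itemset is
    unlikely to fall below theta on the whole stream.
    favor="recall": lower the threshold, so an itemset at or above theta
    on the whole stream is unlikely to be missed.
    """
    eps = hoeffding_margin(n, delta)
    if favor == "precision":
        return min(1.0, theta + eps)
    if favor == "recall":
        return max(0.0, theta - eps)
    raise ValueError("favor must be 'precision' or 'recall'")


def frequent_itemsets(transactions, threshold, max_size=2):
    """Brute-force counter (hypothetical stand-in for any mining algorithm).

    Returns {itemset: observed frequency} for itemsets of up to max_size
    items whose observed frequency reaches the threshold.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= threshold}


if __name__ == "__main__":
    # Toy observed part of a stream; real streams are far larger.
    observed = [["a", "b"], ["a"], ["a", "b"], ["c"]] * 250
    n = len(observed)
    for goal in ("precision", "recall"):
        t = statistical_support(theta=0.5, n=n, delta=0.05, favor=goal)
        print(goal, round(t, 4), sorted(frequent_itemsets(observed, t)))
```

In this sketch the margin holds for one itemset at a time. Guaranteeing it simultaneously for many itemsets would need a correction such as a union bound, which is omitted here.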

Author information

Authors and Affiliations

GRIMAAG-Dépt Scientifique Interfacultaire, Université Antilles-Guyane, Campus de Schoelcher, B.P. 7209, 97275, Schoelcher Cedex, Martinique, France
Pierre-Alain Laur, Jean-Emile Symphor & Richard Nock
LG2IP-Ecole des Mines d’Alès, Site EERIE, parc scientifique Georges Besse, 30035, Nîmes Cedex, France
Pascal Poncelet

Editor information

Editors and Affiliations

Institute of Computer Vision and applied Computer Sciences, IBaI, Germany
Petra Perner
Institute of Media and Information Technology, Chiba University, Japan
Atsushi Imiya

About this paper

Cite this paper

Laur, PA., Symphor, JE., Nock, R., Poncelet, P. (2005). Statistical Supports for Frequent Itemsets on Data Streams. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_39

DOI: https://doi.org/10.1007/11510888_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26923-6
Online ISBN: 978-3-540-31891-0
eBook Packages: Computer Science, Computer Science (R0)
