Statistical Supports for Frequent Itemsets on Data Streams

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3587)

Abstract

When mining knowledge from a whole data stream, one must cope with uncertainty, because only part of the stream is available. We introduce a statistical technique, independent of the mining algorithm used, for estimating the frequent itemsets on a stream. This statistical support lets the user maximize either precision or recall, as chosen, without degrading the other. Experiments on various association-rule databases demonstrate the potential of this technique.
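
The abstract does not give the exact form of the statistical support, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It uses the one-sided Hoeffding inequality as a stand-in concentration bound. It assumes the transactions seen so far are an i.i.d. sample of the whole stream. The shift is ε = sqrt(ln(1/δ) / (2n)) for n observed transactions and confidence parameter δ. Lowering the user threshold θ by ε favors recall, and raising it by ε favors precision. All function names, the simulated stream, and the brute-force support counting over itemsets of size at most 2 are hypothetical illustrations.

```python
import math
import random
from collections import Counter
from itertools import combinations


def hoeffding_epsilon(n, delta):
    """One-sided Hoeffding deviation for a support estimated from n transactions.

    Assumption: an itemset's presence indicators are i.i.d. across transactions,
    so P(observed - true >= eps) <= exp(-2 * n * eps**2) (same for the lower tail).
    Setting that bound equal to delta and solving gives eps.
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def statistical_threshold(theta, n, delta, favor):
    """Hypothetical helper: shift the user threshold theta by eps.

    favor="recall":    lower the threshold; each truly frequent itemset is
                       missed with probability at most delta.
    favor="precision": raise the threshold; each truly infrequent itemset is
                       reported with probability at most delta.
    Guarantees are per itemset (no union bound over all itemsets).
    """
    eps = hoeffding_epsilon(n, delta)
    if favor == "recall":
        return max(0.0, theta - eps)
    if favor == "precision":
        return min(1.0, theta + eps)
    raise ValueError("favor must be 'recall' or 'precision'")


def itemset_supports(transactions, max_size=2):
    """Relative support of every observed itemset of size <= max_size (brute force)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s: c / n for s, c in counts.items()}


def frequent(supports, threshold):
    """Itemsets whose support reaches the threshold."""
    return {s for s, sup in supports.items() if sup >= threshold}


def precision_recall(predicted, actual):
    """Precision and recall of a predicted itemset collection (empty sets score 1.0)."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(actual) if actual else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical stream: each item appears independently with its own probability.
    rng = random.Random(0)
    probs = {"a": 0.6, "b": 0.45, "c": 0.3, "d": 0.2, "e": 0.12, "f": 0.05}
    stream = [[i for i, p in probs.items() if rng.random() < p] for _ in range(20000)]
    prefix = stream[:500]  # the only part "available" to the miner
    theta, delta = 0.1, 0.05

    truth = frequent(itemset_supports(stream), theta)  # known only in simulation
    observed = itemset_supports(prefix)
    for mode in ("recall", "precision"):
        thr = statistical_threshold(theta, len(prefix), delta, mode)
        p, r = precision_recall(frequent(observed, thr), truth)
        print(f"{mode:>9}: threshold={thr:.3f} precision={p:.2f} recall={r:.2f}")
```

Because ε depends only on n and δ, this kind of adjustment changes nothing but the threshold handed to the miner. Under these assumptions, it could wrap any frequent-itemset algorithm.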

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Laur, PA., Symphor, JE., Nock, R., Poncelet, P. (2005). Statistical Supports for Frequent Itemsets on Data Streams. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_39

  • DOI: https://doi.org/10.1007/11510888_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26923-6

  • Online ISBN: 978-3-540-31891-0

  • eBook Packages: Computer Science, Computer Science (R0)