Privacy-Preserving Data Stream Classification

Privacy-Preserving Data Mining

Part of the book series: Advances in Database Systems (ADBS, volume 34)

In a wide range of applications, multiple data streams need to be examined together in order to discover trends or patterns that exist across several streams. One common practice is to redirect all data streams to a central place for joint analysis. This “centralized” practice is challenged by the fact that data streams are often private, in that they come from different owners. In this paper, we focus on the problem of building a classifier in this context and assume that the classifier evolves as the current window of the streams slides forward. This problem faces two major challenges. First, the many-to-many join relationship among streams will blow up the already fast arrival rate of the data streams. Second, the privacy requirement implies that data exchange among owners should be minimal. These considerations rule out all classification methods that require producing the join in the current window. We show that Naive Bayesian Classification (NBC) presents a unique opportunity to address this problem. Our main contribution is to adopt NBC to solve the classification problem for private data streams.
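Why a counting-based classifier can avoid the join can be illustrated with a minimal sketch. It is not the chapter's algorithm and contains no privacy mechanism. It assumes two hypothetical windows: s1 with tuples (join_key, attribute, class) and s2 with tuples (join_key, attribute). All function and variable names are illustrative. The class counts and attribute-class counts that a categorical naive Bayes model needs over the join are derived from per-key counts of each window, and a brute-force join is included only as a reference for comparison.

```python
from collections import Counter, defaultdict


def counts_via_join(s1, s2):
    """Reference only: materialize the many-to-many join, then count."""
    class_n, a_n, b_n = Counter(), Counter(), Counter()
    for key1, a, c in s1:
        for key2, b in s2:
            if key1 == key2:
                class_n[c] += 1
                a_n[(a, c)] += 1
                b_n[(b, c)] += 1
    return class_n, a_n, b_n


def counts_via_summaries(s1, s2):
    """Same counts from per-key summaries, without building the join."""
    # Summary of s2: tuples per key, and tuples per (key, attribute value).
    s2_per_key = Counter(key for key, _ in s2)
    s2_attr_per_key = defaultdict(Counter)
    for key, b in s2:
        s2_attr_per_key[key][b] += 1

    # Summary of s1: tuples per (key, class).
    s1_class_per_key = defaultdict(Counter)
    for key, _, c in s1:
        s1_class_per_key[key][c] += 1

    class_n, a_n, b_n = Counter(), Counter(), Counter()

    # Each s1 tuple appears in the join once per matching s2 tuple.
    for key, a, c in s1:
        weight = s2_per_key[key]
        if weight:
            class_n[c] += weight
            a_n[(a, c)] += weight

    # For s2's attribute, multiply the per-key counts from both sides.
    for key, attr_counts in s2_attr_per_key.items():
        for b, nb in attr_counts.items():
            for c, nc in s1_class_per_key[key].items():
                b_n[(b, c)] += nb * nc

    return class_n, a_n, b_n
```

In this sketch, the owner of s2 would only need to expose per-key counts rather than tuples. How such counts can be exchanged without violating the privacy requirement is the subject of the chapter and is not modeled here.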




Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Xu, Y., Wang, K., Fu, A.WC., She, R., Pei, J. (2008). Privacy-Preserving Data Stream Classification. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_20

  • DOI: https://doi.org/10.1007/978-0-387-70992-5_20

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-70991-8

  • Online ISBN: 978-0-387-70992-5

  • eBook Packages: Computer Science, Computer Science (R0)
