Privacy-Preserving Data Stream Classification

Xu, Yabo; Wang, Ke; Fu, Ada Wai-Chee; She, Rong; Pei, Jian

doi:10.1007/978-0-387-70992-5_20

Part of the book series: Advances in Database Systems (ADBS, volume 34)

In a wide range of applications, multiple data streams need to be examined together to discover trends or patterns that span several streams. One common practice is to redirect all data streams to a central place for joint analysis. This "centralized" practice is challenged by the fact that data streams are often private, in that they come from different owners. In this paper, we focus on the problem of building a classifier in this context, and we assume that the classifier evolves as the current window of the streams slides forward. This problem faces two major challenges. First, the many-to-many join relationship among streams blows up the already fast arrival rate of the data streams. Second, the privacy requirement implies that data exchange among owners should be minimal. These considerations rule out all classification methods that require producing the join in the current window. We show that Naive Bayesian Classification (NBC) presents a unique opportunity to address this problem. Our main contribution is to adopt NBC to solve the classification problem for private data streams.
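To illustrate why NBC avoids the join, note that it needs only class-conditional counts, and counts over a join can be assembled from per-key counts kept on each side. The following is a minimal sketch under stated assumptions, not the chapter's algorithm: it assumes two streams joined on a single key, with the class label carried by the first stream, and it ignores both the sliding window and any privacy protection of the exchanged counts. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def nbc_counts_without_join(s1, s2):
    """Compute NBC counts over the join of s1 and s2 without materializing it.

    s1: list of (key, a, cls) tuples; s2: list of (key, b) tuples (assumed layout).
    Returns (class_cnt, a_cnt, b_cnt) as they would appear in the joined table.
    """
    # Per-key summaries: the only information each side needs from the other.
    s2_per_key = Counter(k for k, _ in s2)
    s1_class_per_key = defaultdict(Counter)
    for k, _, c in s1:
        s1_class_per_key[k][c] += 1

    class_cnt, a_cnt, b_cnt = Counter(), Counter(), Counter()
    # Each s1 tuple appears in the join once per matching s2 tuple.
    for k, a, c in s1:
        w = s2_per_key[k]
        if w:
            class_cnt[c] += w
            a_cnt[(a, c)] += w
    # Each s2 tuple appears in the join once per matching s1 tuple of class c.
    for k, b in s2:
        for c, n in s1_class_per_key.get(k, {}).items():
            b_cnt[(b, c)] += n
    return class_cnt, a_cnt, b_cnt


def counts_from_materialized_join(s1, s2):
    """Reference implementation that enumerates the join, for checking only."""
    class_cnt, a_cnt, b_cnt = Counter(), Counter(), Counter()
    for k1, a, c in s1:
        for k2, b in s2:
            if k1 == k2:
                class_cnt[c] += 1
                a_cnt[(a, c)] += 1
                b_cnt[(b, c)] += 1
    return class_cnt, a_cnt, b_cnt


def predict(class_cnt, a_cnt, b_cnt, a, b):
    """Pick the class maximizing P(c) * P(a|c) * P(b|c); no smoothing applied."""
    total = sum(class_cnt.values())

    def score(c):
        n = class_cnt[c]
        return (n / total) * (a_cnt[(a, c)] / n) * (b_cnt[(b, c)] / n)

    return max(class_cnt, key=score)


# Toy window: key 1 joins 2 x 2 tuples, key 2 joins 1 x 3, key 3 has no match.
s1 = [(1, "lo", "yes"), (1, "hi", "no"), (2, "lo", "yes"), (3, "hi", "no")]
s2 = [(1, "u"), (1, "v"), (2, "u"), (2, "u"), (2, "v")]
counts = nbc_counts_without_join(s1, s2)
```

In this toy window the join has 7 tuples, yet the first function touches each input tuple only a constant number of times per class. In a multi-owner setting the per-key summaries would themselves need protection, which this sketch does not attempt.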

Author information

Authors and Affiliations

School of Computing Science, Simon Fraser University, V5A 1S6, Burnaby, BC, Canada
Yabo Xu, Ke Wang, Rong She & Jian Pei
Department of Computer Science, Chinese University of Hong Kong, Hong Kong, China
Ada Wai-Chee Fu

Editor information

Editors and Affiliations

IBM Thomas J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
Charu C. Aggarwal
Department of Computer Science, University of Illinois at Chicago, 854 South Morgan Street, 60607-7053, Chicago, IL, USA
Philip S. Yu

About this chapter

Cite this chapter

Xu, Y., Wang, K., Fu, A.WC., She, R., Pei, J. (2008). Privacy-Preserving Data Stream Classification. In: Aggarwal, C.C., Yu, P.S. (eds) Privacy-Preserving Data Mining. Advances in Database Systems, vol 34. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-70992-5_20

DOI: https://doi.org/10.1007/978-0-387-70992-5_20
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-70991-8
Online ISBN: 978-0-387-70992-5
eBook Packages: Computer Science, Computer Science (R0)
