Privacy-Aware Data Sharing in a Tree-Based Categorical Clustering Algorithm

  • Conference paper
Foundations and Practice of Security (FPS 2016)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10128)

Abstract

Although clustering is one of the most common approaches in unsupervised data analysis, very little literature exists on applying formal methods to data mining problems. This paper applies an abstract representation of a hierarchical categorical clustering algorithm (CCTree) to the problem of privacy-aware data clustering among distributed agents. The proposed methodology is based on rewriting systems and automatically generates a global structure of the clusters. We prove that the proposed approach improves the time complexity. Moreover, a metric is provided to measure the privacy gain after the CCTree result is revealed. Finally, we discuss under what conditions CCTree clustering in a distributed framework produces a result comparable to the centralized one.
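The abstract does not spell out how the tree of clusters is built. As a rough illustration only, the sketch below shows one way a tree-based categorical clustering could work, assuming that each node is split on the attribute with the highest Shannon entropy and that splitting stops once a node is small or pure enough. The function names, the impurity measure (sum of per-attribute entropies), and the thresholds `max_impurity` and `min_size` are assumptions made for this example, not the authors' definitions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def node_impurity(records, attributes):
    """Assumed impurity measure: sum of per-attribute entropies in the node."""
    return sum(entropy([r[a] for r in records]) for a in attributes)


def build_tree(records, attributes, max_impurity=0.5, min_size=2):
    """Recursively split records on the attribute with the highest entropy."""
    impurity = node_impurity(records, attributes) if attributes else 0.0
    # Stop when nothing is left to split on, the node is small, or it is pure enough.
    if not attributes or len(records) <= min_size or impurity <= max_impurity:
        return {"leaf": True, "records": records}
    split_attr = max(attributes, key=lambda a: entropy([r[a] for r in records]))
    remaining = [a for a in attributes if a != split_attr]
    groups = {}
    for r in records:
        groups.setdefault(r[split_attr], []).append(r)
    children = {
        value: build_tree(group, remaining, max_impurity, min_size)
        for value, group in groups.items()
    }
    return {"leaf": False, "attribute": split_attr, "children": children}


def leaves(tree):
    """Collect the record lists stored in the leaves; each leaf is one cluster."""
    if tree["leaf"]:
        return [tree["records"]]
    return [leaf for child in tree["children"].values() for leaf in leaves(child)]


# Hypothetical toy data: two attributes, only "color" varies.
records = (
    [{"color": "red", "shape": "round"}] * 3
    + [{"color": "blue", "shape": "round"}] * 3
)
tree = build_tree(records, ["color", "shape"])
clusters = leaves(tree)
```

In this toy run the root is split on `color`, the only attribute that varies, and each resulting leaf holds the records sharing one color. How such trees are represented abstractly and combined across agents is the subject of the paper and is not modelled here.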

This research has been partially supported by the EU Funded Projects H2020 C3IISP, GA #700294, H2020 NeCS, GA #675320, EIT Digital MCloudDaaS and partially by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Mina Sheikhalishahi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sheikhalishahi, M., Mejri, M., Tawbi, N., Martinelli, F. (2017). Privacy-Aware Data Sharing in a Tree-Based Categorical Clustering Algorithm. In: Cuppens, F., Wang, L., Cuppens-Boulahia, N., Tawbi, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2016. Lecture Notes in Computer Science, vol 10128. Springer, Cham. https://doi.org/10.1007/978-3-319-51966-1_11

  • DOI: https://doi.org/10.1007/978-3-319-51966-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51965-4

  • Online ISBN: 978-3-319-51966-1

  • eBook Packages: Computer Science, Computer Science (R0)
