Utility Aware Clustering for Publishing Transactional Data

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

This work aims to maximise the utility of published data in partition-based anonymisation of transactional data. We observe that optimising the clustering, i.e. the horizontal partitioning, can significantly improve the utility of the published data without affecting the privacy guarantees. We present a new clustering method with a specially designed distance function that accounts for the effect of the sensitive terms in the privacy goal as part of the clustering process. As a result, when the clustering minimises the total intra-cluster distance of the partition, the utility loss is also minimised. We present two algorithms, DocClust and DetK, for clustering transactions and determining the best number of clusters, respectively.
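The abstract does not give the distance function itself, so the following is only an illustrative sketch of the general idea: a distance between transactions that takes sensitive terms into account, and a total intra-cluster distance used to compare candidate partitions. Everything here is an assumption made for illustration: the names `distance` and `total_intra_cluster_distance`, the `penalty` parameter, the toy transactions, and the choice of a Jaccard distance plus a penalty for shared sensitive terms. It is not DocClust or DetK, and it is not the paper's distance function.

```python
from itertools import combinations


def distance(t1, t2, sensitive, penalty=0.5):
    """Hypothetical sensitivity-aware distance (NOT the paper's function).

    Jaccard distance on the non-sensitive terms, plus an assumed fixed
    penalty when both transactions contain the same sensitive term.
    """
    a, b = t1 - sensitive, t2 - sensitive
    union = a | b
    jaccard = 1.0 - len(a & b) / len(union) if union else 0.0
    shares_sensitive = bool(t1 & t2 & sensitive)
    return jaccard + (penalty if shares_sensitive else 0.0)


def total_intra_cluster_distance(clusters, sensitive):
    """Sum of pairwise distances inside every cluster of a partition."""
    return sum(
        distance(x, y, sensitive)
        for cluster in clusters
        for x, y in combinations(cluster, 2)
    )


# Toy data, invented for this example.
sensitive = frozenset({"hiv_test", "insulin"})
transactions = [
    frozenset({"milk", "bread", "hiv_test"}),
    frozenset({"milk", "bread", "insulin"}),
    frozenset({"beer", "chips", "hiv_test"}),
    frozenset({"beer", "chips"}),
]

# Two candidate horizontal partitions of the same four transactions.
partition_a = [transactions[:2], transactions[2:]]
partition_b = [
    [transactions[0], transactions[2]],
    [transactions[1], transactions[3]],
]

# The partition with the smaller total is preferred under this toy objective.
best = min(
    (partition_a, partition_b),
    key=lambda p: total_intra_cluster_distance(p, sensitive),
)
```

Under these assumptions, `partition_a` groups transactions with identical non-sensitive terms and no shared sensitive term, so its total is lower than that of `partition_b`. A method for choosing the number of clusters could, in the same spirit, compare such totals across different cluster counts, though how DetK actually does this is not stated in the abstract.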

Author information

Corresponding author

Correspondence to Michael Bewong.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bewong, M., Liu, J., Liu, L., Li, J. (2017). Utility Aware Clustering for Publishing Transactional Data. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_38

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_38

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science, Computer Science (R0)
