De-anonymising Set-Generalised Transactions Based on Semantic Relationships

  • Conference paper
Future Data and Security Engineering (FDSE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8860)

Abstract

Transaction data are important to applications such as marketing analysis and medical studies. However, such data can contain personal information and must therefore be sanitised before being used. One popular approach to protecting transaction data is set-based generalisation, where an item in a transaction is replaced by a set of items. In this paper, we study how well transaction data can be protected by this approach. More specifically, we propose de-anonymisation methods that aim to reconstruct the original transaction data from their set-generalised version by analysing the semantic relationships that exist among the items. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 50% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 80%.
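To make the idea concrete, the following is a minimal sketch of set-based generalisation and of a naive attack that uses semantic relatedness to pick the most plausible original item from each published set. It is an illustration only and not the method of the paper, whose details are not given in the abstract. The item names, the `GENERALISATION` mapping, the `RELATEDNESS` scores and the function names are all hypothetical.

```python
# Hypothetical generalisation mapping: each original item is published
# as a set that contains the item plus an added "cover" item.
GENERALISATION = {
    "insulin": frozenset({"insulin", "bread"}),
    "glucose meter": frozenset({"glucose meter", "wine"}),
    "test strips": frozenset({"test strips", "hammer"}),
}

# Hypothetical symmetric relatedness scores in [0, 1].
# Pairs that are not listed are treated as unrelated (score 0.0).
RELATEDNESS = {
    frozenset({"insulin", "glucose meter"}): 0.9,
    frozenset({"insulin", "test strips"}): 0.8,
    frozenset({"glucose meter", "test strips"}): 0.9,
    frozenset({"bread", "wine"}): 0.4,
}


def generalise(transaction):
    """Replace every item by its generalised set (items without a mapping stay as singletons)."""
    return [GENERALISATION.get(item, frozenset({item})) for item in transaction]


def relatedness(a, b):
    """Look up the assumed relatedness score of two items."""
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def guess_original(generalised):
    """For each published set, keep the candidate most related to the items in the other sets."""
    guess = []
    for i, candidates in enumerate(generalised):
        # All items appearing in the other sets of the same transaction.
        context = [c for j, s in enumerate(generalised) if j != i for c in s]
        # sorted() makes tie-breaking deterministic.
        best = max(
            sorted(candidates),
            key=lambda c: sum(relatedness(c, other) for other in context),
        )
        guess.append(best)
    return guess


original = ["insulin", "glucose meter", "test strips"]
published = generalise(original)
recovered = guess_original(published)
```

In this toy setting the added items (`bread`, `wine`, `hammer`) are only weakly related to the rest of the transaction, so the attack discards them and `recovered` equals `original`. Real data would need relatedness scores taken from an external source, and the attack would not always succeed.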

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ong, H., Shao, J. (2014). De-anonymising Set-Generalised Transactions Based on Semantic Relationships. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds) Future Data and Security Engineering. FDSE 2014. Lecture Notes in Computer Science, vol 8860. Springer, Cham. https://doi.org/10.1007/978-3-319-12778-1_9

  • DOI: https://doi.org/10.1007/978-3-319-12778-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12777-4

  • Online ISBN: 978-3-319-12778-1

  • eBook Packages: Computer Science, Computer Science (R0)