Skip to main content

Efficient Privacy Preserving Cross-Datasets Collaborative Outlier Detection

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 11983))

Abstract

Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. It is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. The accuracy of the outlier detection depends on sufficient data. However, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. Unfortunately, competitive advantage, privacy concerns and regulations, and issues surrounding data sovereignty and jurisdiction prevent many organizations from openly sharing their data. In this paper, we address precisely this issue. We present new and efficient protocols for privacy preserving outlier detection to find outliers from arbitrarily partitioned categorical data. Our protocols fall in the two-server model where data owners distribute their private data among two non-colluding servers who detects on the joint data using secure two-party computation (2PC). Our method is based on Local Distance-based Outlier Factor (LDOF) using the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We provide the privacy guarantee by using secure multiparty computation techniques. We implement our system in C++ on real data. Our experiments validate that our protocols are both effective and efficient.

Supported by the National Key Research and Development Program of China under Grant 2016YFB0800601, and the Key Program of NSFC-Tongyong Union Foundation under Grant U1636209.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Vaidya, J., Clifton, C., Kantarcioglu, M.: Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4, 28–34 (2002)

    Article  Google Scholar 

  2. Clifton, C., Kantarcioglu, M.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16, 1026–1037 (2004)

    Article  Google Scholar 

  3. Clifton, C., Vaidya, J.: Privacy preserving association rule mining in vertically partitioned data. In: 8th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (KDD02), pp. 639–644 (2002)

    Google Scholar 

  4. Wright, R., Jagannathan, G.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD05), pp 593–599 (2005)

    Google Scholar 

  5. Ghosh, J., Merugu, S.: Privacy-preserving distributed clustering using generative models. In: 3rd IEEE International Conference on Data Mining (ICDM03), pp. 211–218 (2003)

    Google Scholar 

  6. Clifton, C., Vaidya, J.: Privacy-preserving k-means clustering over vertically partitioned data. In: 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), pp 206–215 (2003)

    Google Scholar 

  7. Srikant, R., Agrawal, R.: Privacy-preserving data mining. In: ACMSIGMOD International Conference on Management of Data (SIGMOD 2000), pp. 439–450 (2000)

    Google Scholar 

  8. Pinkas, B., Lindell, Y.: Privacy preserving data mining. In: 20th Annual International Cryptology Conference (CRYPTO 2000), pp. 36–54 (2000)

    Google Scholar 

  9. Clifton, C., Vaidya, J.: Privacy preserving naive Bayes classifier for vertically partitioned data. In: SIAM International Conference on Data Mining (SDM 2004), pp. 522–526 (2004)

    Google Scholar 

  10. Wang, S., Zhao, W., Zhang, N.: A new scheme on privacy-preserving data classification. In: 11th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), pp. 374–383 (2005)

    Google Scholar 

  11. Aggarwal, C.C.: Outlier Analysis. Springer-Verlag, New York (2013). https://doi.org/10.1007/978-1-4614-6396-2

    Book  MATH  Google Scholar 

  12. Rastogi, R., Shim, K., Ramaswame, S.: Efficient algorithms formining outliers from large data sets. In: ACMSIGMOD International Conference on Management of Data (SIGMOD 2000), pp. 427–438 (2000)

    Google Scholar 

  13. Atallah, M., Du, W.: Privacy-preserving cooperative statistical analysis. In: 17th Annual Computer Security Applications Conference (ACSAC 2001), pp. 102–110 (2001)

    Google Scholar 

  14. Syverson, P., Goldschlag, D., Reed, M.: Onion routing. Commun. ACM 42, 39–41 (1999)

    Google Scholar 

  15. Vaidya, J., Clifton, C.: Privacy-preserving outlier detection. In: 4th IEEE International Conference on Data Mining, pp. 233–240 (2004)

    Google Scholar 

  16. Asif, H., Talukdar, T., Vaidya, J.: Differentially private outlier detection in a collaborative environment. Int. J. Coop. Inf. Syst. 27(03), 1850005 (2018)

    Article  Google Scholar 

  17. Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., Boneh, D.: Privacy-preserving matrix factorization. In: ACM SIGSAC Conference on Computer Communications Security, pp. 801–812 (2013)

    Google Scholar 

  18. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: IEEE Symposium on Security and Privacy, pp. 334–348 (2013)

    Google Scholar 

  19. Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 813–822. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_84

    Chapter  Google Scholar 

  20. Lindell, Y., Pinkas, B.: A proof of security of Yaos protocol for two-party computation. J. Cryptology 22, 161–188 (2009)

    Article  MathSciNet  Google Scholar 

  21. Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4_9

    Chapter  Google Scholar 

  22. Schneider, T., Asharov, G., Lindell, Y., Zohner, M.: More efficient oblivious transfer and extensions for faster secure computation. In: ACM CCS 2013, pp. 535–548 (2013)

    Google Scholar 

  23. Nielsen, J.B., Nordholt, P.S., Orlandi, C., Burra, S.S.: A new approach to practical active-secure two-party computation. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 681–700. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_40

    Chapter  Google Scholar 

  24. Demmler, D., Schneider, T., Zohner, M.: ABY-a framework for efficient mixed-protocol secure two-party computation. In: NDSS (2015)

    Google Scholar 

  25. Nisan, N., Pinkas, B., Sella, Y., Malkhi, D., et al. : Fairplay a secure two-party computation system. In: 13th Conference on USENIX Security Symposium, vol. 13, pp. 20–20 (2004)

    Google Scholar 

  26. Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1_34

    Chapter  Google Scholar 

  27. Kamara, S., Mohassel, P., Raykova, M.: Outsourcing multiparty computation. IACR Cryptology ePrint Archive 272 (2011)

    Google Scholar 

  28. Canetti, R.: Universally composable security: a new paradigm for cryptographic protocols. In: 42nd IEEE Symposium on Foundations of Computer Science (FOCS 2001), pp. 136–145 (2001)

    Google Scholar 

  29. Zhang, Y., Mohassel, P.: SecureML: a system for scalable privacy-preserving machine learning. In: IEEE Symposium on Security and Privacy (SP), pp. 19–38 (2017)

    Google Scholar 

  30. GMP library. https://gmplib.org

  31. NTL library. http://www.shoup.net/ntl

Download references

Acknowledgments

This work is supported by the National Key Research and Development Program of China under Grant 2016YFB0800601, and the Key Program of NSFC-Tongyong Union Foundation under Grant U1636209.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qingqi Pei .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Wei, Z., Pei, Q., Liu, X., Ma, L. (2019). Efficient Privacy Preserving Cross-Datasets Collaborative Outlier Detection. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science(), vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-37352-8_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37351-1

  • Online ISBN: 978-3-030-37352-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics