Abstract
LOF is a well-known approach to density-based outlier detection and has received much attention recently. A privacy-preserving LOF outlier detection algorithm is important because the data on which LOF runs is typically split among multiple participants, none of whom is willing to disclose sensitive information, for legal or moral reasons. This is, however, a hard problem, since participants need to find the maximum of the distances between an object and its k-nearest neighbors (k-NN) without learning any information about these objects. In this paper, we propose an efficient protocol for privacy-preserving LOF outlier detection. We first employ a shuffle protocol to permute the distance vectors owned by different participants. Then, we design a secure selection method to obtain the garbled k-NN indexes and shares of the k-distance for given objects. For each object, we use the k-distances of all objects to construct a vector, on which the shuffle protocol is executed again to obtain new shares of the k-distance. Finally, the shares corresponding to the garbled k-NN indexes are selected as the expected result. Our protocol ensures that all intermediate values are shared between multiple participants and thus avoids information leakage. In addition, our protocol is efficient: we prove that its computation and communication complexity are bounded by \(O(n^2)\).
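To make the quantities named in the abstract concrete, the following is a minimal plaintext sketch, not the protocol proposed in the paper. It computes the k-NN indexes and the k-distance (the largest of the distances to the k nearest neighbors) for one object, and then splits the k-distance into additive shares so that no single participant holds it. The function names, the integer distances, the two-party setting, and the modulus are all assumptions made for illustration. Ties at the k-th distance are broken by index here, whereas the original LOF definition keeps all tied neighbors.

```python
import secrets

# Assumed share modulus (a Mersenne prime); not taken from the paper.
MODULUS = 2**61 - 1


def k_distance(distances, k):
    """Return (k-NN indexes, k-distance) for one object's distance vector.

    distances[j] is the integer distance from the object to object j.
    The caller is assumed to have excluded the object itself.
    """
    order = sorted(range(len(distances)), key=lambda j: distances[j])
    neighbours = order[:k]
    # The k-distance is the distance to the farthest of the k neighbours.
    return neighbours, distances[neighbours[-1]]


def additive_shares(value, parties=2):
    """Split an integer into additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % MODULUS
```

In this sketch each share on its own is uniformly distributed modulo MODULUS and reveals nothing about the k-distance; only the sum of all shares does. The paper's protocol additionally hides the neighbor indexes themselves, which this plaintext example does not attempt.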





Acknowledgments
We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Natural Science Foundation of China (Nos. 60903217, 61202407 and 61003044), the Fundamental Research Funds for the Central Universities (Nos. WK0110000027 and WK0110000033), the Guangdong Province Strategic Cooperation Project with the Chinese Academy of Sciences (No. 2012B090400013) and the Natural Science Foundation of Jiangsu Province of China (No. BK2011357).
Cite this article
Li, L., Huang, L., Yang, W. et al. Privacy-preserving LOF outlier detection. Knowl Inf Syst 42, 579–597 (2015). https://doi.org/10.1007/s10115-013-0692-0
DOI: https://doi.org/10.1007/s10115-013-0692-0