Privacy-preserving hybrid collaborative filtering on cross distributed data

  • Regular Paper
Knowledge and Information Systems

Abstract

Data collected for collaborative filtering (CF) purposes might be cross distributed between two online vendors, even competing companies. Such corporations might want to integrate their data to provide more precise and reliable recommendations. However, due to privacy, legal, and financial concerns, they do not desire to disclose their private data to each other. If privacy-preserving measures are introduced, they might decide to generate predictions based on their distributed data collaboratively. In this study, we investigate how to offer hybrid CF-based referrals with decent accuracy on cross distributed data (CDD) between two e-commerce sites while maintaining their privacy. Our proposed schemes should prevent data holders from learning true ratings and rated items held by each other while still allowing them to provide accurate CF services efficiently. We perform real data-based experiments to evaluate our proposals in terms of accuracy. The results show that the proposed methods are able to provide precise predictions. Moreover, we analyze our schemes in terms of privacy and supplementary costs. We demonstrate that our schemes are secure, and online overhead costs due to privacy concerns are insignificant.

Author information

Corresponding author

Correspondence to Huseyin Polat.

About this article

Cite this article

Yakut, I., Polat, H. Privacy-preserving hybrid collaborative filtering on cross distributed data. Knowl Inf Syst 30, 405–433 (2012). https://doi.org/10.1007/s10115-011-0395-3

  • DOI: https://doi.org/10.1007/s10115-011-0395-3
