ABSTRACT
Decentralized online social networks enhance users’ privacy by empowering them to control their data. However, these networks mostly lack practical solutions for building privacy-preserving recommender systems that help improve the network’s services. Association rule mining is one of the basic building blocks of many recommender systems. In this paper, we propose an efficient approach that enables rule mining on distributed data. We leverage Metropolis-Hastings random walk sampling and a distributed FP-Growth mining algorithm to maintain users’ privacy. We evaluate our approach on three real-world datasets. Results reveal that the approach achieves high average precision scores with sample sizes as low as 1% in well-connected social networks, with a remarkable reduction in communication and computational costs.
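The abstract names Metropolis-Hastings random walk sampling without detailing it. The following is a minimal sketch of the textbook form of that walk, not the authors' implementation. It assumes an undirected, connected graph stored as an adjacency dictionary; the function name `mhrw_sample`, its parameters, and the toy graph are all hypothetical and chosen only for illustration.

```python
import random


def mhrw_sample(adjacency, start, num_steps, seed=None):
    """Walk a graph with the Metropolis-Hastings rule and return the visited nodes.

    Assumption: `adjacency` maps each node to a non-empty list of neighbors
    and describes an undirected, connected graph.
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(num_steps):
        neighbors = adjacency[current]
        # Propose a neighbor uniformly at random.
        candidate = rng.choice(neighbors)
        # Accept with probability min(1, deg(current) / deg(candidate)).
        # This offsets the bias of a plain random walk toward high-degree
        # nodes, so the walk targets a uniform distribution over nodes.
        accept_prob = min(1.0, len(neighbors) / len(adjacency[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        # A rejected proposal keeps the walk at the current node.
        visited.append(current)
    return visited


# Hypothetical toy graph: "a" is a hub, "d" is a low-degree leaf.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

walk = mhrw_sample(graph, start="a", num_steps=200, seed=7)
sampled_users = set(walk)
```

In a decentralized setting, each node in `walk` would stand for a user whose local data is included in the sample. How the sampled data is then passed to the distributed FP-Growth step is not described in the abstract and is left out of this sketch.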