ABSTRACT
In the last few years, Hadoop, an open-source distributed system, has become a "de facto" standard for processing large-scale data. Combined with data mining techniques, Hadoop increases the utility of data analysis, which is why a substantial amount of research has studied how to apply data mining techniques to the MapReduce framework in Hadoop. However, data mining can cause privacy violations, and this threat is a major obstacle to data mining on Hadoop. Numerous studies have addressed this problem, but existing approaches are insufficient and have several drawbacks. In this paper, we propose a privacy-preserving data mining technique for Hadoop that prevents privacy violations without degrading utility. We focus on association rule mining, a representative data mining algorithm. Through experiments, we validate that the proposed technique achieves satisfactory performance while preserving data privacy.
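For readers unfamiliar with how association rule mining maps onto MapReduce, the following is a minimal, single-process sketch of Apriori frequent-itemset counting written in map/shuffle/reduce style. It is an illustrative assumption, not the authors' implementation: it uses no Hadoop API, all function names (`map_transaction`, `reduce_counts`, `run_pass`, `generate_candidates`, `apriori_mapreduce`) are hypothetical, and the privacy-preserving step proposed in the paper is not described in this abstract and is not shown here.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(transaction, candidates):
    """Hypothetical mapper: emit (candidate, 1) for each candidate itemset contained in one transaction."""
    items = set(transaction)
    for cand in candidates:
        if cand <= items:  # candidate is a subset of this transaction
            yield cand, 1


def reduce_counts(itemset, counts):
    """Hypothetical reducer: sum the partial counts emitted for one candidate itemset."""
    return itemset, sum(counts)


def run_pass(transactions, candidates, min_support):
    """Simulate one MapReduce job: map, shuffle (group by key), reduce, then keep frequent itemsets."""
    groups = defaultdict(list)
    for t in transactions:
        for key, value in map_transaction(t, candidates):
            groups[key].append(value)  # the "shuffle" step groups values by itemset
    frequent = {}
    for key, values in groups.items():
        itemset, support = reduce_counts(key, values)
        if support >= min_support:
            frequent[itemset] = support
    return frequent


def generate_candidates(frequent, k):
    """Apriori join and prune: k-itemsets whose (k-1)-subsets are all frequent."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}


def apriori_mapreduce(transactions, min_support):
    """Run one MapReduce pass per itemset size until no candidates remain."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent_all, k = {}, 1
    while candidates:
        frequent = run_pass(transactions, candidates, min_support)
        frequent_all.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return frequent_all


if __name__ == "__main__":
    # Hypothetical toy market-basket data, for illustration only.
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, support in apriori_mapreduce(baskets, min_support=3).items():
        print(sorted(itemset), support)
```

Each loop iteration corresponds to one MapReduce job over the whole dataset, which is why Apriori on Hadoop is usually described as an iterative sequence of jobs; in a real deployment the shuffle would be performed by the framework and a combiner could pre-aggregate counts on each mapper.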
Hiding a Needle in a Haystack: Privacy Preserving Apriori algorithm in MapReduce Framework