Abstract
Classification algorithms play an important role in data mining: they identify the category of a new observation and are often used to derive new knowledge. However, classification algorithms on big data analytics platforms such as MapReduce and Spark often run on plain text without an appropriate privacy protection mechanism. This leaves users' data vulnerable to unauthorized access and puts it at great privacy risk. To address this concern, we propose a novel k-NN classifier that can run on an anonymized dataset on the MapReduce platform. We describe new Map and Reduce algorithms that produce different anonymized datasets for the k-NN classifier. We also detail the experiments we performed on multiple anonymized datasets to understand the trade-off between the level of privacy protection (data privacy) and the high-value insights (data utility) before and after data anonymization.
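To make the idea of running k-NN over anonymized records in a map/reduce style more concrete, the following is a minimal single-process sketch in Python. It is not the Map and Reduce algorithms of the paper, which the abstract does not spell out. It assumes that anonymization generalizes each numeric attribute to an interval, that an interval is represented by its midpoint when computing distances, and that each map task returns its local k nearest candidates for a reduce step to merge. The data, attribute layout, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from functools import reduce

# Hypothetical anonymized data, split into three partitions.
# Each record is (generalized_attributes, class_label), where every
# attribute has been generalized to a (low, high) interval.
PARTITIONS = [
    [(((20, 30), (20, 30)), "low"), (((20, 30), (20, 30)), "low")],
    [(((40, 50), (40, 50)), "high"), (((40, 50), (40, 50)), "high")],
    [(((30, 40), (30, 40)), "low"), (((50, 60), (40, 50)), "high")],
]


def midpoint(interval):
    """Represent a generalized interval by its midpoint (an assumption)."""
    low, high = interval
    return (low + high) / 2.0


def distance(query, generalized):
    """Squared Euclidean distance from the query to the interval midpoints."""
    return sum((q - midpoint(iv)) ** 2 for q, iv in zip(query, generalized))


def knn_map(partition, query, k):
    """Map step: return the k nearest (distance, label) pairs of one partition."""
    scored = [(distance(query, record), label) for record, label in partition]
    return heapq.nsmallest(k, scored)


def knn_reduce(left, right, k):
    """Reduce step: merge two candidate lists and keep the k nearest overall."""
    return heapq.nsmallest(k, left + right)


def classify(partitions, query, k):
    """Run the map step per partition, reduce, then take a majority vote."""
    local_candidates = [knn_map(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: knn_reduce(a, b, k), local_candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify(PARTITIONS, (45, 45), k=3))
    print(classify(PARTITIONS, (24, 26), k=3))
```

Coarser intervals hide more about each individual but move the midpoints further from the true values, which is one simple way to see the privacy and utility trade-off the abstract refers to.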
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Bazai, S.U., Jang-Jaccard, J., Wang, R. (2018). Anonymizing k-NN Classification on MapReduce. In: Hu, J., Khalil, I., Tari, Z., Wen, S. (eds) Mobile Networks and Management. MONAMI 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 235. Springer, Cham. https://doi.org/10.1007/978-3-319-90775-8_29
DOI: https://doi.org/10.1007/978-3-319-90775-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90774-1
Online ISBN: 978-3-319-90775-8
eBook Packages: Computer Science (R0)