Anonymizing k-NN Classification on MapReduce

  • Conference paper
  • Mobile Networks and Management (MONAMI 2017)

Abstract

Data analytics tasks such as classification play an important role in data mining: a classifier identifies the category of a new observation and is often used to derive new knowledge. However, classification algorithms on big data analytics platforms such as MapReduce and Spark often run on plain text without an appropriate privacy protection mechanism. This leaves users' data vulnerable to unauthorized access and puts it at great privacy risk. To address this concern, we propose a novel k-NN classifier that can run on an anonymized dataset on the MapReduce platform. We describe new Map and Reduce algorithms that produce different anonymized datasets for the k-NN classifier. We also detail the experiments we performed on multiple anonymized datasets to understand the trade-off between the level of privacy protection (data privacy) and high-value insights (data utility) before and after data anonymization.
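The general idea of running k-NN over anonymized records in a map/reduce style can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not given in the abstract. It assumes that numeric quasi-identifiers have already been generalized to intervals (as k-anonymity-style generalization would do), that the distance to an interval is taken from its midpoint, and that each map task returns its local top-k candidates for a single reduce step to merge. All function names and the sample data are hypothetical.

```python
import heapq
from collections import Counter
from math import sqrt


def midpoint(interval):
    """Represent a generalized value (lo, hi) by its midpoint (an assumption)."""
    lo, hi = interval
    return (lo + hi) / 2.0


def distance(query, generalized_record):
    """Euclidean distance between a raw query and a record of intervals."""
    return sqrt(sum((q - midpoint(iv)) ** 2
                    for q, iv in zip(query, generalized_record)))


def map_phase(split, query, k):
    """Map task: return the k nearest (distance, label) pairs in one data split."""
    scored = ((distance(query, features), label) for features, label in split)
    return heapq.nsmallest(k, scored)


def reduce_phase(partial_results, k):
    """Reduce task: merge local candidates, keep the global k, take a majority vote."""
    merged = heapq.nsmallest(k, (pair for part in partial_results for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(splits, query, k):
    """Simulate the job locally: one map call per split, then a single reduce."""
    return reduce_phase([map_phase(split, query, k) for split in splits], k)


# Hypothetical anonymized training data: each feature is an interval, e.g. age 20-30.
splits = [
    [(((20, 30), (0, 10)), "A"), (((20, 30), (10, 20)), "A")],
    [(((40, 50), (30, 40)), "B"), (((40, 50), (40, 50)), "B"),
     (((30, 40), (10, 20)), "A")],
]

print(classify(splits, (27, 8), k=3))
print(classify(splits, (44, 38), k=3))
```

With wider intervals (stronger generalization), more records collapse onto the same midpoint and distances become less discriminative, which is one simple way to see the privacy/utility trade-off the abstract refers to.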

Author information

Corresponding author

Correspondence to Sibghat Ullah Bazai.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Bazai, S.U., Jang-Jaccard, J., Wang, R. (2018). Anonymizing k-NN Classification on MapReduce. In: Hu, J., Khalil, I., Tari, Z., Wen, S. (eds) Mobile Networks and Management. MONAMI 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 235. Springer, Cham. https://doi.org/10.1007/978-3-319-90775-8_29

  • DOI: https://doi.org/10.1007/978-3-319-90775-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90774-1

  • Online ISBN: 978-3-319-90775-8

  • eBook Packages: Computer Science, Computer Science (R0)
