Data security rules/regulations based classification of file data using TsF-kNN algorithm

Cluster Computing

Abstract

The volume of personal and organisational data keeps growing over time. Because data are so important to organisations, their effective and efficient management and categorization need special focus. Understanding data security policies and applying them to the appropriate data types is therefore one of the core concerns in large organisations such as cloud service providers. With data classification, the security requirements of the data can be identified without manual intervention, and encryption is applied only to the confidential data, which saves encryption time, decryption time, storage and processing power. The proposed data classification approach aims to reduce network traffic, additional data movement and overload, and it allows a storage location for confidential data to be chosen where the security requirements of that data are fulfilled. In this paper, an intelligent data classification approach is presented for predicting the confidentiality/sensitivity level of the data in a file based on corporate objectives and government policies/rules. An enhanced version of the k-NN algorithm is also proposed to reduce the computational complexity of the traditional k-NN algorithm in the data classification phase. The proposed algorithm is called Training dataset Filtration-kNN (TsF-kNN). The experimental results show that the data in a file can be classified into confidential and non-confidential classes and that the TsF-kNN algorithm performs better than the traditional k-NN and Naïve Bayes algorithms.
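To make the idea of filtering the training set before a k-NN vote more concrete, the following is a minimal sketch in Python. The abstract does not say how TsF-kNN performs its filtration, so the filter used here (keep only training samples that share at least one term with the query) is an assumption chosen for illustration, not the authors' method. The function names, the toy training data, the cosine similarity measure and the fail-closed default label are all hypothetical as well.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_training_set(training, query_vec):
    """Hypothetical filtration step: drop samples that share no term with the query.

    This is an assumed stand-in for the paper's filtration, which the
    abstract does not describe.
    """
    return [(vec, label) for vec, label in training
            if any(term in vec for term in query_vec)]


def knn_classify(training, text, k=3, default="confidential"):
    """Majority vote among the k most similar samples that survive filtering.

    `default` is returned when no sample survives; falling back to
    "confidential" (fail closed) is an assumption of this sketch.
    """
    query_vec = Counter(tokenize(text))
    candidates = filter_training_set(training, query_vec)
    if not candidates:
        return default
    ranked = sorted(candidates,
                    key=lambda sample: cosine(sample[0], query_vec),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled data, invented for this example.
TRAINING = [(Counter(tokenize(text)), label) for text, label in [
    ("patient medical record diagnosis", "confidential"),
    ("employee salary bank account number", "confidential"),
    ("credit card number and password", "confidential"),
    ("cafeteria lunch menu for monday", "non-confidential"),
    ("public holiday schedule announcement", "non-confidential"),
    ("office picnic weather forecast", "non-confidential"),
]]

if __name__ == "__main__":
    print(knn_classify(TRAINING, "salary and bank account details", k=1))
    print(knn_classify(TRAINING, "lunch menu monday", k=1))
```

Under these assumptions the similarity computation runs only over the samples that pass the filter, which is the kind of saving at classification time that the abstract attributes to reducing the work of traditional k-NN. A file labelled confidential by such a classifier would then be the only one sent through encryption.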



Author information


Corresponding author

Correspondence to Munwar Ali Zardari.


About this article


Cite this article

Zardari, M.A., Jung, L.T. Data security rules/regulations based classification of file data using TsF-kNN algorithm. Cluster Comput 19, 349–368 (2016). https://doi.org/10.1007/s10586-016-0539-z


  • DOI: https://doi.org/10.1007/s10586-016-0539-z
