Abstract
The volume of personal and organisational data keeps growing over time. Because data are so important to organisations, their effective and efficient management and categorization need special attention. Understanding data security policies and applying them to the appropriate data types is therefore one of the core concerns in large organisations such as cloud service providers. With data classification, the security requirements of the data can be identified without manual intervention, and encryption can be applied only to the confidential data, thus saving encryption time, decryption time, storage, and processing power. The proposed data classification approach aims to reduce network traffic, additional data movement, and overload, and it allows the storage location for confidential data to be chosen so that the security requirements of those data are fulfilled. In this paper, an intelligent data classification approach is presented for predicting the confidentiality/sensitivity level of the data in a file based on corporate objectives and government policies/rules. An enhanced version of the k-NN algorithm is also proposed to reduce the computational complexity of the traditional k-NN algorithm at the data classification phase. The proposed algorithm is called Training dataset Filtration-kNN (TsF-kNN). The experimental results show that data in a file can be classified into confidential and non-confidential classes and that the TsF-kNN algorithm performs better than the traditional k-NN and Naïve Bayes algorithms.
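The abstract does not spell out how TsF-kNN filters the training dataset, so the following is only a minimal sketch of the general idea: shrink the training set before the k-NN vote so that fewer distances are computed. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `TRAINING` rows, the token-set representation of a file, the Jaccard distance, and the filtration rule in `filter_training` (keep only rows that share at least one token with the query).

```python
from collections import Counter

# Hypothetical toy training set: (tokens extracted from a file, label).
TRAINING = [
    ({"salary", "passport", "employee"}, "confidential"),
    ({"password", "account", "bank"}, "confidential"),
    ({"passport", "medical", "record"}, "confidential"),
    ({"newsletter", "event", "lunch"}, "non-confidential"),
    ({"press", "release", "event"}, "non-confidential"),
    ({"menu", "lunch", "cafeteria"}, "non-confidential"),
]


def jaccard_distance(a, b):
    """Distance between two token sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def filter_training(training, query):
    """Assumed filtration rule (not the paper's): keep only the rows that
    share at least one token with the query. Falls back to the full set
    if nothing overlaps, so a vote is always possible."""
    kept = [(tokens, label) for tokens, label in training if tokens & query]
    return kept or list(training)


def classify(training, query, k=3):
    """Plain majority-vote k-NN, run on the filtered training rows only."""
    candidates = filter_training(training, query)
    ranked = sorted(candidates, key=lambda row: jaccard_distance(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify(TRAINING, {"passport", "salary", "scan"}))
    print(classify(TRAINING, {"lunch", "event", "invite"}))
```

In this sketch the saving comes from `filter_training`: distances are computed only for rows that survive the filter, rather than for the whole training set as in traditional k-NN. The filter actually used by TsF-kNN may differ.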














Cite this article
Zardari, M.A., Jung, L.T. Data security rules/regulations based classification of file data using TsF-kNN algorithm. Cluster Comput 19, 349–368 (2016). https://doi.org/10.1007/s10586-016-0539-z
DOI: https://doi.org/10.1007/s10586-016-0539-z