Data security rules/regulations based classification of file data using TsF-kNN algorithm

Zardari, Munwar Ali; Jung, Low Tang

doi:10.1007/s10586-016-0539-z

Published: 02 February 2016

Volume 19, pages 349–368, (2016)

Cluster Computing

Munwar Ali Zardari¹ &
Low Tang Jung¹


Abstract

The volume of personal and organisational data grows steadily over time. Because data are so important to organisations, their effective and efficient management and categorisation need special attention. Understanding data security policies and applying them to the appropriate data types is therefore a core concern in large organisations such as cloud service providers. With data classification, the security requirements of the data can be identified without manual intervention, and encryption can be applied only to the confidential data, saving encryption time, decryption time, storage and processing power. The proposed data classification approach aims to reduce network traffic, additional data movement and overload; it also allows a storage location for confidential data to be chosen where the security requirements of those data are fulfilled. In this paper, an intelligent data classification approach is presented for predicting the confidentiality/sensitivity level of the data in a file based on corporate objectives and government policies/rules. An enhanced version of the k-NN algorithm, called Training dataset Filtration-kNN (TsF-kNN), is also proposed to reduce the computational complexity of the traditional k-NN algorithm in the data classification phase. The experimental results show that the data in a file can be classified into confidential and non-confidential classes and that the TsF-kNN algorithm performs better than the traditional k-NN and Naïve Bayes algorithms.
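
The abstract does not say how TsF-kNN filters the training dataset, so the following is only a rough sketch of the general idea: shrink the training set before the neighbour search so that fewer distance computations are needed at classification time. The filtering rule used here (keep only training items that share at least one term with the query), the bag-of-words representation, the cosine similarity and the tiny training set are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_training(train, query_vec):
    """Hypothetical filtration rule (an assumption, not the paper's):
    keep only training items sharing at least one term with the query."""
    return [(vec, label) for vec, label in train
            if any(term in vec for term in query_vec)]


def knn_classify(train, query_vec, k=3):
    """Majority vote over the k most similar items of the filtered set.
    Falls back to the full training set if the filter removes everything."""
    candidates = filter_training(train, query_vec) or train
    ranked = sorted(candidates,
                    key=lambda item: cosine(item[0], query_vec),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training examples, for illustration only.
TRAIN = [(vectorize(text), label) for text, label in [
    ("patient medical record diagnosis", "confidential"),
    ("employee salary bank account number", "confidential"),
    ("credit card number and passport", "confidential"),
    ("public press release product launch", "non-confidential"),
    ("cafeteria lunch menu schedule", "non-confidential"),
    ("open source project readme", "non-confidential"),
]]

if __name__ == "__main__":
    for text in ("salary and bank account details",
                 "lunch menu for the cafeteria"):
        print(text, "->", knn_classify(TRAIN, vectorize(text)))
```

In this sketch the similarity function runs only on the items that survive the filter, which is where any saving over plain k-NN would come from. Whether the paper's filtration step works this way cannot be determined from the abstract.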



Author information

Authors and Affiliations

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Malaysia
Munwar Ali Zardari & Low Tang Jung


Corresponding author

Correspondence to Munwar Ali Zardari.


About this article

Cite this article

Zardari, M.A., Jung, L.T. Data security rules/regulations based classification of file data using TsF-kNN algorithm. Cluster Comput 19, 349–368 (2016). https://doi.org/10.1007/s10586-016-0539-z


Received: 30 October 2015
Revised: 14 January 2016
Accepted: 16 January 2016
Published: 02 February 2016
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10586-016-0539-z
