Letters
A neural network approach for data masking
Introduction
Outsourcing of information technology is not an isolated trend but part of a bigger shift towards the globalization of business processes. As more and more organizations join the outsourcing bandwagon, data privacy is gaining a lot of corporate and media attention. Countries around the world are developing regulations designed to support privacy. The ease with which data can be collected automatically, stored in databases and queried efficiently over the Internet (or otherwise) has paradoxically worsened the privacy situation and has raised numerous ethical and legal concerns. Legislation is being passed insisting that companies which outsource consumer data to foreign countries must assume responsibility for the data. Recent legislative actions, as well as frequent media reports of security breaches, have made the prevalence of such threats clear [17]. Problems arising from private data falling into malicious hands include identity theft, stalking on the web, spam, etc. [1].
A worldwide movement towards data privacy legislation has put pressure on organizations to improve their information privacy and security standards. Data privacy research indicates that more than 70% of all security incidents come from internal threats. Moreover, breaches that originate inside the organization are more than 50 times as costly as external breaches [18]. Test and development teams are restricted from viewing any security-sensitive information. However, they can be given structurally correct, functional copies of the original databases. The actual content of the databases is irrelevant for test and development teams, as long as the data looks real [11]. Names, addresses, phone numbers, e-mail addresses and credit card details are good examples of this kind of data. The requirements for data masking depend greatly on country, environment, etc. Many countries have their own legal regulations in some form or other.
This work provides a comprehensive overview of the issues involved in data masking and presents a method to adaptively mask the data in databases while retaining the look-and-feel of the original data. The letter is organized as follows. In Section 2, the fundamentals of data masking are discussed along with various data sanitization techniques. Section 3 provides a list of limitations of traditional data masking techniques. Section 4 presents the proposed data masking technique. Some conclusions are drawn in Section 5.
Section snippets
Fundamentals of data masking
Data masking is a process whereby the information in a database is masked or ‘de-identified’. It enables the creation of realistic data in non-production environments without the risk of exposing sensitive information to unauthorized users. We now discuss various data masking techniques [11], [2] with their individual benefits and drawbacks.
NULL’ing out is a technique where a column is simply deleted or is replaced by NULL values. Even though this is the most effective way of data masking, this
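The NULL'ing out technique described above can be illustrated with a minimal sketch. The row layout, the column names and the helper names `null_out` and `drop_column` are assumptions made for illustration only; they do not come from the letter. The sketch shows both variants mentioned in the text: replacing a column with NULL values and deleting the column altogether.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def null_out(rows: List[Row], column: str) -> List[Row]:
    """Return copies of the rows with `column` set to None (NULL)."""
    return [{**row, column: None} for row in rows]


def drop_column(rows: List[Row], column: str) -> List[Row]:
    """Return copies of the rows with `column` removed entirely."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


# Hypothetical sample table; the values are made up for this sketch.
customers: List[Row] = [
    {"FIRST_NAME": "Alice", "CARD_NUMBER": "4111111111111111"},
    {"FIRST_NAME": "Bob", "CARD_NUMBER": "5500000000000004"},
]

masked = null_out(customers, "CARD_NUMBER")
dropped = drop_column(customers, "CARD_NUMBER")
```

Both helpers leave the input rows untouched and return new rows, so the original table stays available for comparison. In either variant the masked column no longer carries any realistic values.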
Limitations of data masking techniques
Most of the available software tools, solutions and systems that implement data masking techniques have some major drawbacks [12], which include:
Relevant data: The masked data should resemble the look-and-feel of the original information.
Row-internal data synchronization: In many cases the contents of one column in a row are related in some way to the contents of the other columns in the same row. For example, if the gender field in the row is female, then the FIRST_NAME column should really have a female
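The row-internal synchronization requirement above can be sketched as follows. This is only an illustration of the requirement, not the method proposed in the letter: it relies on small lookup pools, which is exactly the kind of dictionary the proposed approach aims to avoid. The pool contents, the `GENDER` column, its `"F"`/`"M"` encoding and the function name `mask_first_name` are all hypothetical.

```python
import hashlib
from typing import Dict

# Hypothetical replacement pools, keyed by the value of the GENDER column.
NAME_POOLS = {
    "F": ["Anita", "Maria", "Priya", "Susan"],
    "M": ["Arjun", "David", "Rahul", "Tom"],
}


def mask_first_name(row: Dict[str, str]) -> Dict[str, str]:
    """Replace FIRST_NAME with a name drawn from the pool matching GENDER.

    The choice is derived from a hash of the original name, so the same
    input row always yields the same masked row.
    """
    pool = NAME_POOLS[row["GENDER"]]
    digest = hashlib.sha256(row["FIRST_NAME"].encode("utf-8")).digest()
    masked = dict(row)  # copy, so the original row is left unchanged
    masked["FIRST_NAME"] = pool[digest[0] % len(pool)]
    return masked


row = {"GENDER": "F", "FIRST_NAME": "Lakshmi", "CITY": "Hyderabad"}
masked_row = mask_first_name(row)
```

Because the replacement is picked from the pool selected by the gender field, the masked name stays consistent with the rest of the row, and the other columns are passed through unchanged.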
Data masking using neural network
A neural network is a massively parallel distributed processor made up of simple processing units that has a natural propensity for storing experiential knowledge and making it available for use [21], [8], [22], [7]. In general, neural networks can have any number of layers and any number of nodes per layer. Most applications use the three-layer (input, hidden and output) structure with a few hundred input nodes. The hidden layer is usually about 10% of the size of the input layer. The number of
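The three-layer structure and the 10% sizing rule of thumb described above can be made concrete with a short sketch in plain Python. The figures used here (200 input nodes, 4 output nodes), the sigmoid activation, the random weight range and all function names are assumptions chosen for illustration; the letter does not specify them in this excerpt.

```python
import math
import random
from typing import List

Layer = List[List[float]]  # one row of weights per neuron, bias stored last


def hidden_size(n_inputs: int, ratio: float = 0.10) -> int:
    """Rule of thumb from the text: hidden layer is about 10% of the inputs."""
    return max(1, round(n_inputs * ratio))


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create n_out neurons, each with n_in weights plus one bias term."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward_layer(weights: Layer, inputs: List[float]) -> List[float]:
    """Weighted sum plus bias for each neuron, passed through a sigmoid."""
    outputs = []
    for neuron in weights:
        total = neuron[-1] + sum(w * x for w, x in zip(neuron, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


rng = random.Random(0)            # fixed seed so the sketch is repeatable
n_in, n_out = 200, 4              # assumed sizes, for illustration only
n_hidden = hidden_size(n_in)      # 10% of 200 input nodes

w_hidden = make_layer(n_in, n_hidden, rng)
w_output = make_layer(n_hidden, n_out, rng)

x = [rng.random() for _ in range(n_in)]
y = forward_layer(w_output, forward_layer(w_hidden, x))
```

The network here is untrained, so `y` is meaningless as a prediction; the sketch only shows how the layer sizes relate and how a signal flows from the input layer through the hidden layer to the output layer.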
Selected results
To show the applicability of this methodology to the data masking task, an example is presented in this section. A bi-directional associative memory (BAM) based neural network is used because it is highly suitable for learning and recall. Such a neural network can learn the properties of the database using the back-propagation algorithm, and the same network can be made adaptive for future databases [20]. The implementation uses the Fast Artificial Neural Network Library (FANN) [14].
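To give a feel for the learning-and-recall behaviour of a bi-directional associative memory, the sketch below implements the classic formulation from the cited 1988 paper on bidirectional associative memories: bipolar pattern pairs are stored by summing outer products, and recall bounces between the two layers until the state stops changing. This is a swapped-in textbook technique, not the back-propagation and FANN based implementation used in the letter, and the pattern pairs and function names are made up for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[int]              # bipolar pattern with entries +1 or -1
Matrix = List[List[int]]


def threshold(values: Sequence[int], previous: Sequence[int]) -> Vector:
    """Bipolar threshold; a zero activation keeps the previous state."""
    return [1 if v > 0 else -1 if v < 0 else p
            for v, p in zip(values, previous)]


def train(pairs: Sequence[Tuple[Vector, Vector]]) -> Matrix:
    """Build the weight matrix as the sum of outer products x^T y."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += x[i] * y[j]
    return w


def recall(w: Matrix, x: Vector, max_iters: int = 10) -> Tuple[Vector, Vector]:
    """Alternate forward (x -> y) and backward (y -> x) passes until stable."""
    n, m = len(w), len(w[0])
    y = [1] * m
    for _ in range(max_iters):
        new_y = threshold(
            [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)], y)
        new_x = threshold(
            [sum(w[i][j] * new_y[j] for j in range(m)) for i in range(n)], x)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y


# Two hypothetical pattern pairs to associate.
x1, y1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
x2, y2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
weights = train([(x1, y1), (x2, y2)])

noisy_x1 = [1, -1, 1, -1, 1, 1]   # x1 with its last entry flipped
recovered_x, recovered_y = recall(weights, noisy_x1)
```

With these two pairs, presenting the corrupted pattern `noisy_x1` settles on the stored pair `(x1, y1)`, which illustrates the associative recall property that the text refers to.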
Benefits of adaptive data masking
The present approach addresses the different data privacy issues with the help of neural networks. Any new privacy policy can be implemented with an effective representation of the problem corresponding to the network. Moreover, this method is devoid of dictionaries and is self-learning from the data provided by the databases, thus making the whole process more operationally efficient. The neural network learns in both vertical (for a single database provided) and
Conclusions
In this letter, we have proposed a new adaptive data masking (ADM) technique based on neural networks. To overcome the limitations of the existing techniques, ADM is built on the intelligence-retaining capability of neural networks. As a result, ADM not only retains the original look-and-feel but also avoids the use of dictionaries, which improves its performance. It is worth noting that, although neural networks play a major role in the suggested masking technique in
References (22)
- W. Aiello, A. Broder, J. Janssen, E. Milios, Modelling and mining of networked information spaces, in: Lecture Notes in...
- et al., Data obfuscation: anonymity and desensitization of usable data sets, IEEE Security and Privacy (2004)
- Plato Consulting. Camouflage - a data masking software, plato group...
- et al., Building adaptive systems a component based approach to building neural networks
- et al., Neural Networks: Algorithms, Applications, and Programming Techniques (1991)
- Neural Networks in Computer Intelligence (1994)
- An Introduction to Neural Networks (1997)
- Neural Networks: A Comprehensive Foundation (2008)
- Bidirectional associative memories, IEEE Transactions on Systems, Man, and Cybernetics (1988)
- et al., An Introduction to Neural Networks (1996)
Vishal Anjaiah Gujjary is working as Research Associate at SETLabs, Infosys Technologies Ltd., Hyderabad, India, and received his Bachelors with Honors from IIIT, Hyderabad in Computer Science and Engineering in 2007. Currently he is working in the area of speaker recognition and its applications including using speech as digital voice signature. He is also associated with development of IP corresponding to data masking and vulnerability scanners. His research interests include biometrics, speech and signal processing and pattern recognition.
Ashutosh Saxena is a Principal Researcher at SETLabs, Infosys Technologies Ltd., Hyderabad, India, and received his M.Sc. (1990), M.Tech. (1992) and Ph.D. in Computer Science (1999). The Indian government awarded him the post-doctorate BOYSCAST Fellowship in 2002 to research on ‘Security Framework for E-Commerce’ at ISRC, QUT, Brisbane, Australia. He is on the Reviewing Committees of various international journals and conferences. He has authored the book entitled PKI—Concepts, Design and Deployment, published by Tata McGraw-Hill, and also co-authored more than 70 research papers. His research interests are in the areas of authentication technologies, smart cards, key management and security assurance.