Elsevier

Neurocomputing

Volume 74, Issue 9, April 2011, Pages 1497-1501

Letters
A neural network approach for data masking

https://doi.org/10.1016/j.neucom.2011.01.002

Abstract

In this letter we present a neural-network-based data masking solution in which the database information remains internally consistent yet is not inadvertently exposed in an interpretable state. The system differs from classic data masking in that it can understand the semantics of the original data and mask it using a neural network that has been trained a priori on a set of rules. Our adaptive data masking (ADM) applies data masking techniques such as shuffling, substitution, masking and number variance in an intelligent fashion with the help of an adaptive neural network. Being adaptive makes data masking easier and content agnostic, so the approach finds a place in various vertical domains and systems.

Introduction

Outsourcing of information technology is not an isolated trend but part of a bigger shift towards the globalization of business processes. As more and more organizations join the outsourcing bandwagon, data privacy is gaining a lot of corporate and media attention. Countries around the world are developing regulations designed to support privacy. The ease with which data can be collected automatically, stored in databases and queried efficiently over the Internet (or otherwise) has paradoxically worsened the privacy situation and has raised numerous ethical and legal concerns. Legislation is being passed insisting that companies which outsource consumer data to foreign countries must assume responsibility for the data. Recent legislative actions, as well as frequent media reports of security breaches, have made the prevalence of such threats clear [17]. Problems arising from private data falling into malicious hands include identity theft, stalking on the web, spam, etc. [1].

A worldwide movement towards data privacy legislation has put pressure on organizations to improve their information privacy and security standards. Data privacy research indicates that more than 70% of all security incidents come from internal threats. Moreover, internal data breaches are more than 50 times as costly as external breaches [18]. Test and development teams are restricted from viewing any security-sensitive information. However, the data they work with can be structurally correct, functional copies of the original databases. The actual content of the databases is irrelevant for test and development teams, as long as the data looks real [11]. Names, addresses, phone numbers, e-mail addresses and credit card details are good examples of this kind of data. The requirements for data masking depend heavily on country, environment, etc. Many countries have their own legal regulations of some form or other.

This work provides a comprehensive overview of the issues involved in data masking and presents a method to adaptively mask the data in databases while retaining the look-and-feel of the original data. The letter is organized as follows. In Section 2, the fundamentals of data masking are discussed along with various data sanitization techniques. Section 3 lists the limitations of traditional data masking techniques. Section 4 presents the proposed data masking technique. Some conclusions are drawn in Section 5.

Fundamentals of data masking

Data masking is a process whereby the information in a database is masked or ‘de-identified’. It enables the creation of realistic data in non-production environments without the risk of exposing sensitive information to unauthorized users. We now discuss various data masking techniques [11], [2] with their individual benefits and drawbacks.

NULL’ing out is a technique where a column is simply deleted or is replaced by NULL values. Even though this is the most effective way of data masking, this
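The classic techniques named in this letter (NULL'ing out, shuffling, substitution, masking and number variance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the row layout, function names and parameters such as `keep_last` and `percent` are assumptions made for illustration only.

```python
import random

def null_out(rows, column):
    """NULL'ing out: replace every value in `column` with None."""
    return [{**row, column: None} for row in rows]

def shuffle_column(rows, column, rng):
    """Shuffling: permute the existing values of `column` across rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

def substitute(rows, column, replacements, rng):
    """Substitution: swap each value for a random pick from `replacements`."""
    return [{**row, column: rng.choice(replacements)} for row in rows]

def mask_out(value, keep_last=4, mask_char="X"):
    """Masking: hide all but the last `keep_last` characters of a string."""
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]

def number_variance(value, percent, rng):
    """Number variance: perturb a number by up to +/- `percent` percent."""
    return value * (1 + rng.uniform(-percent, percent) / 100.0)

# Hypothetical sample rows, used only to exercise the helpers above.
rng = random.Random(0)
rows = [
    {"name": "Alice", "card": "4111111111111111", "salary": 50000.0},
    {"name": "Bob", "card": "5500000000000004", "salary": 62000.0},
    {"name": "Carol", "card": "340000000000009", "salary": 71000.0},
]
shuffled = shuffle_column(rows, "name", rng)
masked_cards = [mask_out(row["card"]) for row in rows]
varied = [number_variance(row["salary"], 10, rng) for row in rows]
```

Each helper returns new rows rather than mutating its input, so the original data is never altered in place; whether that is desirable in a real masking pipeline depends on the environment.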

Limitations of data masking techniques

Most of the available software tools, solutions and systems that implement data masking techniques have some major drawbacks [12], which include:

Relevant data: The masked data should resemble the look-and-feel of the original information.

Row-internal data synchronization: In many cases the contents of one column in a row are related in some way to the contents of the other columns in the same row. For example, if the gender field in the row indicates female, then the FIRST_NAME column should really have a female
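Row-internal synchronization can be illustrated with a small Python sketch in which the replacement for FIRST_NAME is chosen according to the gender field of the same row. This is a lookup-based illustration of the requirement, not the neural approach proposed in this letter; the `GENDER` column name, its `"F"`/`"M"` codes and the name pools are all hypothetical.

```python
import random

# Hypothetical replacement pools, keyed by the assumed value of the gender field.
FIRST_NAMES = {
    "F": ["Anita", "Maria", "Priya"],
    "M": ["David", "Rahul", "Tom"],
}

def mask_first_name(row, rng):
    """Return a copy of `row` whose FIRST_NAME is consistent with its GENDER."""
    pool = FIRST_NAMES[row["GENDER"]]
    return {**row, "FIRST_NAME": rng.choice(pool)}

rng = random.Random(1)
original = {"ID": 1, "GENDER": "F", "FIRST_NAME": "Susan"}
masked = mask_first_name(original, rng)
```

Only the FIRST_NAME column changes; the other columns of the row are copied through unchanged, so the relationship between the two fields survives masking.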

Data masking using neural network

A neural network is a massively parallel distributed processor made up of simple processing units that has a natural propensity for storing experiential knowledge and making it available for use [21], [8], [22], [7]. In general, neural networks can have any number of layers and any number of nodes per layer. Most applications use a three-layer (input, hidden and output) structure with a few hundred input nodes. The hidden layer is usually about 10% of the size of the input layer. The number of
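The three-layer structure described above can be sketched in plain Python. The sketch only sizes the layers and runs a forward pass; it does not train anything and is not the network used in this letter. The sigmoid activation, the random weight range and the `hidden_ratio` parameter (set to the 10% rule of thumb) are assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    """One weight row per neuron; the extra trailing weight is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def build_network(n_inputs, n_outputs, rng, hidden_ratio=0.10):
    """Input -> hidden -> output, with the hidden layer about 10% of the input size."""
    n_hidden = max(1, round(hidden_ratio * n_inputs))
    return [make_layer(n_inputs, n_hidden, rng), make_layer(n_hidden, n_outputs, rng)]

def forward(network, inputs):
    """Propagate `inputs` through every layer and return the output activations."""
    activations = list(inputs)
    for layer in network:
        extended = activations + [1.0]  # constant input feeding the bias weight
        activations = [
            sigmoid(sum(w * x for w, x in zip(neuron, extended))) for neuron in layer
        ]
    return activations

rng = random.Random(0)
network = build_network(n_inputs=200, n_outputs=5, rng=rng)
outputs = forward(network, [rng.random() for _ in range(200)])
```

With 200 input nodes the sketch creates 20 hidden nodes, matching the 10% guideline quoted in the text.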

Selected results

To show the applicability of this methodology to the data masking task, an example is presented in this section. Because a bi-directional associative memory (BAM) based neural network is highly suitable for learning and recall, it is used here. Such a neural network can learn the properties of the database using the back-propagation algorithm, and the same neural network can be made adaptive for future databases [20]. The implementation is done using the Fast Artificial Neural Network Library (FANN) [14].
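To give a feel for the learning-and-recall behaviour of a bi-directional associative memory, the following Python sketch implements the classic Hebbian (outer-product) BAM described by Kosko. It is explicitly not the back-propagation-trained FANN implementation used in this letter, and the bipolar vectors below are hypothetical stand-ins for encoded database fields.

```python
def train_bam(pairs):
    """Build the weight matrix as the sum of outer products x^T y over all pairs."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += x[i] * y[j]
    return weights

def threshold(net_inputs, previous):
    """Bipolar threshold; a unit keeps its previous state when its net input is zero."""
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(net_inputs, previous)]

def recall(weights, x, max_iters=10):
    """Bounce between the two layers until the (x, y) pair stops changing."""
    n, m = len(weights), len(weights[0])
    x, y = list(x), [1] * m
    for _ in range(max_iters):
        new_y = threshold(
            [sum(x[i] * weights[i][j] for i in range(n)) for j in range(m)], y
        )
        new_x = threshold(
            [sum(weights[i][j] * new_y[j] for j in range(m)) for i in range(n)], x
        )
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y

# Hypothetical bipolar encodings of two (original, associated) patterns.
x1, y1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
x2, y2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
weights = train_bam([(x1, y1), (x2, y2)])

noisy_x1 = [1, -1, 1, -1, 1, 1]  # x1 with its last element flipped
recovered_x, recovered_y = recall(weights, noisy_x1)
```

The recall step works in both directions, which is what makes the memory bi-directional: a pattern presented on one side retrieves its partner on the other, and the retrieved partner in turn cleans up the presented pattern.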

Benefits of adaptive data masking

The present approach addresses the different data privacy issues with the help of neural networks. Any new privacy policy can be implemented with an effective representation of the problem corresponding to the network. Moreover, this method does not rely on dictionaries and is self-learning from the data provided by the databases, which makes the whole process more operationally efficient. The neural network learns in both vertical (for a single database provided) and

Conclusions

In this letter, we have proposed a new adaptive data masking technique based on neural networks. To overcome the limitations of the existing techniques, ADM is discussed based on the intelligence-retaining capability of neural networks. As a result, ADM not only retains the original look-and-feel but also avoids the use of dictionaries, which improves its performance. It is worth noting that, although neural networks play a major role in the suggested masking technique in

References (22)

  • W. Aiello, A. Broder, J. Janssen, E. Milios, Modelling and mining of networked information spaces, in: Lecture Notes in...
  • D.E. Bakken et al., Data obfuscation: anonymity and desensitization of usable data sets, IEEE Security and Privacy (2004)
  • Plato Consulting. Camouflage—a data masking software, plato group...
  • T.L. Crnkovic-Dodig et al., Building adaptive systems a component based approach to building neural networks
  • J.A. Freeman et al., Neural Networks: Algorithms, Applications, and Programming Techniques (1991)
  • L.M. Fu, Neural Networks in Computer Intelligence (1994)
  • K. Gurney, An Introduction to Neural Networks (1997)
  • S. Haykin, Neural Networks: A Comprehensive Foundation (2008)
  • B. Kosko, Bidirectional associative memories, IEEE Transactions on Systems, Man, and Cybernetics (1988)
  • B. Krose et al., An Introduction to Neural Networks (1996)
  • Net 2000 Ltd. Data sanitization techniques...

    Vishal Anjaiah Gujjary is working as a Research Associate at SETLabs, Infosys Technologies Ltd., Hyderabad, India, and received his Bachelors with Honors in Computer Science and Engineering from IIIT, Hyderabad in 2007. Currently he is working in the area of speaker recognition and its applications, including the use of speech as a digital voice signature. He is also associated with the development of IP corresponding to data masking and vulnerability scanners. His research interests include biometrics, speech and signal processing and pattern recognition.

    Ashutosh Saxena is a Principal Researcher at SETLabs, Infosys Technologies Ltd., Hyderabad, India, and received his M.Sc. (1990), M.Tech. (1992) and Ph.D. in Computer Science (1999). The Indian government awarded him the post-doctorate BOYSCAST Fellowship in 2002 to research on ‘Security Framework for E-Commerce’ at ISRC, QUT, Brisbane, Australia. He is on the Reviewing Committees of various international journals and conferences. He has authored the book entitled PKI—Concepts, Design and Deployment, published by Tata McGraw-Hill, and also co-authored more than 70 research papers. His research interests are in the areas of authentication technologies, smart cards, key management and security assurance.
