Abstract
Today’s data owners usually resort to data anonymization tools to ease their privacy and confidentiality concerns. However, those tools are typically ready-made and inflexible, leaving a gap both between the data owner’s and data users’ requirements, and between those requirements and a tool’s anonymization capabilities. In this paper, we propose an interactive customizable anonymization tool, namely iCAT, to bridge these gaps. To this end, we first define the novel concept of an anonymization space to model all combinations of per-attribute anonymization primitives based on their levels of privacy and utility. Second, we leverage natural language processing (NLP) and ontology modeling to automatically translate data owners’ and data users’ textual requirements into appropriate anonymization primitives. Finally, we implement iCAT, evaluate its efficiency and effectiveness with both real and synthetic network data, and assess its usability through a user study involving participants from industry and research laboratories. Our experiments show an effectiveness of about 96.5% for data owners and 92.6% for data users.
Notes
- 3. This list is not meant to be exhaustive, and our model and methodology can be extended to include other anonymization primitives.
References
Rieck, K.: Pseudonymizer for Solaris audit trails (2018). http://www.mlsec.org/bsmpseu/bsmpseu.1
Assila, A., Ezzedine, H., et al.: Standardized usability questionnaires: features and quality focus. Electron. J. Comput. Sci. Inf. Technol. eJCIST 6(1), 15–31 (2016)
Bell, D.E., LaPadula, L.J.: Secure computer system: unified exposition and Multics interpretation (1976)
Denning, D.E.: A lattice model of secure information flow. Commun. ACM 19(5), 236–243 (1976)
Donnellan, T.: Lattice Theory. Pergamon Press, Oxford (1968)
Kohler, E.: Ipsumdump tool (2015). https://read.seas.harvard.edu/~kohler/ipsumdump/
Blanton, E.: Tcpurify tool (2019). https://web.archive.org/web/20140203210616/irg.cs.ohiou.edu/~eblanton/tcpurify/
Foukarakis, M., Antoniades, D., Antonatos, S., Markatos, E.P.: Flexible and high-performance anonymization of NetFlow records using anontool. In: Third International Conference on Security and Privacy in Communications Networks and the Workshops, SecureComm 2007, pp. 33–38. IEEE (2007)
Gringoli, F.: TCPanon tool (2019). http://netweb.ing.unibs.it/~ntw/tools/tcpanon/
Google: Traces from requests processed by Google cluster management system (2019). https://github.com/google/cluster-data
Greg Minshall of Ipsilon Networks: Tcpdpriv (2005). http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html
Haag, P.: Nfdump (2010). World Wide Web. http://nfdump.sourceforge.net
Imperva: Camouflage data masking (2018). https://www.imperva.com/products/data-security/data-masking/
Kayaalp, M., Sagan, P., Browne, A.C., McDonald, C.J.: NLM-scrubber (2018). https://scrubber.nlm.nih.gov/files/
Li, Y., Slagell, A., Luo, K., Yurcik, W.: CANINE: a combined conversion and anonymization tool for processing netflows for security. In: International Conference on Telecommunication Systems Modeling and Analysis, vol. 21 (2005)
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
Moore, D., Keys, K., Koga, R., Lagache, E., Claffy, K.C.: The CoralReef software suite as a tool for system and network administrators. In: Proceedings of the 15th USENIX Conference on System Administration, pp. 133–144. USENIX Association (2001)
Pang, R., Allman, M., Paxson, V., Lee, J.: The devil and packet trace anonymization. ACM SIGCOMM Comput. Commun. Rev. 36(1), 29–38 (2006)
Rules for the protection of personal data inside and outside the EU. GDPR (2018). https://ec.europa.eu/info/law/law-topic/data-protection_en
Sandhu, R.S.: Lattice-based access control models. Computer 26(11), 9–19 (1993)
Slagell, A.J., Lakkaraju, K., Luo, K.: FLAIM: a multi-level anonymization framework for computer and network logs. LISA 6, 3–8 (2006)
Sys4 Consults: A generic log anonymizer (2018). https://github.com/sys4/loganon
UCIMLR: Burst Header Packet flooding attack on Optical Burst Switching Network Data Set (2019). https://archive.ics.uci.edu/ml/datasets/
Yurcik, W., Woolam, C., Hellings, G., Khan, L., Thuraisingham, B.: SCRUB-tcpdump: a multi-level packet anonymizer demonstrating privacy/analysis tradeoffs. In: 2007 Third International Conference on Security and Privacy in Communications Networks and the Workshops-SecureComm 2007, pp. 49–56. IEEE (2007)
Acknowledgment
The authors thank the anonymous reviewers for their valuable comments. This work is partially supported by the Natural Sciences and Engineering Research Council of Canada and Ericsson Canada under CRD Grant N01823 and by PROMPT Quebec.
Appendix
The following details each module of iCAT as shown in Fig. 6B.
(A) Data Loading and Processing (DLP). This module is used to load the data and enables filtering and cleansing operations. This module consists of the following sub-modules:
Data Processing: This sub-module performs data pre-processing and adjustment operations. It can also automatically detect all data attributes and their types, which are needed by the Anonymization Space Manager to build the anonymization space lattice.
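As a rough illustration of automatic attribute-type detection, the sketch below infers a type for each column by checking whether every non-empty value parses as an IP address, an integer, or an ISO 8601 timestamp, falling back to a generic string. The type names and the check order are assumptions for illustration; the paper's actual type set and detection rules are not given here.

```python
import ipaddress
from datetime import datetime


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_timestamp(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical type names, checked in order; the first check that accepts
# every non-empty value in a column determines that column's type.
TYPE_CHECKS = [("ip", _is_ip), ("integer", _is_int), ("timestamp", _is_timestamp)]


def detect_attribute_types(rows: list[dict[str, str]]) -> dict[str, str]:
    """Map each column name to a detected attribute type."""
    types: dict[str, str] = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] != ""]
        types[column] = next(
            (name for name, check in TYPE_CHECKS
             if values and all(check(v) for v in values)),
            "string",  # fallback when no specific type matches
        )
    return types
```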
Data Filtering: This sub-module provides several algorithms that can be applied automatically or manually to filter the data and remove unwanted records or attributes (e.g., column deletion, row deletion, searched deletion, and frequency deletion).
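The four filtering operations named above could look roughly like this over a list of records. The semantics of the last two are assumptions: searched deletion is taken to drop rows whose value in a column contains a search string, and frequency deletion to drop rows whose value in a column occurs fewer than a given number of times (for example, to remove rare, potentially identifying values).

```python
from collections import Counter

Row = dict[str, str]


def delete_column(rows: list[Row], column: str) -> list[Row]:
    """Column deletion: drop one attribute from every record."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def delete_rows(rows: list[Row], indices: set[int]) -> list[Row]:
    """Row deletion: drop records by position."""
    return [row for i, row in enumerate(rows) if i not in indices]


def searched_delete(rows: list[Row], column: str, needle: str) -> list[Row]:
    """Searched deletion (assumed semantics): drop records whose value contains needle."""
    return [row for row in rows if needle not in row.get(column, "")]


def frequency_delete(rows: list[Row], column: str, min_count: int) -> list[Row]:
    """Frequency deletion (assumed semantics): drop records whose value is rare."""
    counts = Counter(row.get(column, "") for row in rows)
    return [row for row in rows if counts[row.get(column, "")] >= min_count]
```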
(B) Requirements Interpreter (RI). This module translates the data owner’s and data user’s requirements into data attribute types and anonymization primitives. It consists of the following three sub-modules:
Requirements Parser: This sub-module takes English statements and transforms them into a set of requirements using Stanford CoreNLP. It then processes and filters those requirements using a part-of-speech (POS) tagger.
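The parser described here relies on Stanford CoreNLP; to keep the sketch dependency-free, the version below swaps in a deliberately naive stand-in. It splits a statement into clauses on punctuation and on "and"/"but", then drops a small hand-picked list of function words in place of POS-based filtering. Both the splitting rule and the stop-word list are hypothetical.

```python
import re

# Hypothetical stop-word list standing in for POS-based filtering.
STOP_WORDS = {"i", "we", "want", "to", "the", "a", "an", "be", "should",
              "must", "of", "all", "need", "my", "our", "is", "are"}


def parse_requirements(statement: str) -> list[list[str]]:
    """Split an English statement into clauses and keep their content words."""
    clauses = re.split(r"[.;,]|\band\b|\bbut\b", statement.lower())
    requirements = []
    for clause in clauses:
        tokens = re.findall(r"[a-z0-9]+", clause)
        content = [t for t in tokens if t not in STOP_WORDS]
        if content:  # skip clauses that are empty or only stop words
            requirements.append(content)
    return requirements
```

For example, "I want to hide the IP addresses and keep the timestamps." yields `[["hide", "ip", "addresses"], ["keep", "timestamps"]]`. A real NLP pipeline would also cope with negation and nested clauses, which this split does not.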
Requirements Mapper: This sub-module takes the parsed requirements and communicates with the Method-Ontology and the Type-Ontology databases in order to map each requirement to the related attribute type and then to the corresponding anonymization primitives.
Ambiguity Solver: This sub-module is mainly responsible for communicating with the user (i.e., the data owner or data user) through the Interactive Communicator (IC) sub-module in order to resolve any ambiguity that arises in the Requirements Mapper sub-module.
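The mapping and ambiguity-resolution steps can be sketched with dictionaries standing in for the Type-Ontology and Method-Ontology. Every ontology entry, attribute-type name, and primitive name below is hypothetical. Candidate types from each recognised token are intersected, so "ip addresses" narrows to a single type while "addresses" alone stays ambiguous and is handed to a `resolve` callback that plays the role of the Interactive Communicator.

```python
from typing import Callable, Optional

# Hypothetical miniature ontologies; iCAT's real ontology contents differ.
TYPE_ONTOLOGY: dict[str, set[str]] = {
    "ip": {"ip_address"},
    "mac": {"mac_address"},
    "addresses": {"ip_address", "mac_address"},
    "timestamps": {"timestamp"},
}
INTENT_ONTOLOGY = {"hide": "privacy", "keep": "utility"}
METHOD_ONTOLOGY = {
    ("ip_address", "privacy"): "hashing",
    ("ip_address", "utility"): "prefix-preserving",
    ("mac_address", "privacy"): "hashing",
    ("mac_address", "utility"): "truncation",
    ("timestamp", "privacy"): "generalization",
    ("timestamp", "utility"): "time-shifting",
}


def map_requirement(
    tokens: list[str], resolve: Callable[[list[str]], str]
) -> Optional[tuple[str, str]]:
    """Map parsed tokens to (attribute type, primitive); ask resolve() on ambiguity."""
    intent = next((t for t in tokens if t in INTENT_ONTOLOGY), None)
    candidates: Optional[set[str]] = None
    for t in tokens:
        if t in TYPE_ONTOLOGY:
            # Intersect so that each extra token can only narrow the candidates.
            if candidates is None:
                candidates = set(TYPE_ONTOLOGY[t])
            else:
                candidates &= TYPE_ONTOLOGY[t]
    if intent is None or not candidates:
        return None  # nothing mappable with these toy ontologies
    if len(candidates) > 1:
        attr_type = resolve(sorted(candidates))  # stands in for the user dialogue
    else:
        attr_type = next(iter(candidates))
    return attr_type, METHOD_ONTOLOGY[(attr_type, INTENT_ONTOLOGY[intent])]
```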
(C) iCAT Manager. This module consists of the following sub-modules:
Identity Access Management and Permission Granter (IPG): This sub-module associates the data user’s identity with the privacy level specified by the data owner, which is needed to determine the anonymization sub-space assigned to that user based on the privacy-up principle.
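One way to picture the IPG, under stated assumptions: represent each point of the anonymization space as a tuple holding, per attribute, the index of a primitive in a chain ordered from least to most private, and read the privacy-up principle as "a user may only use points that are at least as private as the assigned level on every attribute". The chains, primitive names, and the `PermissionGranter` class are hypothetical.

```python
from itertools import product

# Hypothetical per-attribute chains, ordered from least to most private.
CHAINS = {
    "src_ip": ["plain", "prefix-preserving", "hashing", "suppression"],
    "timestamp": ["plain", "time-shifting", "generalization"],
}

Point = tuple[int, ...]


class PermissionGranter:
    """Associates user identities with owner-assigned privacy levels."""

    def __init__(self) -> None:
        self._levels: dict[str, Point] = {}

    def grant(self, user: str, privacy_level: Point) -> None:
        self._levels[user] = privacy_level

    def subspace(self, user: str) -> list[Point]:
        """Points that dominate the user's privacy level (assumed privacy-up rule)."""
        floor = self._levels.get(user)
        if floor is None:
            return []  # deny by default for users without a granted level
        space = product(*(range(len(chain)) for chain in CHAINS.values()))
        return [p for p in space if all(a >= b for a, b in zip(p, floor))]
```

Denying unknown users by default keeps the sketch on the safe side: an identity the owner never classified gets an empty sub-space rather than the full one.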
Interactive Communicator: This sub-module is mainly responsible for interacting with the data owner or data user and handles the communications between them and the RI module.
I/O Manager: This sub-module is responsible for configuring the data source from which the data is fetched (e.g., a file system or a database) and for loading the actual data to be anonymized.
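The data-source configuration could be modelled as a small record naming the source kind and its location. This sketch, using only the Python standard library, supports a CSV file or an SQLite database; the `DataSource` fields and the two source kinds are assumptions, not iCAT's actual configuration format.

```python
import csv
import sqlite3
from dataclasses import dataclass


@dataclass
class DataSource:
    kind: str        # hypothetical config values: "file" (CSV) or "sqlite"
    location: str    # path to the CSV file or SQLite database
    table: str = ""  # table name, used only when kind == "sqlite"


def load_records(source: DataSource) -> list[dict[str, str]]:
    """Fetch all records from the configured source as string-valued dicts."""
    if source.kind == "file":
        with open(source.location, newline="") as f:
            return list(csv.DictReader(f))
    if source.kind == "sqlite":
        conn = sqlite3.connect(source.location)
        conn.row_factory = sqlite3.Row  # rows support .keys() and lookup by name
        try:
            # Table name comes from trusted configuration; it is quoted, not parameterized.
            rows = conn.execute(f'SELECT * FROM "{source.table}"').fetchall()
        finally:
            conn.close()
        return [{k: str(row[k]) for k in row.keys()} for row in rows]
    raise ValueError(f"unsupported data source kind: {source.kind}")
```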
(D) Anonymization Space Manager. This module is mainly responsible for generating the anonymization space and implementing the access control mechanism over it for the data user. This module consists of the following sub-modules:
Anonymization Space Builder (ASB): This sub-module automatically builds the entire anonymization space, which consists of all available combinations of anonymization primitives for each data attribute based on its type. Building the anonymization space lattice is detailed in Sect. 2.3. The resulting anonymization space lattice is stored in the Access Control database.
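Sect. 2.3 gives the actual construction; as a hedged approximation, if each attribute type has its primitives ordered in a chain by privacy, the space of all per-attribute combinations is the product of those chains. That product is a lattice under the componentwise order, with componentwise max as join and componentwise min as meet. The function names and the example chains are hypothetical.

```python
from itertools import product

Point = tuple[int, ...]


def build_space(attr_types: dict[str, str], chains_by_type: dict[str, list[str]]):
    """Return attribute names, their primitive chains, and every point of the space."""
    attrs = list(attr_types)
    chains = [chains_by_type[attr_types[a]] for a in attrs]
    points = list(product(*(range(len(c)) for c in chains)))
    return attrs, chains, points


def leq(p: Point, q: Point) -> bool:
    """p <= q iff q is at least as private as p on every attribute."""
    return all(a <= b for a, b in zip(p, q))


def join(p: Point, q: Point) -> Point:
    """Least upper bound: the more private primitive per attribute."""
    return tuple(max(a, b) for a, b in zip(p, q))


def meet(p: Point, q: Point) -> Point:
    """Greatest lower bound: the less private primitive per attribute."""
    return tuple(min(a, b) for a, b in zip(p, q))


def describe(attrs: list[str], chains: list[list[str]], p: Point) -> dict[str, str]:
    """Translate a point back into primitive names per attribute."""
    return {a: chain[i] for a, chain, i in zip(attrs, chains, p)}
```

With a three-element chain for IP addresses and a two-element chain for timestamps, the space has 3 × 2 = 6 points, with `(0, 0)` as bottom (least private) and `(2, 1)` as top (most private).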
Anonymization Controller: This sub-module implements the access control mechanism over the anonymization space for the data user. It receives the utility level from the data user and performs an intersection (masking) operation between the privacy level and the utility level to determine the allowed combinations of anonymization primitives. It also ensures that the Data Anonymizer only accesses the anonymization primitives allowed for that user.
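A minimal sketch of the intersection step, assuming a point is a tuple of per-attribute primitive indices ordered by increasing privacy. The owner's privacy level acts as a lower bound, the user's utility level is read as an upper bound on privacy (more privacy means less utility), and the allowed combinations are the points between the two. An empty result means the requested utility cannot be granted. The class, function, and parameter names are hypothetical.

```python
from itertools import product

Point = tuple[int, ...]


def allowed_combinations(chain_lengths: Point, privacy_floor: Point,
                         utility_ceiling: Point) -> list[Point]:
    """Points at least as private as the floor and no more private than the ceiling."""
    return [
        p for p in product(*(range(n) for n in chain_lengths))
        if all(lo <= x <= hi for lo, x, hi in zip(privacy_floor, p, utility_ceiling))
    ]


class AnonymizationController:
    """Gatekeeper consulted before any combination of primitives is applied."""

    def __init__(self, chain_lengths: Point, privacy_floor: Point,
                 utility_ceiling: Point) -> None:
        self.allowed = set(
            allowed_combinations(chain_lengths, privacy_floor, utility_ceiling)
        )

    def authorize(self, point: Point) -> Point:
        if point not in self.allowed:
            raise PermissionError(f"combination {point} is outside the allowed sub-space")
        return point
```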
(E) Data Anonymizer. This module is mainly responsible for anonymizing the data with respect to the trust level assigned to the users. It is designed in a building-block manner so that new or more efficient anonymization primitives can be easily integrated into iCAT. This module holds the following sub-modules:
Anonymization Primitives: This sub-module holds the implementation of all existing anonymization algorithms corresponding to the 12 anonymization primitives discussed in Sect. 2.
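The building-block design can be approximated with a registry that maps primitive names to functions, so a new primitive is integrated by registering one more function. The two primitives shown (suppression, and generalization of an IPv4 address to its /24 network) are illustrative choices, not necessarily among the paper's 12 primitives, and the decorator and function names are hypothetical.

```python
import ipaddress
from typing import Callable

PRIMITIVES: dict[str, Callable[[str], str]] = {}


def primitive(name: str):
    """Decorator that plugs a new anonymization primitive into the registry."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        PRIMITIVES[name] = fn
        return fn
    return register


@primitive("suppression")
def suppress(value: str) -> str:
    return "*"


@primitive("generalization")
def generalize_ipv4(value: str) -> str:
    # Keep only the /24 network, e.g. 192.168.1.77 -> 192.168.1.0/24.
    return str(ipaddress.ip_network(f"{value}/24", strict=False))


def anonymize(rows: list[dict[str, str]], plan: dict[str, str]) -> list[dict[str, str]]:
    """Apply the primitive named in plan[column] to each listed column."""
    return [
        {col: PRIMITIVES[plan[col]](val) if col in plan else val
         for col, val in row.items()}
        for row in rows
    ]
```

Adding a primitive then needs no change to `anonymize`: decorating a new function with `@primitive("some-name")` makes it available to any plan.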
Anonymization Mapper: This sub-module is responsible for creating a mapping file that maps the plain-text data to their anonymized values for later recognition purposes (e.g., if hashing is used to anonymize IP addresses, a file containing the original IP addresses and their hashes is created).
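A sketch of the mapping step for hashed values. It swaps in a keyed hash (HMAC-SHA256) rather than a plain hash, since the IPv4 address space is small enough that unkeyed hashes of addresses can be reversed by brute force. The function names, the CSV layout of the mapping file, and the key handling are assumptions; the mapping file itself re-identifies the data and would need to be stored under access control.

```python
import csv
import hashlib
import hmac
import io


def hash_with_mapping(values: list[str], key: bytes) -> tuple[list[str], dict[str, str]]:
    """Replace each value by its HMAC-SHA256 digest and record original -> digest."""
    mapping: dict[str, str] = {}
    for v in values:
        if v not in mapping:
            mapping[v] = hmac.new(key, v.encode(), hashlib.sha256).hexdigest()
    return [mapping[v] for v in values], mapping


def write_mapping(mapping: dict[str, str], out: io.TextIOBase) -> None:
    """Write the mapping file as CSV with an original,anonymized header."""
    writer = csv.writer(out)
    writer.writerow(["original", "anonymized"])
    writer.writerows(sorted(mapping.items()))
```

Because the same key always yields the same digest, repeated addresses stay linkable across records, and inverting the mapping (`{h: ip for ip, h in mapping.items()}`) recovers the originals for authorized recognition.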
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Oqaily, M., Jarraya, Y., Zhang, M., Wang, L., Pourzandi, M., Debbabi, M. (2019). iCAT: An Interactive Customizable Anonymization Tool. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science, vol. 11735. Springer, Cham. https://doi.org/10.1007/978-3-030-29959-0_32
DOI: https://doi.org/10.1007/978-3-030-29959-0_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29958-3
Online ISBN: 978-3-030-29959-0
eBook Packages: Computer Science, Computer Science (R0)