Abstract
Releasing individual-level data, usually in tabular form, must not leak private information. By rendering privacy risks visually, visualization techniques have greatly facilitated user-friendly data sanitization. Yet we point out, for the first time, that the attribute order (i.e., schema) of tabular data inherently determines both the risk situation and the output utility, a factor ignored in previous efforts. To bridge this gap, this work proposes the design and pipeline of a visual tool, TPA (Tabular Privacy Assistant), for nuanced privacy analysis and preservation on order-dynamic tabular data. By adapting the data cube structure as a flexible backbone, TPA supports real-time risk analysis in response to attribute order adjustments. Novel visual designs, namely the data abstract, the risk tree, and integrated privacy enhancement, are developed to explore data correlations and build privacy awareness. We demonstrate TPA’s effectiveness with a case study on the prototype and qualitatively discuss its pros and cons with domain experts to guide future improvement.
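To make the role of attribute order concrete, the following is a minimal sketch and not TPA's actual implementation. It assumes a toy table, a hypothetical risk measure (the number of records falling in groups smaller than k, in the spirit of k-anonymity), and hypothetical helper names `build_cube` and `risk_profile`. Group-by counts are precomputed once for every attribute subset, as in a data cube, so that reordering attributes only requires lookups.

```python
from collections import Counter
from itertools import combinations


def build_cube(rows, attributes):
    """Precompute group-by counts for every non-empty attribute subset."""
    cube = {}
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            cube[frozenset(subset)] = Counter(
                tuple(row[a] for a in subset) for row in rows
            )
    return cube


def risk_profile(cube, order, k=2):
    """For each prefix of `order`, count records in groups smaller than k.

    Assumed risk measure for illustration only: a record is "at risk"
    at a given depth if fewer than k records share its values on the
    first `depth` attributes of the chosen order.
    """
    profile = []
    for depth in range(1, len(order) + 1):
        counts = cube[frozenset(order[:depth])]  # lookup, no table rescan
        profile.append(sum(c for c in counts.values() if c < k))
    return profile


# Toy table (fabricated for illustration, not from the paper).
rows = [
    {"age": "30s", "sex": "F", "zip": "100"},
    {"age": "30s", "sex": "F", "zip": "101"},
    {"age": "30s", "sex": "M", "zip": "100"},
    {"age": "40s", "sex": "M", "zip": "100"},
    {"age": "40s", "sex": "M", "zip": "101"},
    {"age": "40s", "sex": "F", "zip": "102"},
]

cube = build_cube(rows, ["age", "sex", "zip"])
profile_a = risk_profile(cube, ["age", "sex", "zip"])
profile_b = risk_profile(cube, ["zip", "sex", "age"])
print(profile_a, profile_b)
```

Both orders end at the same risk once all attributes are included, but the intermediate levels differ, which is the kind of order dependence a per-level view such as a risk tree would expose. A full cube grows exponentially with the number of attributes, so a practical tool would likely materialize only part of it; that trade-off is outside this sketch.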
Notes
1. We use "data holder" and "user" interchangeably.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (62172155, 62072465).
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Liang, F., Liu, F., Zhou, T. (2022). A Visual Tool for Interactively Privacy Analysis and Preservation on Order-Dynamic Tabular Data. In: Gao, H., Wang, X., Wei, W., Dagiuklas, T. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 461. Springer, Cham. https://doi.org/10.1007/978-3-031-24386-8_2
DOI: https://doi.org/10.1007/978-3-031-24386-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24385-1
Online ISBN: 978-3-031-24386-8
eBook Packages: Computer Science, Computer Science (R0)