DOI: 10.1145/3581807.3581885
research-article

An Unstructured Data Desensitization Approach for Futures Industry

Published: 22 May 2023

Abstract

The development of big data and artificial intelligence technologies gives financial institutions a powerful boost in data mining, but it also makes it harder to prevent the disclosure of private data. Data desensitization is one way to protect such data. Compared with structured data desensitization, unstructured data desensitization still faces two challenges. First, the accuracy of text recognition from images, audio, video, and other types of unstructured data strongly affects desensitization performance. Second, conventional sensitive information recognition methods, which rely on rules and matching, often produce unacceptable desensitization results on complicated financial data. To address these issues, this paper proposes a new method for unstructured data desensitization. The method first applies an evaluation model for text conversion accuracy, based on multi-level fine-grained verification, to improve the accuracy of text recognition. It then introduces a sensitive information recognition model based on hybrid analysis to reduce the rates of missed and false detections in sensitive information recognition. The method achieved satisfactory results on real datasets.
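The abstract does not specify how the hybrid analysis is built, so the following is only a minimal sketch of one common reading: rule-based matching combined with a pluggable model-based recognizer, with their detections merged before masking. Everything named here (`RULES`, `rule_spans`, `keyword_model`, `merge`, `desensitize`, and the two regular expressions) is a hypothetical illustration and not the paper's implementation. In practice the `keyword_model` stand-in would be replaced by a learned recognizer.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Hypothetical rule set: illustrative patterns only, not taken from the paper.
RULES = {
    "PHONE": re.compile(r"\b1[3-9]\d{9}\b"),       # 11-digit mobile number
    "ID_CARD": re.compile(r"\b\d{17}[\dXx]\b"),    # 18-character ID number
}


def rule_spans(text: str) -> List[Span]:
    """Rule-and-matching branch: collect every regex hit as a labeled span."""
    return [(m.start(), m.end(), label)
            for label, pattern in RULES.items()
            for m in pattern.finditer(text)]


def keyword_model(text: str) -> List[Span]:
    """Stand-in for a learned recognizer (assumption): it only shows the
    interface a real model-based detector would need to satisfy."""
    return [(m.start(1), m.end(1), "NAME")
            for m in re.finditer(r"account holder (\w+ \w+)", text)]


def merge(spans: List[Span]) -> List[Span]:
    """Union overlapping spans so each character is masked at most once."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def desensitize(text: str,
                model: Callable[[str], List[Span]] = keyword_model) -> str:
    """Mask every span found by either branch with its label."""
    out, cursor = [], 0
    for start, end, label in merge(rule_spans(text) + model(text)):
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


print(desensitize("account holder Li Wei, phone 13812345678"))
# prints: account holder [NAME], phone [PHONE]
```

Taking the union of both branches favors fewer missed detections at the cost of more false ones. A stricter combination, such as requiring agreement between branches for ambiguous labels, would shift that balance the other way.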



    Published In

    ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
    November 2022
    683 pages
    ISBN:9781450397056
    DOI:10.1145/3581807

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCPR 2022
