Improving personal information detection using OCR feature recognition rate

The Journal of Supercomputing

Abstract

With recent advances in information and communication technologies, the creation and storage of documents have become digital, and many documents are now stored on computers. Documents containing personal information can be leaked through internal or external malicious acts, and the problem of information loss for individuals and corporations is steadily growing. This paper proposes a method to identify more quickly and efficiently whether personal information is present in documents stored as image files on personal and corporate computers, so that leakage can be prevented in advance. We improved the efficiency of personal information detection by classifying optical character recognition (OCR) features by recognition rate and deleting redundant ones to increase detection speed. In addition, the detection time was reduced using the reference frequency of the classified OCR features. Experiments confirm that the proposed method performs better than the existing system.
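
The abstract does not give the details of the feature classification, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each OCR feature carries a measured recognition rate and a reference count, treats features that share a name as "redundant", and uses a hypothetical `min_rate` cutoff. The names `OcrFeature`, `select_features`, and `min_rate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrFeature:
    """Hypothetical record for one OCR feature (not the paper's data model)."""
    name: str                # illustrative feature identifier
    recognition_rate: float  # fraction of samples recognized correctly, 0..1
    reference_count: int     # how often the feature was referenced in detection


def select_features(features, min_rate=0.5):
    """Prune and order features.

    Assumptions (ours, not the paper's):
      * features with the same name are redundant; keep the best-rated copy
      * features below `min_rate` are not worth checking
      * frequently referenced features should be tried first
    """
    best = {}
    for f in features:
        kept = best.get(f.name)
        if kept is None or f.recognition_rate > kept.recognition_rate:
            best[f.name] = f

    usable = [f for f in best.values() if f.recognition_rate >= min_rate]

    # Most-referenced first; ties broken by recognition rate, then by name.
    return sorted(
        usable,
        key=lambda f: (-f.reference_count, -f.recognition_rate, f.name),
    )
```

Under these assumptions, a detector would iterate over the returned list in order, so the features most likely to match are tried first and redundant or unreliable ones are never evaluated.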

Acknowledgements

This study was supported by a research fund of Chungnam National University.

Author information

Corresponding author

Correspondence to Yoojae Won.

About this article

Cite this article

Lee, Y., Song, J. & Won, Y. Improving personal information detection using OCR feature recognition rate. J Supercomput 75, 1941–1952 (2019). https://doi.org/10.1007/s11227-018-2444-0

  • DOI: https://doi.org/10.1007/s11227-018-2444-0
