Keyword Spotting on Korean Document Images by Matching the Keyword Image

  • Conference paper
Digital Libraries: Implementing Strategies and Sharing Experiences (ICADL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3815)

Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of three steps: character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating connected adjacent characters. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is applied on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when the quality of the documents is poor.
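The query creation and matching steps can be illustrated with a minimal sketch. The abstract does not specify the features, the distance measure, or any threshold, so everything below is an assumption made for illustration: the zoning feature in `zone_densities`, the mean absolute difference in `char_distance`, the rule that a candidate word must have the same number of characters as the query, and the `threshold` value are all hypothetical and are not the authors' method. Characters are assumed to be already segmented into binary images.

```python
from typing import List, Sequence

# A segmented character as a binary image: 1 = ink, 0 = background.
Glyph = Sequence[Sequence[int]]


def zone_densities(glyph: Glyph, zones: int = 2) -> List[float]:
    """Hypothetical character feature: ink density in a zones x zones grid.

    Assumes the glyph has at least `zones` rows and `zones` columns.
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            rows = range(zr * height // zones, (zr + 1) * height // zones)
            cols = range(zc * width // zones, (zc + 1) * width // zones)
            cells = [glyph[r][c] for r in rows for c in cols]
            features.append(sum(cells) / len(cells))
    return features


def word_features(char_glyphs: Sequence[Glyph]) -> List[List[float]]:
    """Build a word feature as the sequence of its character features."""
    return [zone_densities(g) for g in char_glyphs]


def char_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed character distance: mean absolute feature difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def word_distance(query: List[List[float]], candidate: List[List[float]]) -> float:
    """Word-to-word distance as the mean of the character distances.

    Assumption: words with a different character count never match.
    """
    if len(query) != len(candidate):
        return float("inf")
    total = sum(char_distance(q, c) for q, c in zip(query, candidate))
    return total / len(query)


def spot_keyword(query: List[List[float]],
                 words: Sequence[List[List[float]]],
                 threshold: float = 0.2) -> List[int]:
    """Return the indices of words whose distance to the query is within threshold."""
    return [i for i, w in enumerate(words) if word_distance(query, w) <= threshold]
```

In this sketch a query is built once with `word_features` from the keyword's character images, and `spot_keyword` then scans the segmented words of a page. Averaging per-character distances is one simple way to base word matching on character matching; a real system would likely need a more robust feature and some tolerance for segmentation errors.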

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, S.H., Park, S.C., Jeong, C.B., Kim, J.S., Park, H.R., Lee, G.S. (2005). Keyword Spotting on Korean Document Images by Matching the Keyword Image. In: Fox, E.A., Neuhold, E.J., Premsmit, P., Wuwongse, V. (eds) Digital Libraries: Implementing Strategies and Sharing Experiences. ICADL 2005. Lecture Notes in Computer Science, vol 3815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599517_18

  • DOI: https://doi.org/10.1007/11599517_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30850-8

  • Online ISBN: 978-3-540-32291-7

  • eBook Packages: Computer Science, Computer Science (R0)
