Stop word location and identification for adaptive text recognition

Ho, Tin Kam

doi:10.1007/PL00013551

Original papers
Published: August 2000

Volume 3, pages 16–26, (2000)
International Journal on Document Analysis and Recognition

Abstract. We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy uses a linguistic observation that over half of all words in a typical English passage are contained in a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the most likely candidates for those words, using only the widths of the word images. The identity of each word is determined using a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on 90% of all the pages. These can serve as useful seeds to bootstrap font learning.
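The two ideas the abstract leans on, stop-word coverage and width-only candidate location, can be illustrated with a minimal Python sketch. This is not the paper's procedure: the function names, the tiny stop-word set, the `mean_char_width` parameter, and the fixed-pitch width model (predicted width = number of letters times an average character width) are all assumptions made here for illustration. The abstract only states that candidates are located from word-image widths and does not say how.

```python
from typing import Dict, List, Sequence, Set


def stop_word_coverage(tokens: Sequence[str], stop_words: Set[str]) -> float:
    """Return the fraction of tokens that belong to the stop-word set."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in stop_words)
    return hits / len(tokens)


def width_candidates(
    image_widths: Sequence[float],
    stop_words: Set[str],
    mean_char_width: float,
    tolerance: float = 0.2,
) -> Dict[int, List[str]]:
    """Map each word-image index to stop words whose predicted width is close.

    Assumption (hypothetical, not from the paper): a word's width is
    predicted as len(word) * mean_char_width, and a stop word is a
    candidate when the prediction is within `tolerance` (relative) of
    the measured image width.
    """
    candidates: Dict[int, List[str]] = {}
    for idx, width in enumerate(image_widths):
        matches = [
            w
            for w in sorted(stop_words)
            if abs(len(w) * mean_char_width - width) <= tolerance * width
        ]
        if matches:
            candidates[idx] = matches
    return candidates


if __name__ == "__main__":
    # Toy stop-word set; the paper compiles its dictionary from the Brown corpus.
    stops = {"the", "of", "and", "a", "in", "to", "is"}
    tokens = "the cat sat in the shade of a tree".split()
    print(stop_word_coverage(tokens, stops))

    # Measured word-image widths in pixels (made-up values).
    widths = [30.0, 95.0, 21.0]
    print(width_candidates(widths, stops, mean_char_width=10.0))
```

In a pipeline like the one described, the surviving candidates would then be passed to a word shape classifier to settle each word's identity; that stage is outside this sketch.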

Author information

Authors and Affiliations

Bell Laboratories, Lucent Technologies, 700 Mountain Avenue, 2C-425, Murray Hill, NJ 07974, USA; E-mail: tkh@bell-labs.com
Tin Kam Ho

Additional information

Received October 8, 1999 / Revised March 29, 2000

About this article

Cite this article

Ho, T. Stop word location and identification for adaptive text recognition. IJDAR 3, 16–26 (2000). https://doi.org/10.1007/PL00013551

Issue Date: August 2000
DOI: https://doi.org/10.1007/PL00013551

Key words: OCR – Word recognition – Font learning – Keyword spotting – Adaptive recognition
