Text line extraction in free style document

Xiaolu Shen; Changsong Liu; Xiaoqing Ding; Yanming Zou

doi:10.1117/12.805695

19 January 2009

Proceedings Volume 7247, Document Recognition and Retrieval XVI; 72470L (2009) https://doi.org/10.1117/12.805695
Event: IS&T/SPIE Electronic Imaging, 2009, San Jose, California, United States

Abstract

This paper addresses text line extraction in free-style documents, such as business cards, envelopes, and posters. In a free-style document, global properties such as character size and line direction can hardly be determined, which exposes a serious limitation of traditional layout analysis. The 'line' is the most prominent and the highest-level structure in our bottom-up method. First, we apply a novel intensity function based on gradient information to locate text areas, where the gradients within a window have large magnitudes and varied directions, and split such areas into text pieces. We then build a probability model of lines consisting of text pieces from statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm whose cost function is based on the probability model.
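The grouping step can be illustrated with a minimal sketch of simulated annealing over line assignments. This is not the authors' implementation: the abstract does not give the probability model, so `line_cost` below is a hypothetical stand-in (vertical spread within each line plus a fixed penalty per line), and the names `anneal_lines`, `line_penalty`, `t_start`, and `t_end` are assumptions made for illustration. Text pieces are reduced to (x, y) centre points.

```python
import math
import random


def line_cost(pieces, labels, line_penalty=1.0):
    """Hypothetical stand-in for a model-based cost (NOT the paper's model).

    Sums the squared vertical spread of pieces sharing a label and adds a
    fixed penalty per line so that merging aligned pieces is rewarded.
    """
    groups = {}
    for (_, y), label in zip(pieces, labels):
        groups.setdefault(label, []).append(y)
    cost = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        cost += sum((y - mean) ** 2 for y in ys) + line_penalty
    return cost


def anneal_lines(pieces, steps=5000, t_start=10.0, t_end=0.01, seed=0):
    """Group pieces into lines by simulated annealing; returns (labels, cost)."""
    n = len(pieces)
    if n == 0:
        return [], 0.0
    rng = random.Random(seed)
    labels = list(range(n))  # start with every piece as its own line
    cost = line_cost(pieces, labels)
    best_labels, best_cost = labels[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        new_label = rng.randrange(n)
        if new_label == labels[i]:
            continue
        old_label = labels[i]
        labels[i] = new_label  # propose moving piece i to another line
        new_cost = line_cost(pieces, labels)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept
        # worse moves, less often as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old_label  # reject and restore
    return best_labels, best_cost


# Two roughly horizontal rows of three pieces each (made-up coordinates).
pieces = [(0, 0.0), (1, 0.1), (2, -0.1), (0, 10.0), (1, 10.1), (2, 9.9)]
labels, cost = anneal_lines(pieces)
print(labels, cost)
```

Because the best assignment seen so far is tracked, the returned cost never exceeds the cost of the initial one-piece-per-line assignment. In the method described above, the cost would instead be derived from the trained probability model of lines.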

Citation

Xiaolu Shen, Changsong Liu, Xiaoqing Ding, and Yanming Zou "Text line extraction in free style document", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 72470L (19 January 2009); https://doi.org/10.1117/12.805695
