DOI: 10.1145/2644866.2644876
research-article

Ruling analysis and classification of torn documents

Published: 16 September 2014

Abstract

This paper presents a method for classifying the ruling of document snippets. In contrast to state-of-the-art methods, which focus on ruling line removal, ruling lines are analyzed here for document clustering in the context of document snippet reassembly. First, a background patch is extracted from a snippet at the position that minimizes the inscribed content. A novel Fourier feature is then computed on this image patch. Classification into void, lined, and checked is carried out using Support Vector Machines. Finally, accurate line localization is performed by means of projection profiles and robust line fitting. The ruling classification achieves an F-score of 0.987 on a dataset of real-world document snippets. In addition, line removal was evaluated on a synthetically generated dataset, where an F-score of 0.931 is achieved. This dataset is made publicly available to allow for benchmarking.
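
The localization step named in the abstract, projection profiles, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an already binarized, axis-aligned patch, omits the robust line fitting, and replaces the Fourier feature and SVM with a plain threshold rule so that the three classes (void, lined, checked) can be shown. The function names and the `min_fill` threshold are hypothetical.

```python
def projection_profile(patch, axis):
    """Sum ink pixels (1 = dark) per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in patch]
    return [sum(col) for col in zip(*patch)]


def locate_lines(profile, length, min_fill=0.8):
    """Return the centre index of every run whose fill ratio reaches min_fill.

    min_fill is an assumed threshold, not a value taken from the paper.
    """
    centres, run = [], []
    for index, value in enumerate(profile):
        if value / length >= min_fill:
            run.append(index)
        elif run:
            centres.append(sum(run) / len(run))
            run = []
    if run:  # close a run that touches the patch border
        centres.append(sum(run) / len(run))
    return centres


def synthetic_patch(height, width, row_lines=(), col_lines=()):
    """Build a binary test patch with ruling lines at the given rows/columns."""
    return [[1 if (y in row_lines or x in col_lines) else 0
             for x in range(width)]
            for y in range(height)]


def describe_ruling(patch):
    """Label a patch with a threshold rule (a stand-in for the paper's SVM)."""
    height, width = len(patch), len(patch[0])
    rows = locate_lines(projection_profile(patch, 0), width)
    cols = locate_lines(projection_profile(patch, 1), height)
    if rows and cols:
        label = "checked"
    elif rows or cols:
        label = "lined"
    else:
        label = "void"
    return label, rows, cols
```

Calling `describe_ruling(synthetic_patch(20, 20, row_lines=(5, 15)))` returns the label together with the detected row and column positions. On real snippets, handwriting and skew would make a fixed threshold unreliable, which is presumably why the paper uses a learned classifier and robust fitting instead.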

Published In

DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering
September 2014
226 pages
ISBN:9781450329491
DOI:10.1145/2644866
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document analysis
  2. Fourier transform
  3. ruling analysis
  4. ruling removal
  5. SVM

Qualifiers

  • Research-article

Conference

DocEng '14: ACM Symposium on Document Engineering 2014
September 16-19, 2014
Fort Collins, Colorado, USA

Acceptance Rates

DocEng '14 paper acceptance rate: 15 of 41 submissions (37%)
Overall acceptance rate: 194 of 564 submissions (34%)
