DOI: 10.1145/2425333.2425395
research-article

Table detection in document images using header and trailer patterns

Published: 16 December 2012

ABSTRACT

This paper presents a new approach to detecting tabular structures in document images and in low-resolution video images. The table detection algorithm is based on identifying the unique table start pattern and table trailer pattern. We have formulated perceptual attributes to characterize these patterns. The performance of our table detection system is tested on a set of document images drawn from the UW-III (University of Washington) dataset, the UNLV dataset, video images from NPTEL videos, and our own dataset. Our approach demonstrates improved detection for different types of table layouts, with or without ruling lines. We have obtained correct table localization on pages with multiple tables aligned side by side.


Published in

ICVGIP '12: Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing
December 2012
633 pages
ISBN: 9781450316606
DOI: 10.1145/2425333

Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 95 of 286 submissions, 33%
