ABSTRACT
Tables are the most convenient way to represent structured information in a document. Understanding the structure of a table is critical to understanding its contents. Several deep learning-based approaches in the literature have shown promising results in understanding table structures, but they require large amounts of annotated data. However, annotated datasets for training these methods are expensive and laborious to produce, and very few are available. Moreover, human-annotated data suffers from inconsistencies in table and cell annotations. We propose BUDDI Table Factory (BTF), a toolbox for synthetically generating annotated documents with a wide range of variations in table structure. BTF uses a heuristics-based method to generate a variety of table structures, from which it renders synthetic documents using LaTeX. It then applies a computer vision-based approach to localize table and cell regions and automatically generates annotations in the PASCAL VOC challenge format. We show empirically that adding synthetic BTF documents to a limited set of original documents during model training can significantly improve TEDS and IoU performance on table structure recognition tasks, on both public and real-world healthcare datasets.
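As a rough illustration of two ideas named in the abstract, the sketch below builds a PASCAL VOC-style annotation for table and cell bounding boxes and computes the IoU between two boxes. It is not the BTF implementation: the helper names `voc_annotation` and `iou`, the class labels `table` and `cell`, the file name, the image depth of 3, and the box coordinates are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, boxes):
    """Build a PASCAL VOC-style annotation tree (hypothetical helper).

    boxes: list of (label, xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB page image
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bnd, tag).text = str(value)
    return root


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# Illustrative values only: one table region and one of its cells.
boxes = [("table", 50, 100, 450, 300), ("cell", 50, 100, 250, 150)]
root = voc_annotation("page_0001.png", 600, 800, boxes)
xml_text = ET.tostring(root, encoding="unicode")
```

Here `xml_text` holds the serialized annotation, and `iou` can be used to compare a predicted box against an annotated one of the same class.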
Index Terms
- BUDDI Table Factory: A toolbox for generating synthetic documents with annotated tables and cells