DeepDoT: Deep Framework for Detection of Tables in Document Images

Singh, Mandhatya; Goyal, Puneet

doi:10.1007/978-981-16-1092-9_35

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1377)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

An efficient table detection process offers a solution for enterprises dealing with automated analysis of digital documents. Table detection is a challenging task due to low inter-class and high intra-class dissimilarities in document images. Further, the foreground-background class imbalance problem limits the performance of table detectors, especially single-stage table detectors. Existing table detectors rely on a bottom-up scheme that efficiently captures semantic features but fails to account for resolution-enriched features, which degrades overall detection performance. We propose an end-to-end trainable framework, DeepDoT, which effectively detects tables of different sizes over arbitrary scales in document images. DeepDoT combines a top-down and a bottom-up approach, and it additionally uses focal loss to handle the pervasive class imbalance problem and obtain accurate predictions. For a thorough evaluation, we consider multiple benchmark datasets: ICDAR-2013, UNLV, ICDAR-2017 POD, and MARMOT. The proposed approach yields a better F1-score than state-of-the-art table detection approaches.
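
To illustrate how focal loss counters foreground-background imbalance, the following is a minimal sketch of the binary focal loss as defined by Lin et al. (2017), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not taken from the DeepDoT implementation: the function name `focal_loss`, the scalar interface, and the defaults gamma = 2 and alpha = 0.25 (the values suggested in the focal loss paper) are assumptions made here for illustration only.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction (illustrative sketch).

    p     -- predicted probability of the foreground (table) class
    y     -- ground-truth label: 1 for foreground, 0 for background
    gamma -- focusing parameter; gamma = 0 removes the modulating factor
    alpha -- weight given to the foreground class
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # guard against log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background anchor (confidently predicted as background)
# versus a hard foreground anchor (table predicted with low confidence).
easy_background = focal_loss(p=0.1, y=0)
hard_foreground = focal_loss(p=0.1, y=1)
print(easy_background < hard_foreground)
```

With gamma > 0, the many easy background anchors contribute very little to the total loss, so training is dominated by the comparatively few hard examples. Setting gamma = 0 and alpha = 1 for a foreground example reduces the expression to ordinary cross-entropy.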

Acknowledgement

This research is supported by the IIT Ropar under ISIRD grant 9-231/2016/IIT-RPR/1395 and by the DST under CSRI grant DST/CSRI/2018/234.

Author information

Authors and Affiliations

Indian Institute of Technology Ropar, Ropar, 140001, India
Mandhatya Singh & Puneet Goyal

Authors

Mandhatya Singh
Puneet Goyal

Corresponding author

Correspondence to Mandhatya Singh.

Editor information

Editors and Affiliations

Indian Institute of Information Technology Allahabad, Prayagraj, India
Satish Kumar Singh
Indian Institute of Technology Roorkee, Roorkee, India
Partha Roy
Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Information Technology Allahabad, Prayagraj, India
P. Nagabhushan

About this paper

Cite this paper

Singh, M., Goyal, P. (2021). DeepDoT: Deep Framework for Detection of Tables in Document Images. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_35

DOI: https://doi.org/10.1007/978-981-16-1092-9_35
Published: 28 March 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science, Computer Science (R0)