Abstract
Tables present important information concisely in many scientific documents. Visual features such as mathematical symbols, equations, and spanning cells make it difficult to extract structure and content from tables embedded in research documents. This paper discusses the dataset, tasks, participants' methods, and results of the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX. Specifically, the task of the competition is to convert a tabular image to its corresponding LaTeX source code. We proposed two subtasks. In Subtask 1, participants reconstruct the LaTeX structure code from an image. In Subtask 2, participants reconstruct the LaTeX content code from an image. This report describes the datasets and ground truth specification, details the performance evaluation metrics used, presents the final results, and summarizes the participating methods. The submission by team VCGroup achieved the highest Exact Match accuracy: 74% for Subtask 1 and 55% for Subtask 2, beating previous baselines by 5% and 12%, respectively. Although the recognition capabilities of models can still be improved, this competition contributes to the development of fully automated table recognition systems by challenging practitioners to solve problems under specific constraints and to share their approaches. The platform will remain available for post-challenge submissions at https://competitions.codalab.org/competitions/26979.
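The abstract reports results as Exact Match accuracy. The following is a minimal sketch of that kind of metric. It assumes that a prediction counts as correct only if its whitespace-tokenized LaTeX sequence is identical to the ground truth. The competition's actual tokenizer and normalization rules are not given here, and the function names and the `CELL` placeholder token are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tokenize(latex: str) -> list[str]:
    # Assumption: tokens are whitespace-separated; the official
    # tokenization/normalization used by the competition may differ.
    return latex.split()


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of samples whose predicted token sequence equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(tokenize(p) == tokenize(r) for p, r in zip(predictions, references))
    return matches / len(references)


# Hypothetical structure-code samples; "CELL" stands in for cell content.
refs = [
    r"\begin{tabular}{lc} \hline CELL & CELL \\ \end{tabular}",
    r"\begin{tabular}{ll} CELL & CELL \\ \end{tabular}",
]
preds = [
    r"\begin{tabular}{lc}  \hline CELL & CELL \\ \end{tabular}",  # extra space only: counts as a match
    r"\begin{tabular}{lc} CELL & CELL \\ \end{tabular}",          # wrong column spec: no match
]
print(exact_match_accuracy(preds, refs))  # 0.5
```

Because a single wrong token (here, one column specifier) makes the whole sample incorrect, Exact Match is a strict measure, which helps explain why scores on the longer content-code sequences of Subtask 2 are lower than on Subtask 1.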
Acknowledgments
This work was supported by The Science and Engineering Research Board (SERB), under sanction number ECR/2018/000087.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kayal, P., Anand, M., Desai, H., Singh, M. (2021). ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_50
DOI: https://doi.org/10.1007/978-3-030-86337-1_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science, Computer Science (R0)