
From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures

Published: 22 June 2024

Abstract

Tables and figures are commonly used to present information in a structured and visual way in scientific documents. Understanding these tables and figures is important for a range of downstream tasks, such as academic search and scientific knowledge graph construction. Existing studies mainly focus on detecting figures and tables in scientific documents, interpreting their semantics, and integrating them into downstream tasks. However, a systematic and comprehensive literature review on the mining and application of tables and figures in academic papers is still missing. In this article, we introduce the research framework and the complete pipeline for understanding tables and figures, covering detection, structural analysis, interpretation, and application. We provide a thorough analysis of benchmark datasets and recent techniques, along with their strengths and weaknesses. In addition, we present a quantitative analysis of the effectiveness of different models on popular benchmarks. We further outline several important applications that exploit the semantics of scientific tables and figures. Finally, we highlight the challenges and potential directions for future research. We believe this is the first comprehensive survey on understanding scientific tables and figures that covers the landscape from detection to application.


Cited By

  • (2025) Automatic pipeline for information of curve graphs in papers based on deep learning. International Journal of Machine Learning and Cybernetics. https://doi.org/10.1007/s13042-024-02496-7. Online publication date: 5-Jan-2025.
  • (2024) LitAI: Enhancing Multimodal Literature Understanding and Mining with Generative AI. In 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), 471–476. https://doi.org/10.1109/MIPR62202.2024.00080. Online publication date: 7-Aug-2024.

Published In

ACM Computing Surveys, Volume 56, Issue 10
October 2024
954 pages
EISSN: 1557-7341
DOI: 10.1145/3613652

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2024
Online AM: 12 April 2024
Accepted: 02 April 2024
Revised: 26 January 2024
Received: 21 March 2023
Published in CSUR Volume 56, Issue 10

Author Tags

  1. Scientific documents
  2. figure understanding
  3. table understanding

Qualifiers

  • Survey
