A Table Extraction Solution for Financial Spreading

  • Conference paper
Database and Expert Systems Applications - DEXA 2022 Workshops (DEXA 2022)

Abstract

Financial spreading is a necessary exercise for financial institutions: it breaks down the analysis of financial data that supports decisions such as investment advisories and credit appraisals. It refers to the collection of data from financial statements, a task in which extraction is still largely manual. In today’s fast-paced banking environment, manual data extraction is a major obstacle because it is time-consuming and error-prone. In this paper, we therefore address the problem of automatically extracting data for financial spreading. More specifically, we propose a solution to extract financial tables, including the Balance Sheet, Income Statement, and Cash Flow Statement, from financial reports in Portable Document Format (PDF). First, we propose a new extraction diagram to detect and extract financial tables from documents such as annual reports; second, we build a system that extracts the tables using machine learning and post-processing algorithms; and third, we propose an evaluation method for assessing the performance of the extraction system.

Notes

  1. https://aws.amazon.com/textract/

  2. https://azure.microsoft.com/services/form-recognizer/

  3. The table’s title and its text are collected, and similar patterns are then merged to create matching patterns. In the real system, many regular expression patterns are designed. One matching pattern is used in the filter to detect a potential page with financial tables. The others are then used to sort these pages into potential balance sheet, potential income statement, and potential cash flow statement pages through the respective regular expression filters.

  4. The financial tables are presented consecutively, but in no fixed order.

  5. Because annotation resources were limited, only English-language reports were selected so that the table contents could be annotated accurately.
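
The two-stage regular expression filtering described in note 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patterns, the `classify_pages` function, and the assumption that each page is already available as plain text are all hypothetical, since the real expressions are not given here.

```python
import re

# Hypothetical stage-1 pattern: flags any page that may contain a financial table.
FINANCIAL_PAGE = re.compile(
    r"balance sheet|financial position|income statement|"
    r"profit (?:and|or) loss|comprehensive income|cash flows?",
    re.IGNORECASE,
)

# Hypothetical stage-2 patterns: one filter per statement type.
STATEMENT_PATTERNS = {
    "balance_sheet": re.compile(
        r"balance sheet|financial position", re.IGNORECASE
    ),
    "income_statement": re.compile(
        r"income statement|profit (?:and|or) loss|comprehensive income",
        re.IGNORECASE,
    ),
    "cash_flow_statement": re.compile(
        r"cash flows? statement|statement of cash flows?", re.IGNORECASE
    ),
}


def classify_pages(pages):
    """Return, for each statement type, the indices of candidate pages.

    `pages` is a list of page texts, one string per page.
    """
    candidates = {name: [] for name in STATEMENT_PATTERNS}
    for index, text in enumerate(pages):
        # Stage 1: skip pages with no sign of a financial table.
        if not FINANCIAL_PAGE.search(text):
            continue
        # Stage 2: a page may be a candidate for more than one statement.
        for name, pattern in STATEMENT_PATTERNS.items():
            if pattern.search(text):
                candidates[name].append(index)
    return candidates


pages = [
    "Chairman's letter to shareholders",
    "Consolidated Balance Sheet as at 31 December",
    "Consolidated Statement of Cash Flows",
    "Consolidated Income Statement",
]
result = classify_pages(pages)
```

Because the statements can appear in any order (note 4), the sketch classifies each page independently rather than relying on page position.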

Author information

Corresponding author

Correspondence to Duc-Tuyen Ta.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ta, DT., Jendoubi, S., Baelde, A. (2022). A Table Extraction Solution for Financial Spreading. In: Kotsis, G., et al. Database and Expert Systems Applications - DEXA 2022 Workshops. DEXA 2022. Communications in Computer and Information Science, vol 1633. Springer, Cham. https://doi.org/10.1007/978-3-031-14343-4_10

  • DOI: https://doi.org/10.1007/978-3-031-14343-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14342-7

  • Online ISBN: 978-3-031-14343-4

  • eBook Packages: Computer Science (R0)
