Multimodal Classification of Document Embedded Images

  • Conference paper
Graphics Recognition. Current Trends and Evolutions (GREC 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11009)

Abstract

Images embedded in documents carry rich information that is vital for content extraction and knowledge construction. Interpreting the information in diagrams, scanned tables and other types of images enriches the underlying concepts, but requires a classifier that can recognize the huge variability of potential embedded image types and enable reconstruction of the relationships among them. Here we tested different deep learning-based approaches for image classification on a dataset of 32K images extracted from documents and divided into 62 categories, on which we obtain an accuracy of \(\sim 85\%\). We also investigate to what extent textual information improves classification performance when combined with visual features. The textual features were obtained either from text embedded in the images or from image captions. Our findings suggest that textual information carries relevant information about the image category and that multimodal classification provides up to 7% better accuracy than classification based on a single data type.
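
As an illustration of what combining visual and textual features can look like, the sketch below concatenates the two feature vectors of each image (feature-level fusion) and compares a simple nearest-centroid classifier on each modality and on the fused representation. This is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not specify the fusion scheme, and the synthetic data, the feature dimensions, the helper names (`l2_normalize`, `fuse`, `fit_centroids`, `predict`) and the nearest-centroid classifier are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so neither modality dominates the fusion."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse(visual, textual):
    """Feature-level fusion: concatenate per-image visual and textual vectors."""
    return np.concatenate([l2_normalize(visual), l2_normalize(textual)], axis=1)

def fit_centroids(features, labels, n_classes):
    """Return one mean feature vector per class (nearest-centroid classifier)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

def predict(features, centroids):
    """Assign each row to the class whose centroid is closest in Euclidean distance."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Synthetic stand-in data (assumption): 4 classes, 50 images each,
# 16-dimensional "visual" and 8-dimensional "textual" features.
rng = np.random.default_rng(0)
n_classes, per_class = 4, 50
labels = np.tile(np.arange(n_classes), per_class)  # 0,1,2,3,0,1,2,3,...
visual = rng.normal(size=(n_classes, 16))[labels] + rng.normal(scale=2.0, size=(labels.size, 16))
textual = rng.normal(size=(n_classes, 8))[labels] + rng.normal(scale=2.0, size=(labels.size, 8))

# First half for training, second half for testing; the tiled labels keep both halves balanced.
train, test = slice(0, 100), slice(100, 200)

def accuracy(features):
    """Fit centroids on the training half and score on the test half."""
    centroids = fit_centroids(features[train], labels[train], n_classes)
    return float((predict(features[test], centroids) == labels[test]).mean())

fused = fuse(visual, textual)
acc_visual = accuracy(l2_normalize(visual))
acc_textual = accuracy(l2_normalize(textual))
acc_fused = accuracy(fused)

print(f"visual only:  {acc_visual:.2f}")
print(f"textual only: {acc_textual:.2f}")
print(f"fused:        {acc_fused:.2f}")
```

In a real pipeline the visual vectors would typically come from a convolutional network and the textual vectors from word embeddings of OCR output or captions; the numbers printed here reflect only the synthetic data and say nothing about the results reported in the abstract.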

Author information

Corresponding author

Correspondence to Matheus Viana.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Viana, M., Nguyen, QB., Smith, J., Gabrani, M. (2018). Multimodal Classification of Document Embedded Images. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_4

  • DOI: https://doi.org/10.1007/978-3-030-02284-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02283-9

  • Online ISBN: 978-3-030-02284-6

  • eBook Packages: Computer Science, Computer Science (R0)