Multimodal Data Processing Based on Text Classifiers and Image Recognition

  • Conference paper
Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges (ICPR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13644)

Abstract

Intelligent systems that jointly process data from different modalities are seeing rapidly growing use in applied tasks. Obtaining more accurate predictions, particularly in the context of artificial general intelligence, requires multimodal processing. This study addresses the classification of images that contain text. A combined analysis model is proposed that includes the following steps: text extraction, image processing, text preprocessing, text processing, and prediction integration. The main advantage of the multimodal model is an increase in classification efficiency of 5–8% compared with homogeneous (single-modality) processing approaches. A three-class problem is considered, for which an accuracy of 86% was achieved.
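The final step named in the abstract, prediction integration, can be illustrated with a minimal late-fusion sketch. The abstract does not say how the text and image predictions are combined, so the weighted average below is an assumption rather than the author's method. The function names, the weight parameter, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def fuse_predictions(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Late fusion: weighted average of two per-class probability vectors.

    Assumption: both classifiers score the same classes in the same order.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("both modalities must score the same set of classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    image_weight = 1.0 - text_weight
    return [
        text_weight * t + image_weight * i
        for t, i in zip(text_probs, image_probs)
    ]


def predict_class(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical scores for one image over 3 classes (not data from the paper).
text_probs = [0.2, 0.5, 0.3]   # output of a text classifier on the extracted text
image_probs = [0.6, 0.3, 0.1]  # output of an image classifier on the pixels

fused = fuse_predictions(text_probs, image_probs, text_weight=0.4)
label = predict_class(fused)
```

In this made-up example the text classifier alone would pick class 1, while the fused scores favor class 0, which shows how a second modality can change the decision. In practice the weight would be tuned on held-out data.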

Corresponding author

Correspondence to Nikita Andriyanov.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Andriyanov, N. (2023). Multimodal Data Processing Based on Text Classifiers and Image Recognition. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13644. Springer, Cham. https://doi.org/10.1007/978-3-031-37742-6_31

  • DOI: https://doi.org/10.1007/978-3-031-37742-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37741-9

  • Online ISBN: 978-3-031-37742-6

  • eBook Packages: Computer Science (R0)
