Multimodal Data Processing Based on Text Classifiers and Image Recognition

Andriyanov, Nikita

doi:10.1007/978-3-031-37742-6_31

Nikita Andriyanov, ORCID: orcid.org/0000-0003-0735-7697

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13644)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

The use of specialized intelligent systems in applied tasks has recently grown rapidly in one area of artificial intelligence in particular: the joint processing of data from different modalities. Achieving more accurate predictions in artificial general intelligence requires multimodal processing. This study addresses the problem of classifying images that contain text. A comprehensive analysis model is proposed, comprising text extraction, image processing, text preprocessing, text processing, and prediction integration. An important advantage of the multimodal model is a 5–8% increase in classification efficiency compared with approaches that process only one type of information. A three-class classification problem is considered, for which an accuracy of 86% was achieved.
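The preview does not include the author's implementation, so what follows is only a minimal sketch, under stated assumptions, of two stages named in the abstract: text preprocessing/processing and prediction integration. It assumes the text has already been extracted from each image (for example with pytesseract.image_to_string), weights terms with smoothed TF-IDF, and uses a nearest-centroid cosine classifier as a simple stand-in for a trained text model such as an SVM. The class names, training snippets, softmax temperature, fusion weight and the image-branch probabilities (standing in for a CNN's softmax output) are all hypothetical, and weighted averaging of class probabilities is just one possible late-fusion rule.

```python
import math
import re
from collections import Counter

# Hypothetical 3-class label set; the paper's actual classes are not given in the abstract.
CLASSES = ["invoice", "menu", "poster"]


def preprocess(text):
    """Text preprocessing: lowercase and keep alphabetic tokens of two or more letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 2]


def fit_idf(docs):
    """Smoothed inverse document frequency: idf(t) = ln(N / df(t)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length; empty vectors stay empty."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; tokens outside the vocabulary are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in tf.items()})


def fit_centroids(docs, labels, idf):
    """Text-processing stand-in: one normalized summed TF-IDF vector per class."""
    sums = {c: {} for c in CLASSES}
    for doc, y in zip(docs, labels):
        for t, v in tfidf(doc, idf).items():
            sums[y][t] = sums[y].get(t, 0.0) + v
    return {c: l2_normalize(s) for c, s in sums.items()}


def softmax(scores, temperature=0.1):
    """Turn similarity scores into probabilities; the temperature is an arbitrary choice."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def text_proba(text, idf, centroids):
    """Class probabilities from cosine similarity to each class centroid."""
    v = tfidf(preprocess(text), idf)
    sims = [sum(w * centroids[c].get(t, 0.0) for t, w in v.items()) for c in CLASSES]
    return softmax(sims)


def fuse(p_text, p_image, w_text=0.5):
    """Prediction integration (late fusion): weighted average, then argmax."""
    p = [w_text * a + (1.0 - w_text) * b for a, b in zip(p_text, p_image)]
    best = max(range(len(p)), key=lambda i: p[i])
    return CLASSES[best], p


# Hypothetical OCR output for six training images; in a real pipeline these strings
# could come from pytesseract.image_to_string(Image.open(path)).
train = [
    ("Invoice number 42 total amount due tax", "invoice"),
    ("Total due invoice payment tax VAT", "invoice"),
    ("Menu soup salad dessert price coffee", "menu"),
    ("Breakfast menu coffee tea pancakes", "menu"),
    ("Concert poster live music tickets tonight", "poster"),
    ("Festival poster tickets music stage", "poster"),
]
docs = [preprocess(text) for text, _ in train]
labels = [y for _, y in train]
idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)

p_text = text_proba("Tickets for live music", idf, centroids)
p_image = [0.2, 0.5, 0.3]  # hypothetical CNN softmax output for the same image
label, p_fused = fuse(p_text, p_image, w_text=0.6)
print(label, [round(x, 3) for x in p_fused])
```

Here the confident text branch outweighs the image branch at w_text = 0.6, so the fused prediction is "poster", whereas w_text = 0.0 falls back to the image prediction alone. The fusion weight would typically be tuned on held-out data rather than fixed by hand.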

Author information

Authors and Affiliations

Financial University under the Government of the Russian Federation, Leningradsky pr-t 49/2, 125167, Moscow, Russian Federation
Nikita Andriyanov

Corresponding author: Nikita Andriyanov

Editor information

Editors and Affiliations

York University, Toronto, ON, Canada
Jean-Jacques Rousseau
Ontario Tech University, Oshawa, ON, Canada
Bill Kapralos

About this paper

Cite this paper

Andriyanov, N. (2023). Multimodal Data Processing Based on Text Classifiers and Image Recognition. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13644. Springer, Cham. https://doi.org/10.1007/978-3-031-37742-6_31

DOI: https://doi.org/10.1007/978-3-031-37742-6_31
Published: 02 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37741-9
Online ISBN: 978-3-031-37742-6
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships: The International Association for Pattern Recognition