Abstract
The use of intelligent systems that jointly process data from several modalities has grown rapidly in applied artificial intelligence tasks. In other words, multimodal processing is needed to obtain more accurate predictions in artificial general intelligence. This study addresses the problem of classifying images that contain text. A combined analysis model is proposed that consists of the following steps: text extraction, image processing, text preprocessing, text processing, and integration of predictions. An important advantage of the multimodal model is an increase in classification efficiency of 5–8% compared with approaches that process a single modality. A three-class classification problem is considered, for which an accuracy of 86% was achieved.
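The final step of the pipeline, integration of predictions, can be illustrated with a minimal sketch. The abstract does not state how the two modalities are combined, so the weighted averaging of class probabilities shown here is an assumption, and the names `fuse_predictions`, `predict_class`, and `text_weight` are hypothetical. The probability vectors stand in for the outputs of a text classifier and an image classifier on the three-class problem.

```python
from typing import List, Sequence


def fuse_predictions(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two per-class probability vectors (late fusion).

    Assumption: this is one common way to integrate predictions; the
    abstract does not specify the integration rule that was actually used.
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")

    fused = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    # Renormalize so the fused scores again sum to one.
    return [p / total for p in fused]


def predict_class(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_predictions(text_probs, image_probs, text_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for one image over three classes.
    text_scores = [0.6, 0.3, 0.1]   # from a text classifier on the extracted text
    image_scores = [0.2, 0.7, 0.1]  # from an image classifier on the pixels
    print(predict_class(text_scores, image_scores))
    print(predict_class(text_scores, image_scores, text_weight=0.8))
```

In this sketch, `text_weight` controls how much the text branch is trusted relative to the image branch; in practice it would be chosen on a validation set rather than fixed in advance.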
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Andriyanov, N. (2023). Multimodal Data Processing Based on Text Classifiers and Image Recognition. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13644. Springer, Cham. https://doi.org/10.1007/978-3-031-37742-6_31
DOI: https://doi.org/10.1007/978-3-031-37742-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37741-9
Online ISBN: 978-3-031-37742-6
eBook Packages: Computer Science (R0)