Evaluation of a Visual Question Answering Architecture for Pedestrian Attribute Recognition

Castrillón-Santana, Modesto; Sánchez-Nielsen, Elena; Freire-Obregón, David; Santana, Oliverio J.; Hernández-Sosa, Daniel; Lorenzo-Navarro, Javier

doi:10.1007/978-3-031-44237-7_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14184)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns


Abstract

Pedestrian attribute recognition (PAR) supports public safety and security. By automatically detecting attributes such as clothing color, accessories, and hairstyle, surveillance systems can provide valuable information for criminal investigations, helping to identify suspects by their appearance. In crowd management scenarios, PAR also enables monitoring of specific groups, such as individuals wearing safety gear at construction sites, or identifying potential threats in sensitive areas. Real-time attribute recognition enhances situational awareness and facilitates rapid response during emergencies, thereby contributing to the overall safety and security of public spaces. This work proposes applying the BLIP-2 Visual Question Answering (VQA) framework to the PAR problem. By employing Large Language Models (LLMs), we achieved an accuracy of 92% on the private set. Combining VQA with LLMs makes it possible to analyze visual information effectively and answer questions about pedestrian attributes, improving the accuracy and performance of PAR systems.
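Recasting PAR as VQA means each attribute becomes a natural-language question posed to the model, and the model's free-text answer must then be mapped back onto a closed label set. The following is a minimal Python sketch of that glue logic under stated assumptions; it is not the authors' code. The question wording, the attribute names (upper_color, lower_color, gender, bag, hat), and the color list are hypothetical placeholders, and the ask callable stands in for whatever call actually queries the VQA model (for instance, a BLIP-2 checkpoint loaded through Hugging Face transformers), so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical prompts: one natural-language question per attribute.
# Attribute names, wording and label sets are illustrative assumptions.
QUESTIONS: Dict[str, str] = {
    "upper_color": "What color is the upper-body clothing of the person?",
    "lower_color": "What color is the lower-body clothing of the person?",
    "gender": "Is the person a man or a woman?",
    "bag": "Is the person carrying a bag?",
    "hat": "Is the person wearing a hat?",
}

COLORS = {"black", "blue", "brown", "gray", "green", "orange",
          "pink", "purple", "red", "white", "yellow"}
# Map common answer variants onto canonical labels.
SYNONYMS = {"grey": "gray", "man": "male", "boy": "male",
            "woman": "female", "girl": "female"}


def parse_answer(attribute: str, answer: str) -> Optional[str]:
    """Map a free-text VQA answer onto a closed label set (None if nothing matches)."""
    # Normalise: lower-case, strip punctuation, replace synonyms.
    words = [SYNONYMS.get(w.strip(".,!?"), w.strip(".,!?"))
             for w in answer.lower().split()]
    if attribute in ("upper_color", "lower_color"):
        allowed = COLORS
    elif attribute == "gender":
        allowed = {"male", "female"}
    elif attribute in ("bag", "hat"):
        allowed = {"yes", "no"}
    else:
        raise KeyError(f"unknown attribute: {attribute}")
    # Take the first word that is a valid label for this attribute.
    return next((w for w in words if w in allowed), None)


def recognize(image: object, ask: Callable[[object, str], str]) -> Dict[str, Optional[str]]:
    """Ask one question per attribute and parse each answer into a label."""
    return {attr: parse_answer(attr, ask(image, q)) for attr, q in QUESTIONS.items()}


def per_attribute_accuracy(predictions: List[Dict[str, Optional[str]]],
                           ground_truth: List[Dict[str, str]]) -> Dict[str, float]:
    """Fraction of samples whose parsed label equals the annotation, per attribute."""
    if not ground_truth or len(predictions) != len(ground_truth):
        raise ValueError("need equally long, non-empty prediction and label lists")
    return {
        attr: sum(p[attr] == t[attr] for p, t in zip(predictions, ground_truth))
        / len(ground_truth)
        for attr in QUESTIONS
    }


if __name__ == "__main__":
    # Stand-in for a real VQA model: canned answers keyed by question text.
    canned = {
        QUESTIONS["upper_color"]: "A red jacket.",
        QUESTIONS["lower_color"]: "grey trousers",
        QUESTIONS["gender"]: "a woman",
        QUESTIONS["bag"]: "yes",
        QUESTIONS["hat"]: "No, she is not.",
    }
    pred = recognize(None, lambda image, question: canned[question])
    print(pred)
    # {'upper_color': 'red', 'lower_color': 'gray', 'gender': 'female', 'bag': 'yes', 'hat': 'no'}
```

The keyword-matching parser is a deliberately simple choice: it takes the first word that is a valid label, so an ambiguous answer such as "a man or a woman" resolves to whichever term appears first. Answers that match no label come back as None and count as errors in the accuracy computation.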

This work is partially funded by the Spanish Ministry of Science and Innovation under projects PID2021-122402OB-C22 and TED2021-131019B-10, and by the ACIISI-Gobierno de Canarias and European FEDER funds under projects ProID2021010012, ULPGC Facilities Net, and Grant EIS 2021 04.

Notes

1. https://huggingface.co/Salesforce/blip-image-captioning-base

Author information

Authors and Affiliations

Universidad de Las Palmas de Gran Canaria, 35017, Las Palmas de Gran Canaria, Spain
Modesto Castrillón-Santana, David Freire-Obregón, Oliverio J. Santana, Daniel Hernández-Sosa & Javier Lorenzo-Navarro
Universidad de La Laguna, 38200, San Cristóbal de La Laguna, Spain
Elena Sánchez-Nielsen

Corresponding author

Correspondence to Modesto Castrillón-Santana.

Editor information

Editors and Affiliations

Cyprus University of Technology, Limassol, Cyprus
Nicolas Tsapatsoulis
Cyprus University of Technology/CYENS Center of Excellence, Limassol, Cyprus
Andreas Lanitis
The University of New Mexico, Albuquerque, NM, USA
Marios Pattichis
University of Cyprus/CYENS Center of Excellence, Nicosia, Cyprus
Constantinos Pattichis
University of Cyprus/KIOS Center of Excellence, Nicosia, Cyprus
Christos Kyrkou
Cyprus University of Technology, Limassol, Cyprus
Efthyvoulos Kyriacou
Cyprus University of Technology/CYENS Center of Excellence, Limassol, Cyprus
Zenonas Theodosiou
CYENS Center of Excellence, Nicosia, Cyprus
Andreas Panayides

About this paper

Cite this paper

Castrillón-Santana, M., Sánchez-Nielsen, E., Freire-Obregón, D., Santana, O.J., Hernández-Sosa, D., Lorenzo-Navarro, J. (2023). Evaluation of a Visual Question Answering Architecture for Pedestrian Attribute Recognition. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14184. Springer, Cham. https://doi.org/10.1007/978-3-031-44237-7_2

DOI: https://doi.org/10.1007/978-3-031-44237-7_2
Published: 20 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44236-0
Online ISBN: 978-3-031-44237-7
eBook Packages: Computer Science, Computer Science (R0)