Image Classification Using Contrastive Language-Image Pre-training: Application to Aerial Views of Power Line Infrastructures

Losada, Adrián; Bernardos, Ana M.; Besada, Juan

doi:10.1007/978-3-031-42536-3_2

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 750)

Included in the following conference series:

International Conference on Soft Computing Models in Industrial and Environmental Applications

Abstract

This article evaluates the use of CLIP, a contrastive language-image pre-training methodology, for analyzing aerial images of power line infrastructures. Companies record videos using drones and helicopters to assess the health status of these infrastructures, resulting in hours of unlabeled video. This study proposes a semi-supervised approach that combines natural language processing and image understanding to learn a common representation of images and text. A small set of images, labeled according to criteria such as transmission tower type, camera angle, and background, was used to fine-tune CLIP to generate domain-specific embeddings. Results show that this approach achieved an F1 score of over 96% for detecting transmission towers. It could therefore be used to automatically classify unlabeled aerial images as the first step in maintenance data pipelines for predictive detection of component anomalies, the presence of nests or plants, etc.
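As a rough illustration of the idea in the abstract, the sketch below assigns each image to the text prompt whose embedding is most similar to it (cosine similarity in a shared space) and then scores the predictions with F1. It is a minimal sketch, not the authors' pipeline: the three-dimensional vectors are made-up stand-ins for CLIP embeddings, the two prompts ("no tower", "tower") are assumed, and the helper names `classify_by_similarity` and `f1_score` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def classify_by_similarity(image_emb, text_emb):
    # For each image, return the index of the most similar text prompt.
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)

def f1_score(y_true, y_pred, positive=1):
    # F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy stand-ins for embeddings (assumed values, not produced by CLIP).
text_emb = np.array([[1.0, 0.0, 0.0],    # prompt 0: "no tower" (assumed)
                     [0.0, 1.0, 0.0]])   # prompt 1: "tower" (assumed)
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.2, 0.8, 0.1],
                      [0.1, 0.7, 0.3],
                      [0.6, 0.5, 0.0]])
y_true = [0, 1, 1, 1]

y_pred = classify_by_similarity(image_emb, text_emb)
score = f1_score(y_true, y_pred)
print(y_pred.tolist(), round(score, 3))
```

In a real setting the embeddings would come from the (fine-tuned) image and text encoders, and the prompts would describe the labeling criteria mentioned above, such as tower type, camera angle, and background.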

Acknowledgements

The authors acknowledge funding under grants AI4TES and PDC2021–121567-C21, funded by the Spanish Ministry of Economic Affairs and Digital Transformation and by MCIN/AEI/10.13039/501100011033, respectively, and by the EU NextGenerationEU.

Author information

Authors and Affiliations

Information Processing and Telecommunications Center, Universidad Politécnica de Madrid, ETSI Telecomunicación, 28040, Madrid, Spain
Adrián Losada, Ana M. Bernardos & Juan Besada

Corresponding author

Correspondence to Ana M. Bernardos.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Deusto, Bilbao, Spain
Pablo García Bringas
School of Industrial, Computer, University of Leon, León, Spain
Hilde Pérez García
Department of Mechanical Engineering, University of La Rioja, Logroño, Spain
Francisco Javier Martínez de Pisón
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Francisco Martínez Álvarez
Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
Alicia Troncoso Lora
Applied Computational Intelligence, University of Burgos, Burgos, Spain
Álvaro Herrero
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
José Luis Calvo Rolle
Department of Industrial Engineering, University of A Coruña, A Coruña, Spain
Héctor Quintián
Faculty of Science, University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Losada, A., Bernardos, A.M., Besada, J. (2023). Image Classification Using Contrastive Language-Image Pre-training: Application to Aerial Views of Power Line Infrastructures. In: García Bringas, P., et al. 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2023). SOCO 2023. Lecture Notes in Networks and Systems, vol 750. Springer, Cham. https://doi.org/10.1007/978-3-031-42536-3_2

DOI: https://doi.org/10.1007/978-3-031-42536-3_2
Published: 31 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42535-6
Online ISBN: 978-3-031-42536-3
eBook Packages: Intelligent Technologies and Robotics (R0)
