Intelligent and Interactive Video Annotation for Instance Segmentation Using Siamese Neural Networks

Schneegans, Jan; Bieshaar, Maarten; Heidecker, Florian; Sick, Bernhard

doi:10.1007/978-3-030-68799-1_27

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12664)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Training machine learning models in a supervised manner requires vast amounts of labeled data. These labels are typically provided by humans manually annotating samples using a variety of tools. In this work, we propose an intelligent annotation tool to combine the fast and efficient labeling capabilities of modern machine learning models with the reliable and accurate, but slow, correction capabilities of human annotators. We present our approach to interactively condition a model on previously predicted and manually annotated or corrected instances and explore an iterative workflow combining the advantages of the intelligent model and the human annotator for the task of instance segmentation in videos. Thereby, the intelligent model conducts the bulk of the work, performing instance detection, tracking, and segmentation, and enables the human annotator to correct individual frames and instances selectively. The proposed approach avoids the computational cost of online retraining by being based on the one-shot learning paradigm. For this purpose, we use Siamese neural networks to transfer annotations from one video frame to another. Multiple interaction options regarding the choice of the additional input data to the neural network, e.g., model predictions or manual corrections, are explored to refine the given model’s labeling performance and speed up the annotation process.
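The iterative workflow described above can be illustrated with a minimal sketch. This is not the authors' implementation: frames are toy 1-D intensity lists, a hand-written correlation over a shared embedding stands in for the learned Siamese network, and the names `embed`, `toy_siamese_propagate`, and `annotate_video` are hypothetical. It only shows the idea that each new frame is conditioned on the most recent trusted annotation (a prediction or a manual correction) without retraining any model.

```python
from typing import Callable, Dict, List

Frame = List[float]  # toy 1-D "image": one intensity per pixel
Mask = List[int]     # one instance label per pixel, 0 = background


def embed(frame: Frame) -> List[float]:
    """Shared embedding used by BOTH branches (stand-in for a learned encoder)."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def toy_siamese_propagate(ref_frame: Frame, ref_mask: Mask,
                          target_frame: Frame, max_shift: int = 3) -> Mask:
    """Transfer ref_mask to target_frame by correlating the two embeddings."""
    ref_feat, tgt_feat = embed(ref_frame), embed(target_frame)
    n = len(ref_feat)

    def score(shift: int) -> float:
        # Correlation of the reference features with the shifted target features.
        return sum(ref_feat[i] * tgt_feat[i + shift]
                   for i in range(n) if 0 <= i + shift < n)

    best = max(range(-max_shift, max_shift + 1), key=score)
    out = [0] * n
    for i, label in enumerate(ref_mask):
        if label and 0 <= i + best < n:
            out[i + best] = label  # move each labeled pixel by the best shift
    return out


def annotate_video(frames: List[Frame], first_mask: Mask,
                   propagate: Callable[[Frame, Mask, Frame], Mask],
                   corrections: Dict[int, Mask]) -> List[Mask]:
    """Propagate masks frame by frame; manual corrections override predictions."""
    masks = [first_mask]
    ref_frame, ref_mask = frames[0], first_mask
    for t in range(1, len(frames)):
        predicted = propagate(ref_frame, ref_mask, frames[t])
        mask = corrections.get(t, predicted)  # human fix wins if present
        masks.append(mask)
        ref_frame, ref_mask = frames[t], mask  # condition the next step on it
    return masks


# A bright two-pixel object moving one pixel to the right per frame.
frames = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]
first_mask = [0, 1, 1, 0, 0, 0, 0, 0]

automatic = annotate_video(frames, first_mask, toy_siamese_propagate, {})
corrected = annotate_video(frames, first_mask, toy_siamese_propagate,
                           {1: [0, 0, 1, 1, 1, 0, 0, 0]})
print(automatic[2])
print(corrected[2])
```

In the second call, the manual correction on frame 1 is carried forward to frame 2, which mirrors the abstract's point that the annotator only needs to fix selected frames while the model handles the rest.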

Notes

1.
https://git.ies.uni-kassel.de/public_code/intelligent_video_annotation_tool.

Acknowledgments

This work results from the project KI Data Tooling (19A20001O) funded by BMWI (German Federal Ministry for Economic Affairs and Energy), and the project DeCoInt² supported by the German Research Foundation (DFG) within the priority program SPP 1835: “Kooperativ interagierende Automobile”, grant number SI 674/11-2.

Author information

Authors and Affiliations

Intelligent Embedded Systems, University of Kassel, Wilhelmshöher Allee 73, 34121, Kassel, Germany
Jan Schneegans, Maarten Bieshaar, Florian Heidecker & Bernhard Sick

Corresponding author

Correspondence to Jan Schneegans.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell'Informazione, University of Firenze, Florence, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

About this paper

Cite this paper

Schneegans, J., Bieshaar, M., Heidecker, F., Sick, B. (2021). Intelligent and Interactive Video Annotation for Instance Segmentation Using Siamese Neural Networks. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12664. Springer, Cham. https://doi.org/10.1007/978-3-030-68799-1_27

DOI: https://doi.org/10.1007/978-3-030-68799-1_27
Published: 05 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68798-4
Online ISBN: 978-3-030-68799-1
eBook Packages: Computer Science, Computer Science (R0)
