
Cognition inspired format for the expression of computer vision metadata

Published in: Multimedia Tools and Applications

Abstract

Over the last decade, noticeable progress has been made in the automated computer interpretation of visual information. Computers running artificial intelligence algorithms are increasingly capable of extracting perceptual and semantic information from images and registering it as metadata. There is also a growing body of manually produced image annotation data. All of this data is of great importance for scientific purposes as well as for commercial applications. Making the most of this information, whether produced manually or automatically, requires its precise and adequate expression at its different logical levels, so that it is easily accessible, manipulable and shareable; it also requires the development of associated manipulation tools. However, the expression and manipulation of computer vision results has received less attention than the extraction of those results, and has consequently advanced less. Existing metadata tools are poorly structured in logical terms, as they intermix the declaration of visual detections with that of the observed entities, events and surrounding context. This poor structuring renders such tools rigid, limited and cumbersome to use. Moreover, they are unprepared for more advanced situations, such as the coherent expression of the information extracted from, or annotated onto, multi-view video resources. The work presented here comprises the specification of an advanced XML-based syntax for the expression and processing of computer-vision-relevant metadata. The proposal takes inspiration from the natural cognition process for the adequate expression of such information, with a particular focus on scenarios involving varying numbers of sensory devices, notably multi-view video.
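To illustrate the kind of layered structure the abstract argues for, the sketch below builds a small metadata document in which low-level visual detections are declared separately from the semantic entity they support, and detections from two camera views reference the same entity. This is a hypothetical illustration only: the element and attribute names (`CVMetadata`, `Detection`, `Entity`, `Evidence`, etc.) are invented for this sketch and are not the paper's actual schema, which is specified in the document linked in the Notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of a logically layered CV metadata document:
# perceptual detections and semantic entities are kept in separate
# sections, and multi-view detections link to one shared entity.
root = ET.Element("CVMetadata")

# Perceptual layer: raw detections, one per sensory device (view).
detections = ET.SubElement(root, "Detections")
for view, (x, y, w, h) in [("cam1", (120, 80, 40, 90)),
                           ("cam2", (300, 60, 35, 85))]:
    d = ET.SubElement(detections, "Detection",
                      id=f"det-{view}", view=view, frame="42")
    ET.SubElement(d, "BoundingBox", x=str(x), y=str(y),
                  w=str(w), h=str(h))

# Semantic layer: the observed entity, declared once and linked to
# its supporting detections in each view via Evidence references.
entities = ET.SubElement(root, "Entities")
person = ET.SubElement(entities, "Entity", id="person-1", type="person")
for view in ("cam1", "cam2"):
    ET.SubElement(person, "Evidence", detection=f"det-{view}")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the detection layer is independent of the entity layer, adding or removing a camera view only touches the `Detections` section plus one `Evidence` link, rather than forcing a restructuring of the semantic description.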


[Figs. 1–8 omitted]


Notes

  1. A more detailed description of the metadata model can be found at http://mat.inescporto.pt/wp-content/uploads/2014/01/COG_metadata_model_schema.pdf


Acknowledgments

The work was largely developed in the context of: project Media Arts and Technologies (MAT), NORTE-07-0124-FEDER-000061, financed by the North Portugal Regional Operational Programme (ON.2 – O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF), and by national funds, through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT); project QREN 23277 RETAIL PRO, a co-promotion R&D project funded by the European Regional Development Fund (ERDF) through ON.2 as part of the National Strategic Reference Framework (NSRF), and managed by Agência de Inovação (ADI); and project QREN 33910 ARENA, an R&D project funded by the European Regional Development Fund (ERDF) through ON.2 as part of the National Strategic Reference Framework (NSRF), and managed by IAPMEI - Agência para a Competitividade e Inovação, I.P.

Author information


Correspondence to H. Castro.


Cite this article

Castro, H., Monteiro, J., Pereira, A. et al. Cognition inspired format for the expression of computer vision metadata. Multimed Tools Appl 75, 17035–17057 (2016). https://doi.org/10.1007/s11042-015-2974-x

