Abstract
The rapid proliferation of smart devices, surveillance cameras, and infrastructures and buildings enhanced with Internet of Things (IoT) technologies has led to an explosion of content, especially in the video domain, and to a steadily growing interest in methods and tools for the automatic analysis and interpretation of video sequences. Over the years, the availability of contextual knowledge has proven to improve video analysis performance in several ways. However, the formal representation of semantic content in a shareable, fusion-oriented manner is still an open problem, particularly given the recent wide diffusion of Fog and Edge computing architectures for video analytics. In this context, an interesting answer has come from Semantic Web (SW) technologies, which have opened a new perspective for so-called Knowledge Based Computer Vision (KBCV) by adding novel analytics opportunities, improving accuracy, and facilitating data exchange between video analysis systems in an open, extensible manner. In this work, we survey the papers published over the last eighteen years, starting from when the first applications of semantic technologies to video analytics appeared. The papers, analyzed from different perspectives to give a comprehensive overview of the technologies involved, reveal an interesting trend towards the adoption of SW technologies for video analytics purposes. As a result of our work, we also provide some insights into future challenges.
Notes
The SPARQL 1.0 recommendation was released by W3C in January 2008.
Officially released as a W3C recommendation in March 2013.
See https://www.w3.org/TR/2012/REC-owl2-new-features-20121211/ for more details.
EC Funded CAVIAR project/IST 2001 37540, http://homepages.inf.ed.ac.uk/rbf/CAVIAR/
Cite this article
Greco, L., Ritrovato, P. & Vento, M. On the use of semantic technologies for video analytics. J Ambient Intell Human Comput 12, 567–587 (2021). https://doi.org/10.1007/s12652-020-02021-y
DOI: https://doi.org/10.1007/s12652-020-02021-y