Abstract
This paper presents a web-based interactive tool for time-efficient, instance-level spatiotemporal labeling of videos, based on the re-detection of manually selected objects of interest that appear in them. The tool allows the user to select one or more instances of an object of interest, by spatially demarcating it in video frames, and to provide a short description of the selected object. These instances are given as input to the tool's object re-detection module, which detects and spatially demarcates re-occurrences of the object throughout the video frames. The video segments that contain detected instances of the object can then be considered object-related media fragments and annotated with the user-provided information about the object. A key component of such a tool is an algorithm that performs the re-detection of the object across the video frames. For this reason, the first part of this work presents our study of different approaches to object re-detection and the approach finally adopted, which combines the recently proposed BRISK descriptors with a descriptor matching strategy based on the LSH algorithm. The second part of this work describes the implemented tool, introduces its supported functionalities and demonstrates its use for object-specific labeling of videos. A set of experiments and a user study on the efficiency of the introduced object re-detection method and the performance of the developed tool indicate that the proposed framework can be used for accurate and time-efficient instance-based annotation of videos and for the creation of object-related spatiotemporal media fragments.
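The re-detection step relies on binary BRISK descriptors matched through LSH. As a rough illustration of why LSH suits binary descriptors, the following pure-Python sketch indexes 512-bit descriptors with bit-sampling LSH (the classic LSH family for Hamming distance) and accepts a match only if it passes an absolute distance threshold and a nearest/second-nearest ratio test. The class and function names, the table and key sizes, and the `ratio` and `max_dist` values are hypothetical choices, not the parameters used in this work; descriptor extraction is assumed to happen elsewhere, and a practical implementation would more likely rely on an existing library.

```python
import random
from typing import List, Tuple

DESC_BITS = 512  # a BRISK descriptor is a 512-bit (64-byte) binary string


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Bit-sampling LSH index for binary descriptors (illustrative only).

    Each of the `n_tables` hash tables uses `key_bits` randomly chosen bit
    positions as its key, so descriptors with a small Hamming distance are
    likely to share a bucket in at least one table.
    """

    def __init__(self, n_tables: int = 6, key_bits: int = 12,
                 n_bits: int = DESC_BITS, seed: int = 0):
        rng = random.Random(seed)
        self.positions = [rng.sample(range(n_bits), key_bits)
                          for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]
        self.data: List[int] = []

    @staticmethod
    def _key(desc: int, positions: List[int]) -> int:
        # Concatenate the sampled bits into a compact bucket key.
        key = 0
        for p in positions:
            key = (key << 1) | ((desc >> p) & 1)
        return key

    def add(self, descriptors: List[int]) -> None:
        """Insert descriptors (e.g. those of the user-selected object)."""
        for desc in descriptors:
            idx = len(self.data)
            self.data.append(desc)
            for pos, table in zip(self.positions, self.tables):
                table.setdefault(self._key(desc, pos), []).append(idx)

    def two_nearest(self, query: int) -> List[Tuple[int, int]]:
        """Up to two (distance, index) pairs among the query's bucket mates."""
        candidates = set()
        for pos, table in zip(self.positions, self.tables):
            candidates.update(table.get(self._key(query, pos), []))
        return sorted((hamming(query, self.data[i]), i) for i in candidates)[:2]


def match_to_object(index: BitSamplingLSH, frame_descs: List[int],
                    ratio: float = 0.8, max_dist: int = 64):
    """Return (frame_idx, object_idx, distance) for accepted matches."""
    matches = []
    for q, desc in enumerate(frame_descs):
        nn = index.two_nearest(desc)
        if not nn:
            continue  # no indexed descriptor shares a bucket with this one
        best_dist, best_idx = nn[0]
        second = nn[1][0] if len(nn) > 1 else float("inf")
        # Absolute threshold plus a nearest/second-nearest ratio test.
        if best_dist <= max_dist and best_dist < ratio * second:
            matches.append((q, best_idx, best_dist))
    return matches
```

Indexing the object's descriptors once and querying each frame's descriptors against the index reduces the per-descriptor cost to a few hash lookups plus Hamming distances over a small candidate set, instead of comparing against every object descriptor.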
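Turning per-frame re-detections into object-related media fragments amounts to grouping consecutive frames that contain the object into temporal segments. The sketch below assumes one bounding box (or `None`) per frame and, purely as an illustration, expresses each segment with the W3C Media Fragments URI syntax (`#t=start,end&xywh=x,y,w,h`), using the union of the segment's boxes as its spatial extent. The function names, the `max_gap` tolerance for short detection dropouts and the union-box choice are hypothetical; they are not the representation used by the tool described here.

```python
from typing import List, Optional, Tuple

# (x, y, width, height) of a detected object in pixels
Box = Tuple[int, int, int, int]


def union_box(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box containing both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def fmt_seconds(t: float) -> str:
    """Format seconds compactly, e.g. 0.200 -> '0.2', 4.000 -> '4'."""
    return f"{t:.3f}".rstrip("0").rstrip(".")


def detections_to_fragments(
    boxes: List[Optional[Box]], fps: float, video_url: str, max_gap: int = 0
) -> List[str]:
    """Turn per-frame detections into spatiotemporal media fragment URIs.

    boxes[i] is the object's bounding box in frame i, or None if the object
    was not re-detected there. Runs of detections separated by at most
    `max_gap` undetected frames are merged into one segment.
    """
    segments = []  # each entry: [first_frame, last_frame, union_of_boxes]
    for i, box in enumerate(boxes):
        if box is None:
            continue
        if segments and i - segments[-1][1] <= max_gap + 1:
            segments[-1][1] = i
            segments[-1][2] = union_box(segments[-1][2], box)
        else:
            segments.append([i, i, box])

    fragments = []
    for first, last, (x, y, w, h) in segments:
        start, end = first / fps, (last + 1) / fps  # end time is exclusive
        fragments.append(
            f"{video_url}#t={fmt_seconds(start)},{fmt_seconds(end)}"
            f"&xywh={x},{y},{w},{h}"
        )
    return fragments
```

For example, at 25 fps, detections in frames 2 to 4 yield a fragment starting at 0.08 s and ending at 0.2 s; raising `max_gap` merges segments separated by brief misses of the re-detection step.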
Acknowledgments
This work was supported by the European Commission under contract FP7-600826 ForgetIT and FP7-287911 LinkedTV.
About this article
Cite this article
Ioannidou, A., Apostolidis, E., Collyda, C. et al. A web-based tool for fast instance-level labeling of videos and the creation of spatiotemporal media fragments. Multimed Tools Appl 76, 1735–1774 (2017). https://doi.org/10.1007/s11042-015-3125-0
DOI: https://doi.org/10.1007/s11042-015-3125-0