A web-based tool for fast instance-level labeling of videos and the creation of spatiotemporal media fragments

Multimedia Tools and Applications

Abstract

This paper presents a web-based interactive tool for time-efficient, instance-level spatiotemporal labeling of videos, based on the re-detection of manually selected objects of interest that appear in them. The tool allows the user to select a number of instances of the object that will be used for annotating the video, by spatially demarcating it in the video frames, and to provide a short description of the selected object. These instances are given as input to the tool's object re-detection module, which detects and spatially demarcates re-occurrences of the object in the video frames. The video segments that contain detected instances of the given object can then be treated as object-related media fragments, annotated with the user-provided information about the object. A key component of such a tool is an algorithm that re-detects the object throughout the video frames. Accordingly, the first part of this work presents our study of different approaches to object re-detection and the approach we finally developed, which combines the recently proposed BRISK descriptors with a descriptor matching strategy that relies on the LSH algorithm. The second part describes the implemented tool, introducing its supported functionalities and demonstrating its use for object-specific labeling of videos. A set of experiments and a user study, addressing the efficiency of the introduced object re-detection method and the performance of the developed tool, indicate that the proposed framework can be used for accurate and time-efficient instance-based annotation of videos and for the creation of object-related spatiotemporal media fragments.
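
To make the matching idea in the abstract more concrete, the following is a minimal sketch of locality-sensitive hashing (LSH) over binary descriptors compared by Hamming distance. It is not the authors' implementation: the abstract does not state which LSH variant or parameters are used, so the sketch assumes simple bit-sampling LSH, stores each descriptor as a Python integer, and uses random 512-bit values (the length of a BRISK descriptor) in place of descriptors extracted from real frames. The names `BitSamplingLSH`, `n_tables`, `key_bits` and `max_dist` are hypothetical.

```python
import random
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Bit-sampling LSH index for fixed-length binary descriptors (illustrative only)."""

    def __init__(self, n_bits: int, n_tables: int = 8, key_bits: int = 12, seed: int = 0):
        rng = random.Random(seed)
        # Each table hashes a descriptor by reading its own random subset of bit positions.
        self.masks = [rng.sample(range(n_bits), key_bits) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []

    def _key(self, desc: int, positions) -> int:
        # Concatenate the sampled bits into a small integer bucket key.
        key = 0
        for p in positions:
            key = (key << 1) | ((desc >> p) & 1)
        return key

    def add(self, desc: int) -> int:
        """Insert a descriptor into every table and return its index."""
        idx = len(self.items)
        self.items.append(desc)
        for positions, table in zip(self.masks, self.tables):
            table[self._key(desc, positions)].append(idx)
        return idx

    def query(self, desc: int, max_dist: int):
        """Return (index, distance) of the closest candidate within max_dist, or None."""
        candidates = set()
        for positions, table in zip(self.masks, self.tables):
            candidates.update(table.get(self._key(desc, positions), ()))
        best = None
        # Only descriptors sharing a bucket with the query are compared exactly.
        for idx in candidates:
            d = hamming(desc, self.items[idx])
            if d <= max_dist and (best is None or d < best[1]):
                best = (idx, d)
        return best


# Stand-in for the descriptors of the user-selected object instances (assumption).
rng = random.Random(1)
model = [rng.getrandbits(512) for _ in range(200)]
index = BitSamplingLSH(n_bits=512)
for descriptor in model:
    index.add(descriptor)

# Stand-in for a descriptor from a video frame: model[42] with two bits flipped.
frame_descriptor = model[42] ^ (1 << 3) ^ (1 << 200)
match = index.query(frame_descriptor, max_dist=64)
```

In this sketch, similar descriptors are likely to agree on the sampled bits in at least one table, so the exact Hamming distance is computed only for a small candidate set rather than for every stored descriptor. Using more tables raises the chance of finding a true match, while longer keys shrink the candidate set; the values above are arbitrary.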

Acknowledgments

This work was supported by the European Commission under contracts FP7-600826 ForgetIT and FP7-287911 LinkedTV.

Author information

Corresponding author

Correspondence to Evlampios Apostolidis.

About this article

Cite this article

Ioannidou, A., Apostolidis, E., Collyda, C. et al. A web-based tool for fast instance-level labeling of videos and the creation of spatiotemporal media fragments. Multimed Tools Appl 76, 1735–1774 (2017). https://doi.org/10.1007/s11042-015-3125-0

  • DOI: https://doi.org/10.1007/s11042-015-3125-0