SurvSurf: human retrieval on large surveillance video data

Multimedia Tools and Applications

Abstract

The volume of surveillance video is increasing rapidly, and humans are the major objects of interest in it. Rapid human retrieval in surveillance videos is therefore desirable across a broad spectrum of applications. Existing big data processing tools, which mainly target textual data, cannot be applied directly to the timely processing of large video data because of three main challenges: videos are more data-intensive than text; visual operations have higher computational complexity than textual operations; and traditional segmentation may damage the continuous semantics of video data. In this paper, we design SurvSurf, a human retrieval system for large surveillance video data that exploits both the characteristics of these data and big data processing tools. We propose using the motion information contained in videos to segment video data. The basic data unit after segmentation is called an M-clip. M-clips help remove redundant video content and reduce data volume. We use the MapReduce framework to process M-clips in parallel for human detection and appearance/motion feature extraction. We further accelerate vision algorithms by processing only sub-areas with significant motion vectors rather than entire frames. In addition, we design a distributed data store called V-BigTable to structure the semantic information of M-clips. V-BigTable enables efficient retrieval over a huge number of M-clips. We implement the system on Hadoop and HBase. Experimental results show that our system outperforms basic solutions by one order of magnitude in computation time while achieving satisfactory human retrieval accuracy.
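The idea of segmenting video by motion can be illustrated with a minimal sketch. It is not the SurvSurf implementation: the abstract does not specify how M-clips are delimited, so the per-frame motion score, the `threshold`, the `max_gap` merging rule, and the function name `segment_by_motion` are all assumptions made here for illustration. The sketch keeps only runs of frames whose motion score is significant and discards the quiet frames between them, which is one plausible way redundant content could be removed.

```python
from typing import List, Tuple


def segment_by_motion(
    motion: List[float],
    threshold: float = 1.0,  # hypothetical: minimum score for an "active" frame
    max_gap: int = 2,        # hypothetical: quiet frames tolerated inside one clip
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end inclusive, of motion clips.

    A frame is active when its motion score is >= threshold. Active frames
    separated by at most max_gap quiet frames are merged into one clip.
    """
    clips: List[Tuple[int, int]] = []
    start = None        # first frame of the clip being built
    last_active = None  # most recent active frame seen
    for i, score in enumerate(motion):
        if score < threshold:
            continue
        if start is None:
            start = i
        elif i - last_active - 1 > max_gap:
            # The quiet gap is too long: close the current clip, open a new one.
            clips.append((start, last_active))
            start = i
        last_active = i
    if start is not None:
        clips.append((start, last_active))
    return clips


# Illustrative per-frame motion scores (made-up values, not experimental data).
scores = [0, 0, 3, 4, 0, 5, 0, 0, 0, 0, 2, 2, 0]
print(segment_by_motion(scores))  # [(2, 5), (10, 11)]
```

In this toy input, 13 frames reduce to two clips covering 6 frames. In a real system the motion score would more likely come from motion vectors in the compressed stream, and each clip could then be handed to a separate map task.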

Author information

Corresponding author

Correspondence to Sihao Ding.

Additional information

Sihao Ding and Gang Li are co-primary authors.

Dr. Junda Zhu’s research was supported in part by the Macau Science and Technology Development Fund under Grant FDCT 023/2013/A1, and by the University of Macau Research Council under a Multi-Year Research Grant.

About this article

Cite this article

Ding, S., Li, G., Li, Y. et al. SurvSurf: human retrieval on large surveillance video data. Multimed Tools Appl 76, 6521–6549 (2017). https://doi.org/10.1007/s11042-016-3307-4

  • DOI: https://doi.org/10.1007/s11042-016-3307-4
