Abstract
The volume of surveillance video is growing rapidly, and humans are the major objects of interest in it. Rapid human retrieval in surveillance videos is therefore desirable and applicable to a broad spectrum of applications. Existing big data processing tools mainly target textual data and cannot be applied directly to the timely processing of large video data, for three main reasons: videos are more data-intensive than textual data; visual operations have higher computational complexity than textual operations; and traditional segmentation may damage the continuous semantics of video data. In this paper, we design SurvSurf, a human retrieval system for large surveillance video data that exploits both the characteristics of these data and big data processing tools. We propose using the motion information contained in videos to segment video data. The basic data unit after segmentation is called an M-clip. M-clips help remove redundant video content and reduce data volume. We use the MapReduce framework to process M-clips in parallel for human detection and appearance/motion feature extraction. We further accelerate vision algorithms by processing only sub-areas with significant motion vectors rather than entire frames. In addition, we design a distributed data store called V-BigTable to structure the semantic information of M-clips. V-BigTable enables efficient retrieval over a huge number of M-clips. We implement the system on Hadoop and HBase. Experimental results show that our system outperforms basic solutions by one order of magnitude in computation time while achieving satisfactory human retrieval accuracy.
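The motion-based segmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how motion is measured or thresholded, so the per-frame motion score, the names `MClip` and `segment_m_clips`, and the parameters `threshold`, `max_gap`, and `min_len` are all assumptions made for illustration. The sketch only shows the general idea of keeping runs of frames that contain motion and dropping static stretches.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MClip:
    """Hypothetical record for one motion clip (frame indices are inclusive)."""
    start: int
    end: int


def segment_m_clips(motion: Sequence[float], threshold: float = 1.0,
                    max_gap: int = 2, min_len: int = 3) -> List[MClip]:
    """Group frames whose motion score reaches `threshold` into clips.

    `motion` is an assumed per-frame motion score (for example, a summed
    motion-vector magnitude). Still gaps of up to `max_gap` frames are
    bridged so that one continuous action is not cut into pieces; clips
    shorter than `min_len` frames are discarded as noise.
    """
    clips: List[MClip] = []
    start = None        # first frame of the clip being built, if any
    last_active = None  # most recent frame with significant motion
    for i, score in enumerate(motion):
        if score >= threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_gap:
            # The still gap is too long: close the current clip.
            if last_active - start + 1 >= min_len:
                clips.append(MClip(start, last_active))
            start = None
    # Close a clip that is still open at the end of the video.
    if start is not None and last_active - start + 1 >= min_len:
        clips.append(MClip(start, last_active))
    return clips


# Synthetic motion scores: two bursts of activity and one isolated spike.
scores = [0, 0, 2, 3, 0, 2, 2, 0, 0, 0, 0, 5, 0, 0, 0, 2, 2, 2]
clips = segment_m_clips(scores)
```

In this synthetic example, frames 2 to 6 and 15 to 17 are kept as clips, while the isolated spike at frame 11 is dropped for being shorter than `min_len`. Each resulting clip could then be handed to a separate map task, which is the kind of parallelism the abstract attributes to MapReduce.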
Additional information
Sihao Ding and Gang Li are co-primary authors.
Dr. Junda Zhu’s research was supported in part by The Macau Science and Technology Development Fund under Grant FDCT 023/2013/A1, and University of Macau Research Council under Multi Year Research Grant.
Cite this article
Ding, S., Li, G., Li, Y. et al. SurvSurf: human retrieval on large surveillance video data. Multimed Tools Appl 76, 6521–6549 (2017). https://doi.org/10.1007/s11042-016-3307-4
DOI: https://doi.org/10.1007/s11042-016-3307-4