Abstract
Dealing with compressed video streams is unavoidable in mobile robotics. In practice, transferring images between mobile robots or to the Cloud over wireless links can only be achieved with lossy video compression, which introduces artifacts that often make image processing challenging. Recent algorithms based on deep neural networks, advanced as they are, are commonly trained and evaluated on datasets of high-fidelity images that are typically not captured from aerial views. In this work we evaluate a number of deep-neural-network-based object detection algorithms in the context of aerial search and rescue scenarios, where real-time and robust detection of human bodies is a priority. We provide an evaluation using a number of video sequences collected in flight using Unmanned Aerial Vehicle (UAV) platforms in different environmental conditions. We also describe the degradation in detection performance under limited-bitrate compression using the H.264, H.265 and VP9 video codecs, and analyze the timing effects of moving image processing tasks to off-board entities.
Supported by the ELLIIT network organization for Information and Communication Technology, the Swedish Foundation for Strategic Research (SymbiKBot Project), and the Wallenberg AI, Autonomous Systems and Software Program (WASP) - Research Arena Public Safety (WARA-PS).
Notes
1. TensorFlow detection model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md (2019).
2. FFmpeg multimedia framework: https://www.ffmpeg.org.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Rudol, P., Doherty, P. (2019). Evaluation of Human Body Detection Using Deep Neural Networks with Highly Compressed Videos for UAV Search and Rescue Missions. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_33
DOI: https://doi.org/10.1007/978-3-030-29894-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4
eBook Packages: Computer Science, Computer Science (R0)