Robot Vision System for Real-Time Human Detection and Action Recognition

Hoshino, Satoshi; Niimura, Kyohei

doi:10.1007/978-3-030-01370-7_40


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 867)

Included in the following conference series:

International Conference on Intelligent Autonomous Systems


Abstract

Mobile robots equipped with camera sensors must perceive surrounding humans and their actions in order to navigate autonomously and safely. These two tasks are known as human detection and action recognition. In this paper, the target objects are moving humans. Real-time performance matters more in robot vision than in general computer vision. To address this challenge, we propose a robot vision system that takes images described by the optical flow as input. To classify humans and actions in the input images, we use a Convolutional Neural Network (CNN) rather than hand-coded invariant features. Moreover, we present a novel detector, the local search window, for clipping partial images around target objects. Finally, we show experimentally that the robot vision system is able to detect a moving human and recognize the action in real time.
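The abstract does not say how the optical flow is rendered as an image. As a rough illustration only, the sketch below assumes one common convention: flow direction is mapped to hue and flow magnitude to brightness. The function names `flow_to_rgb` and `flow_field_to_image` and the normalization constant `max_mag` are hypothetical and are not taken from the paper.

```python
import colorsys
import math

def flow_to_rgb(u, v, max_mag=10.0):
    """Encode one flow vector (u, v) as an 8-bit RGB pixel.

    Assumed convention (not from the paper): direction -> hue,
    magnitude -> brightness, full saturation.
    """
    mag = math.hypot(u, v)                         # flow speed in pixels/frame
    angle = math.atan2(v, u)                       # direction in (-pi, pi]
    hue = (angle % (2 * math.pi)) / (2 * math.pi)  # wrapped to [0, 1]
    value = min(mag / max_mag, 1.0)                # clip fast motion to full brightness
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return (round(r * 255), round(g * 255), round(b * 255))

def flow_field_to_image(flow, max_mag=10.0):
    """Convert a 2D grid of (u, v) vectors into a 2D grid of RGB pixels."""
    return [[flow_to_rgb(u, v, max_mag) for (u, v) in row] for row in flow]

if __name__ == "__main__":
    # Hypothetical 2x2 flow field: static background, one pixel moving right.
    field = [[(0.0, 0.0), (10.0, 0.0)],
             [(0.0, 0.0), (0.0, 0.0)]]
    print(flow_field_to_image(field))
```

Under this encoding static pixels come out black, so a moving human stands out against a still background. That may be one reason a flow image suits a detector whose targets are moving humans, though the paper's actual encoding could differ.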


Notes

1. Note that only the right image was used in this experiment. In future work, we will use the 3D information obtained from both images.
2. The optical flow was used as an input to the CNN classifier.
3. The mean shift clustering [18] was used for integrating the windows.
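Note 3 says that mean shift clustering integrates the detection windows but gives no further detail here. The following is a minimal sketch of that idea, assuming windows are `(x, y, w, h)` tuples, a flat kernel over the window centers, and a hand-picked `bandwidth`. The names `mean_shift_modes` and `integrate_windows` are hypothetical, and this is not the authors' implementation.

```python
import math

def mean_shift_modes(points, bandwidth, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift: move each point to the mean of its neighbors
    within `bandwidth` until it stops moving, then merge nearby end points."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            near = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            nx = sum(p[0] for p in near) / len(near)
            ny = sum(p[1] for p in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        # Keep the converged point only if no existing mode is close to it.
        if all(math.hypot(mx - x, my - y) >= bandwidth / 2 for mx, my in modes):
            modes.append((x, y))
    return modes

def integrate_windows(windows, bandwidth):
    """Merge overlapping windows: cluster their centers with mean shift,
    then average the windows assigned to each mode."""
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in windows]
    modes = mean_shift_modes(centers, bandwidth)
    groups = [[] for _ in modes]
    for win, (cx, cy) in zip(windows, centers):
        k = min(range(len(modes)),
                key=lambda i: math.hypot(modes[i][0] - cx, modes[i][1] - cy))
        groups[k].append(win)
    return [tuple(sum(vals) / len(g) for vals in zip(*g)) for g in groups if g]

if __name__ == "__main__":
    # Hypothetical detections: three overlapping windows and one far away.
    detections = [(10, 10, 20, 40), (12, 11, 20, 40),
                  (11, 9, 20, 40), (100, 100, 20, 40)]
    print(integrate_windows(detections, bandwidth=15))
```

The bandwidth controls how far apart two windows can be and still be merged. A value near half the expected window width is a plausible starting point, but it would need tuning on real data.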

References

1. Ojala, T., et al.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In: International Conference on Pattern Recognition, vol. 1, pp. 582–585 (1994)
2. Lowe, D.G.: Object recognition from local scale-invariant features. In: International Conference on Computer Vision, pp. 1150–1157 (1999)
3. Csurka, G., et al.: Visual categorization with bags of keypoints. In: International Workshop on Statistical Learning in Computer Vision, pp. 59–74 (2004)
4. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
5. Dollar, P., et al.: Behavior recognition via sparse spatio-temporal features. In: International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)
6. van de Sande, K.E.A., et al.: Segmentation as selective search for object recognition. In: IEEE International Conference on Computer Vision, pp. 1879–1886 (2011)
7. Uijlings, J.R.R., et al.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
8. LeCun, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
9. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1137–1149 (2016)
10. Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Scandinavian Conference on Image Analysis, vol. 2749, pp. 363–370 (2003)
11. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2749, pp. 1–8 (2008)
12. Jain, M., et al.: Better exploiting motion for better action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2555–2562 (2013)
13. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
14. Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
15. LeCun, Y., et al.: Deep learning. Nature 521(7553), 436–444 (2015)
16. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
17. Goudail, F., et al.: Bhattacharyya distance as a contrast parameter for statistical processing of noisy optical images. J. Opt. Soc. Am. A 21(7), 1231–1240 (2004)
18. Comaniciu, D., et al.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
19. Oliveira, L., et al.: On exploration of classifier ensemble synergism in pedestrian detection. IEEE Trans. Intell. Transp. Syst. 11(1), 16–27 (2010)
20. Wang, H., Schmid, C.: LEAR-INRIA submission for the THUMOS workshop. In: ICCV Workshop on Action Recognition with a Large Number of Classes, vol. 2, no. 7 (2013)


Author information

Authors and Affiliations

Department of Mechanical and Intelligent Engineering, Utsunomiya University, 7-1-2 Yoto, Utsunomiya, Tochigi, 321-8585, Japan
Satoshi Hoshino & Kyohei Niimura


Corresponding author

Correspondence to Satoshi Hoshino.

Editor information

Editors and Affiliations

Baden-Wuerttemberg Cooperative State University, Karlsruhe, Germany
Marcus Strand
Humanoids and Intelligence Systems Lab, KIT - Karlsruher Institut für Technologie, Karlsruhe, Germany
Rüdiger Dillmann
University of Padua, Padua, Italy
Emanuele Menegatti
University of Padua, Padua, Italy
Stefano Ghidoni

About this paper

Cite this paper

Hoshino, S., Niimura, K. (2019). Robot Vision System for Real-Time Human Detection and Action Recognition. In: Strand, M., Dillmann, R., Menegatti, E., Ghidoni, S. (eds) Intelligent Autonomous Systems 15. IAS 2018. Advances in Intelligent Systems and Computing, vol 867. Springer, Cham. https://doi.org/10.1007/978-3-030-01370-7_40


DOI: https://doi.org/10.1007/978-3-030-01370-7_40
Published: 31 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01369-1
Online ISBN: 978-3-030-01370-7
eBook Packages: Intelligent Technologies and Robotics (R0)
