
Real-Time Embedded Motion Detection via Neural Response Mixture Modeling

Published in: Journal of Signal Processing Systems

Abstract

Deep neural networks (DNNs) have shown significant promise in many fields, including computer vision. Although previous research has demonstrated the capabilities of DNNs, running these networks in real time on embedded systems is generally infeasible without specialized hardware such as GPUs. In this paper, we propose a new approach to real-time motion detection in videos that leverages the power of DNNs while maintaining the low computational complexity needed for real-time performance on existing embedded platforms without specialized hardware. The rich deep features extracted from the neural responses of an efficient, stochastically formed deep neural network (StochasticNet) are used to construct Gaussian mixture models (GMMs) that detect motion in a scene. The proposed Neural Response Mixture (NeRM) model was embedded on an Axis surveillance camera. Results demonstrate that, while operating in real time, NeRM models the foreground and background with fewer false detections and less noise than other state-of-the-art motion detection approaches for embedded systems.
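To make the pipeline concrete, the sketch below implements per-pixel Gaussian mixture background modeling over a 2-D feature map, in the spirit of classic adaptive background mixture models. It is a minimal illustration, not the authors' implementation: the extract_features stub is a hypothetical stand-in for the StochasticNet response map described above, and parameters such as the learning rate and match threshold are illustrative assumptions.

```python
import numpy as np

class PixelGMM:
    """Per-pixel Gaussian mixture background model over a 2-D feature map.

    Each pixel's feature value is modeled by k Gaussians; components that
    have accumulated enough weight are treated as background.
    """

    def __init__(self, shape, k=3, lr=0.01, var_init=225.0, match_thresh=2.5):
        h, w = shape
        self.k, self.lr = k, lr
        self.var_init, self.match_thresh = var_init, match_thresh
        self.means = np.random.rand(h, w, k) * 255.0    # component means
        self.vars = np.full((h, w, k), var_init)        # component variances
        self.weights = np.full((h, w, k), 1.0 / k)      # mixing weights

    def apply(self, feat):
        """Update the model with one (H, W) feature map; return a foreground mask."""
        x = feat[..., None]                              # (H, W, 1) for broadcasting
        dist = np.abs(x - self.means) / np.sqrt(self.vars)
        matched = dist < self.match_thresh               # components explaining the pixel
        any_match = matched.any(axis=-1)

        # Move matched components toward the observation.
        rho = self.lr * matched
        self.means += rho * (x - self.means)
        self.vars += rho * ((x - self.means) ** 2 - self.vars)
        self.weights = (1.0 - self.lr) * self.weights + self.lr * matched
        self.weights /= self.weights.sum(axis=-1, keepdims=True)

        # Where nothing matched, reinitialize the weakest component at the observation.
        weakest = self.weights.argmin(axis=-1)
        r, c = np.nonzero(~any_match)
        self.means[r, c, weakest[r, c]] = feat[r, c]
        self.vars[r, c, weakest[r, c]] = self.var_init

        # Foreground = pixels not explained by any well-supported component.
        background = matched & (self.weights > 1.0 / self.k)
        return ~background.any(axis=-1)


def extract_features(frame):
    """Hypothetical stand-in for a StochasticNet response map: here, grayscale."""
    return frame.mean(axis=-1)


# Toy usage on random frames; in practice each frame would come from the camera.
model = PixelGMM(shape=(120, 160))
for _ in range(10):
    frame = np.random.randint(0, 256, (120, 160, 3)).astype(np.float32)
    mask = model.apply(extract_features(frame))          # True where motion is detected
```

Replacing raw intensities with richer per-pixel neural responses, as NeRM does, changes only the extract_features step; the mixture update itself is unchanged, which is why the approach can stay within an embedded platform's real-time budget when the feature extractor is cheap.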


Notes

  1. http://www.axis.com/ca/en/products/axis-q7436.


Author information

Correspondence to Mohammad Javad Shafiee.


Cite this article

Shafiee, M.J., Siva, P., Fieguth, P. et al. Real-Time Embedded Motion Detection via Neural Response Mixture Modeling. J Sign Process Syst 90, 931–946 (2018). https://doi.org/10.1007/s11265-017-1265-3
