Probabilistic Hierarchical Model Using First Person Vision for Scenario Recognition

Wireless Personal Communications

Abstract

Smart homes are increasingly needed to support a comfortable lifestyle for the elderly and to ease the work of future caretakers. An important component of such systems is identifying human activities and scenarios. As wireless technologies advance, they are being used to provide low-cost, non-intrusive, and privacy-conscious solutions for activity recognition. However, in more complicated environments, scenarios must be identified from subtle cues such as eye gaze. These situations call for a complementary vision-based solution, and we present a robust scenario recognition system that follows the objects seen along eye-gaze trajectories. Specifically, we present a probabilistic hierarchical model for scenario recognition that uses environmental elements such as the objects in the scene. We exploit the fact that any scenario can be recursively divided into constituent tasks and activities, down to the level of atomic actions and objects. Working bottom-up, the scenario recognition problem can be solved hierarchically by identifying the objects seen and combining them to form coarse-grained, higher-level activities. The novel contribution is the ability to recognize complete scenarios solely on the basis of the objects seen. We performed experiments on the standard Georgia Tech Egocentric Activities (GTEA-Gaze) dataset and on unconstrained videos collected “in the wild”, and trained an artificial neural network that achieved a precision of 73.84% and an accuracy of 92.27%.
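
The bottom-up idea in the abstract (objects seen, then activities, then scenarios) can be illustrated with a small sketch. This is not the model from the paper, which trains an artificial neural network; it is a minimal naive-Bayes-style illustration. All object, activity, and scenario names and every probability value below are hypothetical assumptions chosen for the example.

```python
# Hypothetical two-level hierarchy. All names and numbers are
# illustrative assumptions, not values from the paper.

# P(object | activity): how likely each object is to be gazed at
# while performing an activity.
OBJECT_GIVEN_ACTIVITY = {
    "make_tea": {"kettle": 0.5, "cup": 0.4, "bread": 0.1},
    "make_sandwich": {"kettle": 0.1, "cup": 0.1, "bread": 0.8},
}

# P(activity | scenario): how likely each activity is within a scenario.
ACTIVITY_GIVEN_SCENARIO = {
    "breakfast": {"make_tea": 0.5, "make_sandwich": 0.5},
    "tea_break": {"make_tea": 0.9, "make_sandwich": 0.1},
}


def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def activity_posterior(objects):
    """Posterior over activities given the objects seen in one segment.

    Assumes a uniform prior over activities and conditionally
    independent objects (the naive Bayes assumption).
    """
    scores = {}
    for activity, likelihood in OBJECT_GIVEN_ACTIVITY.items():
        p = 1.0
        for obj in objects:
            # Small floor so an unseen object does not zero out the score.
            p *= likelihood.get(obj, 1e-6)
        scores[activity] = p
    return normalize(scores)


def scenario_posterior(segments):
    """Posterior over scenarios given a list of segments.

    Each segment is a list of objects seen during one activity.
    Assumes a uniform prior over scenarios and independent segments.
    """
    scores = {scenario: 1.0 for scenario in ACTIVITY_GIVEN_SCENARIO}
    for objects in segments:
        act_post = activity_posterior(objects)
        for scenario, act_prior in ACTIVITY_GIVEN_SCENARIO.items():
            # Marginalize over the unknown activity of this segment.
            scores[scenario] *= sum(
                act_prior[a] * act_post[a] for a in act_post
            )
    return normalize(scores)


if __name__ == "__main__":
    segments = [["kettle", "cup"], ["cup"]]
    print(scenario_posterior(segments))
```

The lower level turns objects into an activity distribution, and the upper level combines those distributions into a scenario distribution, so the scenario is inferred from nothing but the objects seen. A learned classifier such as a neural network would replace the hand-written tables.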

Author information

Corresponding author

Correspondence to Shaheena Noor.

About this article

Cite this article

Noor, S., Uddin, V. & Jabeen, De. Probabilistic Hierarchical Model Using First Person Vision for Scenario Recognition. Wireless Pers Commun 106, 2179–2193 (2019). https://doi.org/10.1007/s11277-018-5933-9

  • DOI: https://doi.org/10.1007/s11277-018-5933-9
