
Action recognition from point cloud patches using discrete orthogonal moments

Published in Multimedia Tools and Applications

Abstract

3D sensors such as standoff Light Detection and Ranging (LIDAR) generate partial 3D point clouds that resemble patches of irregularly shaped, coarse groups of points. 3D modeling of this type of data for human action recognition has rarely been studied. Although 2D depth-image analysis is an option, its effectiveness on such low-resolution data has not been well established. This paper investigates a new multi-scale 3D shape descriptor, based on the discrete orthogonal Tchebichef moments, for characterizing 3D action pose shapes composed of low-resolution point cloud patches. Our shape descriptor consists of low-order 3D Tchebichef moments computed with respect to a new point cloud voxelization scheme that normalizes translation, scale, and resolution. Action recognition is built on a Naïve Bayes classifier using temporal statistics of a ‘bag of pose shapes’. For performance evaluation, a synthetic LIDAR pose shape baseline was developed with 62 human subjects performing three actions: digging, jogging, and throwing. Our action classification experiments demonstrate that the 3D Tchebichef moment representation of point clouds achieves excellent action and viewing-direction predictions, with strong consistency across a large range of scale and viewing angle variations.
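The descriptor outlined above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the polynomial recurrence and squared norms ρ(n, N) follow the standard discrete Tchebichef formulation of Mukundan, Ong, and Lee (2001), while the function names, the uniform min-max scale normalization in `voxelize`, and the binary occupancy grid are simplifications introduced here (the paper's voxelization scheme also normalizes resolution).

```python
import numpy as np

def tchebichef_polys(order, N):
    """Discrete Tchebichef polynomials t_0..t_order evaluated on x = 0..N-1
    (classical unnormalized form, built by three-term recurrence)."""
    x = np.arange(N, dtype=float)
    T = np.zeros((order + 1, N))
    T[0] = 1.0
    if order >= 1:
        T[1] = 2.0 * x - N + 1.0
    for n in range(2, order + 1):
        T[n] = ((2 * n - 1) * T[1] * T[n - 1]
                - (n - 1) * (N**2 - (n - 1)**2) * T[n - 2]) / n
    return T

def squared_norm(n, N):
    """rho(n, N) = N (N^2 - 1)(N^2 - 4) ... (N^2 - n^2) / (2n + 1)."""
    rho = float(N)
    for k in range(1, n + 1):
        rho *= N**2 - k**2
    return rho / (2 * n + 1)

def voxelize(points, N):
    """Map an (M, 3) point cloud into an N x N x N binary occupancy grid,
    normalizing translation and (isotropically) scale -- a simplifying
    assumption standing in for the paper's voxelization scheme."""
    p = points - points.min(axis=0)              # translation normalization
    scale = p.max()
    if scale > 0:
        p = p / scale                            # scale normalization
    idx = np.minimum(np.rint(p * (N - 1)).astype(int), N - 1)
    grid = np.zeros((N, N, N))
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def tchebichef_moments_3d(voxels, order):
    """Low-order normalized 3D moments T_pqr of an N x N x N occupancy grid:
    T_pqr = (1 / (rho_p rho_q rho_r)) * sum_xyz t_p(x) t_q(y) t_r(z) f(x,y,z)."""
    N = voxels.shape[0]
    T = tchebichef_polys(order, N)
    M = np.einsum('px,qy,rz,xyz->pqr', T, T, T, voxels)
    rho = np.array([squared_norm(n, N) for n in range(order + 1)])
    return M / (rho[:, None, None] * rho[None, :, None] * rho[None, None, :])
```

The low-order moment tensor returned by `tchebichef_moments_3d` (e.g. order 2 or 3) would serve as the pose-shape feature vector; discrete orthogonality of the polynomial basis means each moment captures an independent component of the voxelized shape.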




Acknowledgements

The authors would like to thank Isiah Davenport, Max Grattan, and Jeanne Smith for their indispensable help in the creation of the biofidelic pose shape baseline.

Author information


Corresponding author

Correspondence to Soon M. Chung.

About this article


Cite this article

Cheng, H., Chung, S.M. Action recognition from point cloud patches using discrete orthogonal moments. Multimed Tools Appl 77, 8213–8236 (2018). https://doi.org/10.1007/s11042-017-4711-0

