
Multimedia content analysis on gesture event detection for a SMART TV Keyboard application

Multimedia Tools and Applications

Abstract

We propose an effective machine learning method for analyzing multimedia content that addresses gesture event detection and recognition. The method builds on well-studied techniques, namely Procrustes Analysis, a Combination of Local and Global Representations, and a Linear Shape Model, and we apply it to a SMART TV Virtual Keyboard. In this paper, we focus on gesture event detection, specifically fingertip gesture detection. Our vision-based keyboard could serve as a next-generation replacement for the SMART TV remote control. It can also be more economical, since it requires no physical devices such as a traditional keyboard or remote control, and none of the energy sources, such as batteries, that they depend on. More information and demonstrations of the proposed keyboard are available at http://video.minelab.tw/MCAoGED/.
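The abstract names Procrustes Analysis as one building block, but this preview does not show how the authors apply it. As a generic illustration only, and not the paper's implementation, the following minimal Python sketch aligns two 2D landmark shapes by removing translation, scale, and rotation, then reports the residual Procrustes distance. The function names (`normalize`, `procrustes_distance`) and the five sample points standing in for fingertip landmarks are assumptions made for this example.

```python
import cmath
import math


def normalize(points):
    """Center a 2D shape at the origin and scale it to unit norm.

    Points are (x, y) tuples; they are handled as complex numbers x + iy.
    """
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    if norm == 0:
        raise ValueError("degenerate shape: all points coincide")
    return [p / norm for p in z]


def procrustes_distance(shape, reference):
    """Residual after optimally rotating `shape` onto `reference`.

    Translation and scale are removed by `normalize`; the best rotation
    (no reflection) is the unit complex number r = s / |s|, where
    s = sum(conj(a_i) * b_i) over corresponding landmarks.
    """
    if len(shape) != len(reference):
        raise ValueError("shapes must have the same number of landmarks")
    a = normalize(shape)
    b = normalize(reference)
    s = sum(p.conjugate() * q for p, q in zip(a, b))
    r = s / abs(s) if abs(s) > 0 else 1
    return math.sqrt(sum(abs(r * p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical landmark set (illustrative values, not data from the paper).
reference = [(0, 0), (1, 3), (2, 4), (3, 3.5), (4, 2)]

# The same shape scaled by 2.5, rotated by 30 degrees, and shifted.
transform = cmath.rect(2.5, math.radians(30))
moved = [complex(x, y) * transform + complex(10, -4) for x, y in reference]
moved = [(p.real, p.imag) for p in moved]

# A genuinely different shape with the same number of landmarks.
other = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1)]

print("same shape, different pose:", procrustes_distance(moved, reference))
print("different shape:", procrustes_distance(other, reference))
```

Because the distance ignores pose, a shape that differs from the reference only by position, size, or orientation scores near zero, while a different configuration scores clearly higher. That property is what makes a Procrustes-style alignment a common preprocessing step before fitting a linear shape model.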

Author information

Corresponding author

Correspondence to Enkhtogtokh Togootogtokh.

About this article

Cite this article

Togootogtokh, E., Shih, T.K. Multimedia content analysis on gesture event detection for a SMART TV Keyboard application. Multimed Tools Appl 76, 7341–7363 (2017). https://doi.org/10.1007/s11042-016-3385-3

  • DOI: https://doi.org/10.1007/s11042-016-3385-3
