Abstract
We propose an effective machine learning method for analyzing multimedia content, addressing gesture event detection and recognition. The method is based on well-studied techniques such as Procrustes Analysis, a combination of local and global representations, and a linear shape model, and we apply it to a SMART TV virtual keyboard. In this paper, we address gesture event detection, specifically fingertip gesture detection, to enable smarter and more advanced use of technology. Our vision-based keyboard could be a good next-generation replacement for the SMART TV remote control. It can also be more economical, because it requires neither a physical device such as a traditional keyboard or remote control nor an energy source such as batteries. More information and demonstrations of the proposed keyboard can be accessed at http://video.minelab.tw/MCAoGED/.
Cite this article
Togootogtokh, E., Shih, T.K. Multimedia content analysis on gesture event detection for a SMART TV Keyboard application. Multimed Tools Appl 76, 7341–7363 (2017). https://doi.org/10.1007/s11042-016-3385-3
DOI: https://doi.org/10.1007/s11042-016-3385-3