
Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

Gesture Recognition

Abstract

We propose a novel dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model fitting, which assists hand tracking and the extraction of handshape features. We construct SA images that represent the hand’s shape and appearance without landmark points. We model the variation of these images by linear combinations of eigenimages followed by affine transformations, which accounts for 3D hand pose changes and improves the model’s compactness. We also incorporate static and dynamic handshape priors, which offer robustness to occlusions, a frequent occurrence in signing. The approach includes an affine signer adaptation component at the visual level that does not require training a new signer-specific model from scratch; instead, we employ a short development data set to adapt the models to a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements in handshape classification compared to other feature extraction approaches. Supplementary sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as the signer adaptation of Aff-SAM to multiple signers, providing promising results.

Editors: Isabelle Guyon and Vassilis Athitsos


Notes

  1. The existence of two such transforms is due to the modulo-\(\pi \) ambiguity of the orientation.

  2. We have assumed that the parameters \(\varvec{\lambda }\) and \(\varvec{p}\) are statistically independent.

  3. The \((K+1)\)-plets follow from the fact that we need K neighbouring samples + the current sample.

References

  • U. Agris, J. Zieren, U. Canzler, B. Bauer, K.F. Kraiss, Recent developments in visual sign language recognition. Univ. Access Inf. Soc. 6, 323–362 (2008)

  • T. Ahmad, C.J. Taylor, A. Lanitis, T.F. Cootes, Tracking and recognising hand gestures, using statistical shape models. Image Vis. Comput. 15(5), 345–352 (1997)

  • A. Argyros, M. Lourakis, Real time tracking of multiple skin-colored objects with a possibly moving camera, in Proceedings of the European Conference on Computer Vision, 2004

  • V. Athitsos, S. Sclaroff, An appearance-based framework for 3d hand shape classification and camera viewpoint estimation, in Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2002, pp. 45–52

  • S. Baker, I. Matthews, Lucas-kanade 20 years on: a unifying framework: Part 1. Technical report, Carnegie Mellon University, 2002

  • S. Baker, R. Gross, I. Matthews, Lucas-kanade 20 years on: a unifying framework: Part 4. Technical report, Carnegie Mellon University, 2004

  • B. Bauer, K.F. Kraiss, Towards an automatic sign language recognition system using subunits, in Proceedings of the International Gesture Workshop, vol. 2298, 2001, pp. 64–75

  • H. Birk, T.B. Moeslund, C.B. Madsen, Real-time recognition of hand alphabet gestures using principal component analysis, in Proceedings of the Scandinavian Conference Image Analysis, 1997

  • A. Blake, M. Isard, Active Contours (Springer, 1998)

  • R. Bowden, M. Sarhadi, A nonlinear model of shape and motion for tracking fingerspelt american sign language. Image Vis. Comput. 20, 597–607 (2002)

  • P. Buehler, M. Everingham, A. Zisserman, Learning sign language by watching TV (using weakly aligned subtitles), in Proceedings of the Conference on Computer Vision and Pattern Recognition, 2009

  • J. Cai, A. Goshtasby, Detecting human faces in color images. Image Vis. Comput. 18, 63–75 (1999)

  • F.-S. Chen, C.-M. Fu, C.-L. Huang, Hand gesture recognition using a real-time tracking method and hidden markov models. Image Vis. Comput. 21(8), 745–758 (2003)

  • S. Conseil, S. Bourennane, L. Martin, Comparison of Fourier descriptors and Hu moments for hand posture recognition, in Proceedings of the European Conference on Signal Processing, 2007

  • T.F. Cootes, C.J. Taylor, Statistical models of appearance for computer vision. Technical report, University of Manchester, 2004

  • Y. Cui, J. Weng, Appearance-based hand sign recognition from intensity image sequences. Comput. Vis. Image Underst. 78(2), 157–176 (2000)

  • D.L. Davies, D.W. Bouldin, A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)

  • J.-W. Deng, H.T. Tsui, A novel two-layer PCA/MDA scheme for hand posture recognition. Proc. Int. Conf. Pattern Recognit. 1, 283–286 (2002)

  • DictaSign, Greek sign language corpus, http://www.sign-lang.uni-hamburg.de/dicta-sign/portal, 2012

  • L. Ding, A.M. Martinez, Modelling and recognition of the linguistic components in american sign language. Image Vis. Comput. 27(12), 1826–1844 (2009)

  • P. Dreuw, J. Forster, T. Deselaers, H. Ney, Efficient approximations to model-based joint tracking and recognition of continuous sign language, in Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2008

  • I.L. Dryden, K.V. Mardia, Statistical Shape Analysis (Wiley, 1998)

  • W. Du, J. Piater, Hand modeling and tracking for video-based sign language recognition by robust principal component analysis, in Proceedings of the ECCV Workshop on Sign, Gesture and Activity, 2010

  • A. Farhadi, D. Forsyth, R. White, Transfer learning in sign language, in Proceedings of the Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–8

  • H. Fillbrandt, S. Akyol, K.-F. Kraiss, Extraction of 3D hand shape and posture from images sequences from sign language recognition, in Proceedings of the International Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 181–186

  • R. Gross, I. Matthews, S. Baker, Generic vs. person specific active appearance models. Image Vis. Comput. 23(12), 1080–1093 (2005)

  • T. Hanke, HamNoSys Representing sign language data in language resources and language processing contexts, in Proceedings of the International Conference on Language Resources and Evaluation, 2004

  • M.-K. Hu, Visual pattern recognition by moment invariants. IEEE Trans. Inf. Theory 8(2), 179–187 (1962)

  • C.-L. Huang, S.-H. Jeng, A model-based hand gesture recognition system. Mach. Vis. Appl. 12(5), 243–258 (2001)

  • P. Kakumanu, S. Makrogiannis, N. Bourbakis, A survey of skin-color modeling and detection methods. Pattern Recogn. 40(3), 1106–1122 (2007)

  • E. Learned-Miller, Data driven image models through continuous joint alignment. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 236–250 (2005)

  • S. Liwicki, M. Everingham, Automatic recognition of fingerspelled words in British sign language, in Proceedings of the CVPR Workshop on Human Communicative Behavior Analysis, 2009

  • P. Maragos, Morphological Filtering for Image Enhancement and Feature Detection, The Image and Video Processing Handbook (Elsevier, 2005)

  • I. Matthews, S. Baker, Active appearance models revisited. Int. J. Comput. Vis. 60(2), 135–164 (2004)

  • C. Neidle, Signstream annotation: addendum to conventions used for the american sign language linguistic research project. Technical report, 2007

  • C. Neidle, C. Vogler, A new web interface to facilitate access to corpora: development of the ASLLRP data access interface, in Proceedings of the International Conference on Language Resources and Evaluation, 2012

  • E.J. Ong, H. Cooper, N. Pugeault, R. Bowden, Sign language recognition using sequential pattern trees, in Proceedings of the Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 2200–2207

  • Y. Peng, A. Ganesh, J. Wright, W. Xu, Y. Ma, RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, in Proceedings of the Conference on Computer Vision and Pattern Recognition, 2010

  • V. Pitsikalis, S. Theodorakis, P. Maragos, Data-driven sub-units and modeling structure for continuous sign language recognition with multiple cues, in LREC Workshop Repr. & Proc. SL, Corpora and SL Technologies, 2010

  • L.R. Rabiner, R.W. Schafer, Introduction to digital speech processing. Found. Trends Signal Process. 1(1–2), 1–194 (2007)

  • A. Roussos, S. Theodorakis, V. Pitsikalis, P. Maragos, Affine-invariant modeling of shape-appearance images applied on sign language handshape classification, in Proceedings of the International Conference on Image Processing, 2010a

  • A. Roussos, S. Theodorakis, V. Pitsikalis, P. Maragos, Hand tracking and affine shape-appearance handshape sub-units in continuous sign language recognition, in Proceedings of the ECCV Workshop on Sign, Gesture and Activity, 2010b

  • J. Sherrah, S. Gong, Resolving visual uncertainty and occlusion through probabilistic reasoning, in Proceedings of the British Machine Vision Conference, 2000, pp. 252–261

  • P. Soille, Morphological Image Analysis: Principles and Applications (Springer, 2004)

  • T. Starner, J. Weaver, A. Pentland, Real-time american sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20(12), 1371–1375 (1998)

  • B. Stenger, A. Thayananthan, P.H.S. Torr, R. Cipolla, Model-based hand tracking using a hierarchical bayesian filter. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1372–1384 (2006)

  • G.J. Sweeney, A.C. Downton, Towards appearance-based multi-channel gesture recognition, in Proceedings of the International Gesture Workshop, 1996, pp. 7–16

  • N. Tanibata, N. Shimada, Y. Shirai, Extraction of hand features for recognition of sign language words, in Proceedings of the International Conference on Vision Interface, 2002, pp. 391–398

  • J. Terrillon, M. Shirazi, H. Fukamachi, S. Akamatsu, Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images, in Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2000, pp. 54–61

  • A. Thangali, J.P. Nash, S. Sclaroff, C. Neidle, Exploiting phonological constraints for handshape inference in asl video, in Proceedings of the Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 521–528

  • S. Theodorakis, V. Pitsikalis, P. Maragos, Advances in dynamic-static integration of movement and handshape cues for sign language recognition, in Proceedings of the International Gesture Workshop, 2011

  • S. Theodorakis, V. Pitsikalis, I. Rodomagoulakis, P. Maragos, Recognition with raw canonical phonetic movement and handshape subunits on videos of continuous sign language, in Proceedings of the International Conference on Image Processing, 2012

  • P. Viola, M.J. Jones, Fast multi-view face detection, in Proceedings of the Conference on Computer Vision and Pattern Recognition, 2003

  • C. Vogler, D. Metaxas, Parallel hidden markov models for american sign language recognition. Proc. Int. Conf. Comput. Vis. 1, 116–122 (1999)

  • Y. Wu, T.S. Huang, View-independent recognition of hand postures. Proc. Conf. Comput. Vis. Pattern Recognit. 2, 88–94 (2000)

  • M.-H. Yang, N. Ahuja, M. Tabb, Extraction of 2d motion trajectories and its application to hand gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 24(8), 1061–1074 (2002)

  • S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, P. Woodland, The HTK Book (Entropic Ltd., 1999)

  • J. Zieren, N. Unger, S. Akyol, Hands tracking from frontal view for vision-based gesture recognition, in Pattern Recognition, LNCS, 2002, pp. 531–539

Acknowledgements

This research work was supported by the EU under the research program Dictasign with Grant FP7-ICT-3-231135. A. Roussos was also supported by the ERC Starting Grant 204871-HUMANIS. This work was done while A. Roussos was with the National Technical University of Athens, Greece and Queen Mary, University of London, UK; S. Theodorakis and V. Pitsikalis were both with the National Technical University of Athens; S. Theodorakis and V. Pitsikalis are now with deeplab.ai, Athens, GR.


Corresponding author

Correspondence to Anastasios Roussos.


Appendix A. Details about the Regularized Fitting Algorithm


We provide here details about the algorithm for the regularized fitting of the shape-appearance model. After multiplication by \(N_M\), which does not affect the optimum parameters, the total energy \(E(\varvec{\lambda },\varvec{p})\) to be minimized can be written as:

$$\begin{aligned} \begin{aligned} J(\varvec{\lambda },\varvec{p}) =&\sum _{\varvec{x}} \biggl \{ A_0(\varvec{x}) + \sum _{i=1}^{N_c} \lambda _i A_i(\varvec{x}) - f(W_{\varvec{p}}(\varvec{x})) \biggr \}^2+\\&\frac{N_M}{N_c} \left( w_S \left\| \varvec{\lambda } - \varvec{\lambda }_0\right\| _{\Sigma _{\varvec{\lambda }}}^2 + w_D \left\| \varvec{\lambda } - \varvec{\lambda }^e\right\| _{\Sigma _{\varvec{\varepsilon }_{\varvec{\lambda }}}}^2 \right) +\\&\frac{N_M}{N_p}\left( w_S \left\| \varvec{p} - \varvec{p}_0\right\| _{\Sigma _{\varvec{p}}}^2 + w_D \left\| \varvec{p} - \varvec{p}^e\right\| _{\Sigma _{\varvec{\varepsilon }_{\varvec{p}}}}^2 \right) \,\, . \end{aligned} \end{aligned}$$
(8.4)

If \(\sigma _{{\lambda }_i}\), \(\sigma _{\tilde{p}_i}\) are the standard deviations of the components of the parameters \(\varvec{\lambda }\), \(\widetilde{\varvec{p}}\) respectively and \(\sigma _{{\varepsilon }_{\varvec{\lambda },i}}\), \(\sigma _{{\varepsilon }_{\widetilde{\varvec{p}},i}}\) are the standard deviations of the components of the parameters’ prediction errors \(\varvec{\varepsilon }_{\varvec{\lambda }}\), \(\varvec{\varepsilon }_{\widetilde{\varvec{p}}}\), then the corresponding covariance matrices \(\Sigma _{\varvec{\lambda }}\), \(\Sigma _{\widetilde{\varvec{p}}}\), \(\Sigma _{\varvec{\varepsilon }_{\varvec{\lambda }}}\), \(\Sigma _{\varvec{\varepsilon }_{\widetilde{\varvec{p}}}}\), which are diagonal, can be written as:

$$\begin{aligned} \begin{aligned}&\Sigma _{\varvec{\lambda }} = \mathrm {diag}( \sigma _{{\lambda }_1}^2, \ldots , \sigma _{{\lambda }_{N_c}}^2 ), \quad \Sigma _{\widetilde{\varvec{p}}} = \mathrm {diag}( \sigma _{\tilde{p}_1}^2, \ldots , \sigma _{\tilde{p}_{N_p}}^2 ), \\&\Sigma _{\varvec{\varepsilon }_{\varvec{\lambda }}} = \mathrm {diag}( \sigma _{{\varepsilon }_{{\lambda },1}}^2, \ldots , \sigma _{{\varepsilon }_{{\lambda },N_c}}^2), \quad \Sigma _{\varvec{\varepsilon }_{\widetilde{\varvec{p}}}} = \mathrm {diag}( \sigma _{{\varepsilon }_{{\tilde{p}},1}}^2, \ldots , \sigma _{{\varepsilon }_{{\tilde{p}},N_p}}^2). \end{aligned} \end{aligned}$$

The squared norms of the prior terms in Eq. (8.4) are thus given by the following expressions, the first of which takes \(\varvec{\lambda }_0 = \varvec{0}\):

$$\begin{aligned} \left\| \varvec{\lambda } - \varvec{\lambda }_0\right\| _{\Sigma _{\varvec{\lambda }}}^2 = \sum _{i=1}^{N_c} \left( \frac{\lambda _i}{\sigma _{{\lambda }_i}} \right) ^2\, , \,\, \end{aligned}$$
$$\begin{aligned} \left\| \varvec{\lambda } - \varvec{\lambda }^e\right\| _{\Sigma _{\varvec{\varepsilon }_{\varvec{\lambda }}}}^2 = \sum _{i=1}^{N_c} \left( \frac{\lambda _i-\lambda _i^e}{\sigma _{{\varepsilon }_{{\lambda },i}}} \right) ^2\,\, , \end{aligned}$$
$$\begin{aligned} \left\| \varvec{p} - \varvec{p}_0\right\| _{\Sigma _{\varvec{p}}}^2 = (\varvec{p} - \varvec{p}_0)^T U_{\varvec{p}} \Sigma _{\widetilde{\varvec{p}}}^{-1} U_{\varvec{p}}^T (\varvec{p} - \varvec{p}_0) = \left\| \widetilde{\varvec{p}}\right\| _{\Sigma _{\widetilde{\varvec{p}}}}^2 = \sum _{i=1}^{N_p} \left( \frac{\tilde{p}_i}{\sigma _{{\tilde{p}_i}}} \right) ^2\, , \,\, \end{aligned}$$
$$\begin{aligned} \left\| \varvec{p} - \varvec{p}^e\right\| _{\Sigma _{\varvec{\varepsilon }_{\varvec{p}}}}^2 = \left\| \widetilde{\varvec{p}}- \widetilde{\varvec{p}}^e\right\| _{\Sigma _{\varvec{\varepsilon }_{\widetilde{\varvec{p}}}}}^2 = \sum _{i=1}^{N_p} \left( \frac{\tilde{p}_i - \tilde{p}_i^e}{\sigma _{{\varepsilon }_{{\tilde{p}},i}}} \right) ^2\, . \end{aligned}$$

Therefore, if we set:

$$\begin{aligned} \begin{aligned} m_1 = \sqrt{w_S N_M / N_c}\, ,\, m_2 = \sqrt{w_D N_M / N_c}\, ,\, \\ m_3 = \sqrt{w_S N_M / N_p}\, ,\, m_4 = \sqrt{w_D N_M / N_p}\, ,\, \end{aligned} \end{aligned}$$

the energy in Eq. (8.4) takes the form:

$$\begin{aligned} J(\varvec{\lambda },\varvec{p}) = \sum _{\varvec{x}} \biggl \{ A_0(\varvec{x}) + \sum _{i=1}^{N_c} \lambda _i A_i(\varvec{x}) - f(W_{\varvec{p}}(\varvec{x})) \biggr \}^2+ \sum _{i=1}^{N_G} G_i^2(\varvec{\lambda },\varvec{p}) \,\, , \end{aligned}$$
(8.5)

with \(G_i(\varvec{\lambda },\varvec{p})\) being \(N_G=2 N_c + 2 N_p\) prior functions defined by:

$$\begin{aligned} G_i(\varvec{\lambda },\varvec{p}) = \,\,{\left\{ \begin{array}{ll} m_1 \frac{\lambda _i}{\sigma _{{\lambda }_i}} , &{}{1 \le i \le N_c }\\ m_2 \frac{\lambda _j-\lambda _j^e}{\sigma _{{\varepsilon }_{{\lambda },j}}} , {j=i-N_c} , &{}{N_c+1 \le i \le 2N_c}\\ m_3 \frac{\tilde{p}_j}{\sigma _{{\tilde{p}_j}}} , {j=i-2N_c} , &{}{2N_c+1 \le i \le 2N_c+N_p}\\ m_4 \frac{\tilde{p}_j - \tilde{p}_j^e}{\sigma _{{\varepsilon }_{{\tilde{p}},j}}} , {j=i-2N_c-N_p} , &{}{2N_c+N_p+1\le i \le 2N_c+2N_p} \end{array}\right. }. \end{aligned}$$
(8.6)
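As a minimal sketch of Eq. (8.6), assuming NumPy and hypothetical argument names (none of them come from the chapter), the \(N_G\) prior functions can be stacked into a single residual vector whose squared norm equals the regularization part of Eq. (8.4):

```python
import numpy as np

def prior_residuals(lam, lam_e, p_t, p_t_e,
                    sig_lam, sig_eps_lam, sig_pt, sig_eps_pt,
                    w_S, w_D, N_M):
    """Stack the N_G = 2*N_c + 2*N_p prior functions G_i of Eq. (8.6).

    lam, lam_e : eigenimage weights and their predicted values (length N_c)
    p_t, p_t_e : re-parametrized affine parameters and predictions (length N_p)
    sig_*      : per-component standard deviations (hypothetical inputs)
    """
    lam, lam_e = np.asarray(lam, float), np.asarray(lam_e, float)
    p_t, p_t_e = np.asarray(p_t, float), np.asarray(p_t_e, float)
    N_c, N_p = lam.size, p_t.size
    # Weights m_1..m_4 as defined just before Eq. (8.5)
    m1, m2 = np.sqrt(w_S * N_M / N_c), np.sqrt(w_D * N_M / N_c)
    m3, m4 = np.sqrt(w_S * N_M / N_p), np.sqrt(w_D * N_M / N_p)
    return np.concatenate([
        m1 * lam / np.asarray(sig_lam, float),                 # static prior on lambda
        m2 * (lam - lam_e) / np.asarray(sig_eps_lam, float),   # dynamic prior on lambda
        m3 * p_t / np.asarray(sig_pt, float),                  # static prior on p~
        m4 * (p_t - p_t_e) / np.asarray(sig_eps_pt, float),    # dynamic prior on p~
    ])
```

Summing the squares of the returned vector gives the second term of Eq. (8.5).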

Each component \(\tilde{p}_j\), \(j=1,\ldots ,N_p\), of the re-parametrization of \(\varvec{p}\) can be written as:

$$\begin{aligned} \tilde{p}_j = \varvec{v}_{\tilde{p}_j}^T (\varvec{p} - \varvec{p}_0) \,\, , \end{aligned}$$
(8.7)

where \(\varvec{v}_{\tilde{p}_j}\) is the j-th column of \(U_{\varvec{p}}\), that is the eigenvector of the covariance matrix \(\Sigma _{\varvec{p}}\) that corresponds to the j-th principal component \(\tilde{p}_j\).
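The re-parametrization of Eq. (8.7) can be illustrated with a short NumPy sketch. It assumes a hypothetical matrix `P` of training affine parameters (one row per sample) from which \(\varvec{p}_0\), \(U_{\varvec{p}}\) and the standard deviations \(\sigma _{\tilde{p}_j}\) are estimated by PCA; the function names are illustrative only:

```python
import numpy as np

def pca_of_warp_params(P):
    """P: (n_samples, N_p) hypothetical training affine parameters.
    Returns the mean p0, the eigenvector matrix U_p (columns v_j) and
    the standard deviations of the principal components."""
    p0 = P.mean(axis=0)
    cov = np.cov(P, rowvar=False)          # sample estimate of Sigma_p
    eigvals, U_p = np.linalg.eigh(cov)     # columns of U_p are eigenvectors
    return p0, U_p, np.sqrt(eigvals)

def reparametrize(p, p0, U_p):
    """Eq. (8.7) for all j at once: p~ = U_p^T (p - p0)."""
    return U_p.T @ (np.asarray(p, float) - p0)
```

With these quantities, the Mahalanobis norm \(\left\| \varvec{p} - \varvec{p}_0\right\| _{\Sigma _{\varvec{p}}}^2\) reduces to the sum of the squared components of \(\widetilde{\varvec{p}}\), each divided by its variance.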

In fact, the energy \(J(\varvec{\lambda },\varvec{p})\), Eq. (8.5), for general prior functions \(G_i(\varvec{\lambda },\varvec{p})\), has exactly the same form as the energy that is minimized by the algorithm of Baker et al. (2004). Next, we describe this algorithm and then specialize it to our framework.

8.1.1 A.1. Simultaneous Inverse Compositional Algorithm with a Prior

We briefly present here the simultaneous inverse compositional algorithm with a prior (SICP) (Baker et al. 2004). This is a Gauss-Newton algorithm that finds a local minimum of the energy \(J(\varvec{\lambda },\varvec{p})\) of Eq. (8.5) for general prior functions \(G_i(\varvec{\lambda },\varvec{p})\) and general warps \(W_{\varvec{p}}(\varvec{x})\) that are controlled by some parameters \(\varvec{p}\).

The algorithm starts from some initial estimates of \(\varvec{\lambda }\) and \(\varvec{p}\). Afterwards, in every iteration, the previous estimates of \(\varvec{\lambda }\) and \(\varvec{p}\) are updated to \(\varvec{\lambda }'\) and \(\varvec{p}'\) as follows. A vector \(\Delta \varvec{\lambda }\) is added to \(\varvec{\lambda }\):

$$\begin{aligned} \varvec{\lambda }' = \varvec{\lambda }+\Delta \varvec{\lambda } \end{aligned}$$
(8.8)

and a warp with parameters \(\Delta \varvec{p}\) is applied to the synthesized image \(A_0(\varvec{x}) + \sum \lambda _i A_i(\varvec{x})\). As an approximation, the latter is taken as equivalent to updating the warp parameters from \(\varvec{p}\) to \(\varvec{p}'\) by composing \(W_{\varvec{p}}(\varvec{x})\) with the inverse of \(W_{\Delta \varvec{p}}(\varvec{x})\):

$$\begin{aligned} W_{\varvec{p}'} = W_{\varvec{p}} \circ W_{\Delta \varvec{p}}^{-1} \,\, . \end{aligned}$$
(8.9)

From the above relation, given that \(\varvec{p}\) is constant, \(\varvec{p}'\) can be expressed as a \(\mathbb {R}^{N_p} \rightarrow \mathbb {R}^{N_p}\) function of \(\Delta \varvec{p}\), \(\varvec{p}'=\varvec{p}'(\Delta \varvec{p})\), with \(\varvec{p}'(\Delta \varvec{p}=\varvec{0}) = \varvec{p}\). Further, \(\varvec{p}'(\Delta \varvec{p})\) is approximated with a first order Taylor expansion around \(\Delta \varvec{p} = \varvec{0}\):

$$\begin{aligned} \varvec{p}'(\Delta \varvec{p}) \approx \varvec{p} + \frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}} \Delta \varvec{p} \,\, , \end{aligned}$$
(8.10)

where \(\frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}}\) is the Jacobian of the function \(\varvec{p}'(\Delta \varvec{p})\), which generally depends on \(\Delta \varvec{p}\) and is evaluated here at \(\Delta \varvec{p} = \varvec{0}\).

Based on the aforementioned type of updates of \(\varvec{\lambda }\) and \(\varvec{p}\) as well as the considered approximations, the values \(\Delta \varvec{\lambda }\) and \(\Delta \varvec{p}\) are specified by minimizing the following energy:

$$\begin{aligned} \begin{aligned} F(\Delta \varvec{\lambda },\Delta \varvec{p}) =&\sum _{\varvec{x}} \biggl \{ A_0\bigl ( W_{\Delta \varvec{p}}(\varvec{x}) \bigr ) + \sum _{i=1}^{N_c} (\lambda _i + \Delta \lambda _i ) A_i\bigl ( W_{\Delta \varvec{p}}(\varvec{x}) \bigr ) \\ -&f\bigl ( W_{\varvec{p}}(\varvec{x}) \bigr ) \biggr \}^2 + \sum _{i=1}^{N_G} G_i^2\left( \varvec{\lambda }+\Delta \varvec{\lambda },\varvec{p}+\frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}} \Delta \varvec{p} \right) \,\, , \end{aligned} \end{aligned}$$

simultaneously with respect to \(\Delta \varvec{\lambda }\) and \(\Delta \varvec{p}\). By applying first order Taylor approximations on the two terms of the above energy \(F(\Delta \varvec{\lambda },\Delta \varvec{p})\), one gets:

$$\begin{aligned} \begin{aligned} F(\Delta \varvec{\lambda },\Delta \varvec{p}) \approx&\sum _{\varvec{x}} \biggl \{ E_{sim}(\varvec{x}) + \varvec{SD}_{sim}(\varvec{x}) \left( \begin{array}{c} \Delta \varvec{\lambda } \\ \Delta \varvec{p} \\ \end{array}\right) \biggr \}^2+\\&\sum _{i=1}^{N_G} \biggl \{ G_i(\varvec{\lambda },\varvec{p}) + \varvec{SD}_{G_i} \left( \begin{array}{c} \Delta \varvec{\lambda } \\ \Delta \varvec{p} \\ \end{array}\right) \biggr \}^2 \,\, , \end{aligned} \end{aligned}$$
(8.11)

where \(E_{sim}(\varvec{x})\) is the image of reconstruction error evaluated at the model domain:

$$\begin{aligned} E_{sim}(\varvec{x}) = A_0(\varvec{x}) + \sum _{i=1}^{N_c} \lambda _i A_i(\varvec{x}) - f\bigl ( W_{\varvec{p}}(\varvec{x}) \bigr ) \end{aligned}$$

and \(\varvec{SD}_{sim}(\varvec{x})\) is a vector-valued “steepest descent” image with \(N_c+N_p\) channels, each of which corresponds to a specific component of the parameter vectors \(\varvec{\lambda }\) and \(\varvec{p}\):

$$\begin{aligned} \varvec{SD}_{sim}(\varvec{x}) = \biggl [ A_1(\varvec{x}),...,A_{N_c}(\varvec{x}), \biggl ( \nabla A_0(\varvec{x}) + \sum _{i=1}^{N_c} \lambda _i \nabla A_i(\varvec{x}) \biggr ) \frac{\partial W_p(\varvec{x})}{\partial \varvec{p}} \biggr ]\,, \end{aligned}$$
(8.12)

where the gradients \(\nabla A_i(\varvec{x})=\left[ \frac{\partial A_i}{\partial x_1} \,,\, \frac{\partial A_i}{\partial x_2} \right] \) are considered as row vector functions. Also, \(\varvec{SD}_{G_i}\), for each \(i=1,...,N_G\), is a row vector of dimension \(N_c+N_p\) that corresponds to the steepest descent direction of the prior term \(G_i(\varvec{\lambda },\varvec{p})\):

$$\begin{aligned} \varvec{SD}_{G_i} = \left( \frac{\partial G_i}{\partial \varvec{\lambda }} \,,\, \frac{\partial G_i}{\partial \varvec{p}} \frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}} \right) \,\,. \end{aligned}$$
(8.13)
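To make the steepest descent image of Eq. (8.12) concrete, the following NumPy sketch assembles it for the affine case with six warp parameters. The pixel-coordinate convention (\(x_1\) as column index, \(x_2\) as row index, origin at the top-left pixel) and the use of `np.gradient` for the image derivatives are assumptions of this illustration, not details taken from the chapter:

```python
import numpy as np

def steepest_descent_images(A0, A, lam):
    """Sketch of Eq. (8.12) for an affine warp.

    A0  : (H, W) mean image
    A   : (N_c, H, W) eigenimages
    lam : (N_c,) current eigenimage weights
    Returns SD_sim as an (H*W, N_c + 6) array, one row per pixel.
    """
    H, W = A0.shape
    synth = A0 + np.tensordot(lam, A, axes=1)      # A_0 + sum_i lam_i A_i
    gy, gx = np.gradient(synth)                    # derivatives along rows (x2), columns (x1)
    x2, x1 = np.mgrid[0:H, 0:W].astype(float)      # assumed pixel coordinates
    # [gx, gy] times the 2x6 affine Jacobian [[x1,0,x2,0,1,0],[0,x1,0,x2,0,1]]
    warp_part = np.stack([gx * x1, gy * x1, gx * x2, gy * x2, gx, gy], axis=-1)
    return np.hstack([A.reshape(len(A), -1).T, warp_part.reshape(-1, 6)])
```

The first \(N_c\) columns are simply the eigenimages, and the last six columns are the gradient of the synthesized image projected onto the affine warp Jacobian.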

The approximated energy \(F(\Delta \varvec{\lambda },\Delta \varvec{p})\) of Eq. (8.11) is quadratic with respect to both \(\Delta \varvec{\lambda }\) and \(\Delta \varvec{p}\); therefore, the minimization can be done analytically and leads to the following solution:

$$\begin{aligned} \left( \begin{array}{c} \Delta \varvec{\lambda } \\ \Delta \varvec{p} \\ \end{array}\right) = -H^{-1} \biggl [ \sum _{\varvec{x}} \varvec{SD}_{sim}^T(\varvec{x}) E_{sim}(\varvec{x}) + \sum _{i=1}^{N_G} \varvec{SD}_{G_i}^T G_i(\varvec{\lambda },\varvec{p}) \biggr ]\,, \end{aligned}$$
(8.14)

where H is the matrix (which approximates the Hessian of F):

$$\begin{aligned} H = \sum _{\varvec{x}} \varvec{SD}_{sim}^T(\varvec{x}) \varvec{SD}_{sim}(\varvec{x}) + \sum _{i=1}^{N_G} \varvec{SD}_{G_i}^T \varvec{SD}_{G_i} \,\, . \end{aligned}$$

In conclusion, in every iteration of the SICP algorithm, Eq. (8.14) is applied and the parameters \(\varvec{\lambda }\) and \(\varvec{p}\) are updated using Eqs. (8.8) and (8.10). The process is considered to have converged, and terminates, when a norm of the update vector \( \left( \begin{array}{c} \Delta \varvec{\lambda } \\ \Delta \varvec{p} \\ \end{array}\right) \) falls below a relatively small threshold.
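A single update of Eq. (8.14) can be sketched in NumPy as follows. The array layout (one row per pixel or per prior function) and the names `sicp_step` and `has_converged`, as well as the default tolerance, are assumptions made for this illustration:

```python
import numpy as np

def sicp_step(SD_sim, E_sim, SD_G, G):
    """One Gauss-Newton update, Eq. (8.14).

    SD_sim : (n_pixels, N_c + N_p) steepest descent images, one row per pixel
    E_sim  : (n_pixels,) reconstruction error image, flattened
    SD_G   : (N_G, N_c + N_p) steepest descent rows of the prior functions
    G      : (N_G,) current values of the prior functions
    Returns the stacked update (delta_lambda, delta_p).
    """
    H = SD_sim.T @ SD_sim + SD_G.T @ SD_G      # Hessian approximation
    b = SD_sim.T @ E_sim + SD_G.T @ G          # bracketed term of Eq. (8.14)
    return -np.linalg.solve(H, b)

def has_converged(step, tol=1e-6):
    """Stop when the update norm falls below a small (hypothetical) threshold."""
    return np.linalg.norm(step) < tol
```

Solving the linear system with `np.linalg.solve` avoids forming \(H^{-1}\) explicitly; the result coincides with the least-squares solution of the linearized problem in Eq. (8.11).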

8.1.1.1 A.1.1. Combination with Levenberg-Marquardt Algorithm

In the algorithm described above, there is no guarantee that the original energy (8.5), that is, the objective function before any approximation, decreases in every iteration; it might increase if the involved approximations are not accurate. Therefore, following Baker and Matthews (2002), we modify this algorithm by combining it with the Levenberg-Marquardt algorithm: in Eq. (8.14), which specifies the updates, we replace the Hessian approximation H by \(H+\delta \, \mathrm {diag}(H)\), where \(\delta \) is a positive weight and \(\mathrm {diag}(H)\) is the diagonal matrix that contains the diagonal elements of H. This corresponds to an interpolation between the updates given by the Gauss-Newton algorithm and weighted gradient descent. As \(\delta \) increases, the algorithm behaves more like gradient descent: on the one hand it is slower, but on the other hand it yields more reliable updates, in the sense that the energy will eventually decrease for sufficiently large \(\delta \).

In every iteration, we specify the appropriate weight \(\delta \) as follows. Starting from setting \(\delta \) to 1 / 10 of its value in the previous iteration (or from \(\delta =0.01\) if this is the first iteration), we compute the updates \(\Delta \varvec{\lambda }\) and \(\Delta \varvec{p}\) using the Hessian approximation \(H+\delta \, \mathrm {diag}(H)\) and then evaluate the original energy (8.5). If the energy has decreased we keep the updates and finish the iteration. If the energy has increased, we set \(\delta \rightarrow 10\,\delta \) and try again. We repeat that step until the energy decreases.
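The damping schedule described above can be sketched as a generic Python routine. The callbacks `residuals_and_jacobian`, `energy` and `apply_update` are hypothetical placeholders for the stacked residuals and steepest descent rows, the original energy (8.5), and the parameter update of Eqs. (8.8) and (8.10); the `max_tries` guard is an added safety assumption that the text does not mention:

```python
import numpy as np

def lm_iteration(params, residuals_and_jacobian, energy, apply_update,
                 delta_prev=None, max_tries=60):
    """One damped iteration: try delta, multiply by 10 until the energy drops."""
    # Start from 1/10 of the previous weight, or 0.01 in the first iteration
    delta = 0.01 if delta_prev is None else delta_prev / 10.0
    r, J = residuals_and_jacobian(params)
    H, g = J.T @ J, J.T @ r                    # Gauss-Newton Hessian and gradient terms
    e0 = energy(params)
    for _ in range(max_tries):
        damped = H + delta * np.diag(np.diag(H))
        step = -np.linalg.solve(damped, g)
        candidate = apply_update(params, step)
        if energy(candidate) < e0:             # keep the update and finish
            return candidate, delta
        delta *= 10.0                          # otherwise increase damping and retry
    return params, delta                       # no decrease found within max_tries
```

Returning the accepted `delta` lets the caller pass it back as `delta_prev` in the next iteration, which reproduces the division by 10 described in the text.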

8.1.2 A.2. Specialization in the Current Framework

In this section, we derive the SICP algorithm for the special case that concerns our method. This case arises when (1) the general warps \(W_{\varvec{p}}(\varvec{x})\) are specialized to affine transforms and (2) the general prior functions \(G_i(\varvec{\lambda },\varvec{p})\) are given by Eq. (8.6).

8.1.2.1 A.2.1. The Case of Affine Transforms

In our framework, the general warps \(W_{\varvec{p}}(\varvec{x})\) of the SICP algorithm are specialized to affine transforms with parameters \(\varvec{p}=(p_1\cdots p_6)\) that are defined by:

$$\begin{aligned} W_{\varvec{p}}(x,y) = \left( \begin{array}{ccc} 1+p_1 &{} p_3 &{} p_5 \\ p_2 &{} 1+p_4 &{} p_6 \\ \end{array} \right) \left( \begin{array}{c} x\\ y\\ 1\\ \end{array} \right) . \end{aligned}$$

In this special case, which is also analyzed in Baker et al. (2004), the Jacobian \(\frac{\partial W_p(\varvec{x})}{\partial \varvec{p}}\) that is used in Eq. (8.12) is given by the following matrix, where \(\varvec{x}=(x_1,x_2)\) denotes the point \((x,y)\):

$$\begin{aligned} \frac{\partial W_p(\varvec{x})}{\partial \varvec{p}} = \left( \begin{array}{cccccc} x_1 &{} 0 &{} x_2 &{} 0 &{} 1 &{} 0 \\ 0 &{} x_1 &{} 0 &{} x_2 &{} 0 &{} 1 \\ \end{array} \right) \,\, . \end{aligned}$$
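The affine warp and its Jacobian with respect to \(\varvec{p}\) can be written down directly; the function names below are illustrative and not part of the original text. Since the warp is linear in \(\varvec{p}\), the Jacobian does not depend on \(\varvec{p}\) itself:

```python
import numpy as np

def affine_matrix(p):
    """2x3 matrix of the affine warp W_p for p = (p1, ..., p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1.0 + p1, p3, p5],
                     [p2, 1.0 + p4, p6]])

def warp_point(p, x):
    """Apply W_p to the point x = (x1, x2)."""
    return affine_matrix(p) @ np.array([x[0], x[1], 1.0])

def warp_jacobian(x):
    """2x6 Jacobian dW_p(x)/dp at the point x = (x1, x2)."""
    x1, x2 = x
    return np.array([[x1, 0.0, x2, 0.0, 1.0, 0.0],
                     [0.0, x1, 0.0, x2, 0.0, 1.0]])
```

Setting all six parameters to zero yields the identity warp, which is the reason for the \(1+p_1\) and \(1+p_4\) entries on the diagonal.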

The restriction to affine transforms implies also a special form for the Jacobian \(\frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}}\) that is used in Eq. (8.13). More precisely, as described in Baker et al. (2004), a first order Taylor approximation is first applied to the inverse warp \(W_{\Delta \varvec{p}}^{-1}\) and yields \(W_{\Delta \varvec{p}}^{-1} \approx W_{-\Delta \varvec{p}}\). Afterwards, based on Eq. (8.9) and the fact that the parameters of a composition \(W_{\varvec{r}} = W_{\varvec{p}} \circ W_{\varvec{q}}\) of two affine transforms are given by:

$$\begin{aligned} \varvec{r} = \left( \begin{array}{c} p_1 + q_1 + p_1 q_1 + p_3 q_2 \\ p_2 + q_2 + p_2 q_1 + p_4 q_2 \\ p_3 + q_3 + p_1 q_3 + p_3 q_4 \\ p_4 + q_4 + p_2 q_3 + p_4 q_4 \\ p_5 + q_5 + p_1 q_5 + p_3 q_6 \\ p_6 + q_6 + p_2 q_5 + p_4 q_6 \\ \end{array} \right) \,\, , \end{aligned}$$

the function \(\varvec{p}'(\Delta \varvec{p})\) (8.10) is approximated as:

$$\begin{aligned} \varvec{p}'(\Delta \varvec{p}) = \left( \begin{array}{c} p_1 - \Delta p_1 - p_1 \Delta p_1 - p_3 \Delta p_2\\ p_2 - \Delta p_2 - p_2 \Delta p_1 - p_4 \Delta p_2 \\ p_3 - \Delta p_3 - p_1 \Delta p_3 - p_3 \Delta p_4 \\ p_4 - \Delta p_4 - p_2 \Delta p_3 - p_4 \Delta p_4 \\ p_5 - \Delta p_5 - p_1 \Delta p_5 - p_3 \Delta p_6 \\ p_6 - \Delta p_6 - p_2 \Delta p_5 - p_4 \Delta p_6 \\ \end{array} \right) \,\, . \end{aligned}$$

Therefore, its Jacobian is given by:

$$\begin{aligned} \frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}} = - \left( \begin{array}{cccccc} 1+p_1 &{} p_3 &{} 0 &{} 0 &{} 0 &{} 0 \\ p_2 &{} 1+p_4 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1+p_1 &{} p_3 &{} 0 &{} 0 \\ 0 &{} 0 &{} p_2 &{} 1+p_4 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 1+p_1 &{} p_3 \\ 0 &{} 0 &{} 0 &{} 0 &{} p_2 &{} 1+p_4 \\ \end{array} \right) \,\, . \end{aligned}$$
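The composition rule and the resulting Jacobian can be checked numerically with a small NumPy sketch; the function names are hypothetical. Because the approximated \(\varvec{p}'(\Delta \varvec{p})\) is linear in \(\Delta \varvec{p}\), the relation \(\varvec{p}' = \varvec{p} + \frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}} \Delta \varvec{p}\) holds exactly for it:

```python
import numpy as np

def compose(p, q):
    """Parameters r of the composition W_r = W_p o W_q of two affine warps."""
    p1, p2, p3, p4, p5, p6 = p
    q1, q2, q3, q4, q5, q6 = q
    return np.array([
        p1 + q1 + p1 * q1 + p3 * q2,
        p2 + q2 + p2 * q1 + p4 * q2,
        p3 + q3 + p1 * q3 + p3 * q4,
        p4 + q4 + p2 * q3 + p4 * q4,
        p5 + q5 + p1 * q5 + p3 * q6,
        p6 + q6 + p2 * q5 + p4 * q6,
    ])

def updated_params(p, dp):
    """p'(dp) using the first order approximation of the inverse warp, W_{-dp}."""
    return compose(p, -np.asarray(dp, float))

def update_jacobian(p):
    """6x6 Jacobian dp'/d(dp): minus a block-diagonal matrix with three 2x2 blocks."""
    block = np.array([[1.0 + p[0], p[2]],
                      [p[1], 1.0 + p[3]]])
    return -np.kron(np.eye(3), block)
```

The block-diagonal structure reflects that each pair \((p_1,p_2)\), \((p_3,p_4)\), \((p_5,p_6)\) is updated by the same \(2 \times 2\) linear part of \(W_{\varvec{p}}\).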

8.1.2.2 A.2.2. Specific Type of Prior Functions

Apart from the restriction to affine transforms, in the proposed framework of the regularized shape-appearance model fitting, we have derived the specific formulas of Eq. (8.6) for the prior functions \(G_i(\varvec{\lambda },\varvec{p})\) of the energy \(J(\varvec{\lambda },\varvec{p})\) in Eq. (8.5). Therefore, in our case, their partial derivatives, which are involved in the above described SICP algorithm (see Eq. (8.13)), are specialized as follows:

$$\begin{aligned} \frac{\partial G_i}{\partial \varvec{p}} \mathop {=}\limits ^{(8.7)} {\left\{ \begin{array}{ll} 0\,, &{}{1 \le i \le 2N_c }\\ \frac{m_3}{\sigma _{{\tilde{p}_j}}} \varvec{v}_{\tilde{p}_j}^T \,, {j=i-2N_c} \,, &{}{2N_c+1 \le i \le 2N_c+N_p}\\ \frac{m_4}{\sigma _{{\varepsilon }_{{\tilde{p}},j}}} \varvec{v}_{\tilde{p}_j}^T \,, {j=i-2N_c-N_p} \,, &{}{2N_c+N_p+1 \le i \le 2N_c+2N_p} \end{array}\right. }\,, \end{aligned}$$
$$\begin{aligned} \frac{\partial G_i}{\partial \varvec{\lambda }} = {\left\{ \begin{array}{ll} \frac{m_1}{\sigma _{{\lambda }_i}} \varvec{e}_i^T \,, &{}{1 \le i \le N_c }\\ \frac{m_2}{\sigma _{{\varepsilon }_{{\lambda },j}}} \varvec{e}_j^T \,, {j=i-N_c} \,, &{}{N_c+1 \le i \le 2N_c} \\ 0 \,, &{}{2N_c+1 \le i \le 2N_c+2N_p} \end{array}\right. }\,, \end{aligned}$$

where \(\varvec{e}_i\), \(1 \le i \le N_c\), is the ith column of the \(N_c \times N_c\) identity matrix.
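Putting these partial derivatives together with Eq. (8.13), all rows \(\varvec{SD}_{G_i}\) can be assembled into one matrix. The sketch below assumes NumPy and hypothetical argument names; `dp_ddp` stands for the \(N_p \times N_p\) Jacobian \(\frac{\partial \varvec{p}'}{\partial \Delta \varvec{p}}\) and `m` for the tuple \((m_1, m_2, m_3, m_4)\):

```python
import numpy as np

def prior_steepest_descent(U_p, dp_ddp, sig_lam, sig_eps_lam,
                           sig_pt, sig_eps_pt, m):
    """Stack the N_G rows SD_{G_i} of Eq. (8.13) into an (N_G, N_c + N_p) matrix."""
    m1, m2, m3, m4 = m
    sig_lam = np.asarray(sig_lam, float)
    sig_eps_lam = np.asarray(sig_eps_lam, float)
    sig_pt = np.asarray(sig_pt, float)
    sig_eps_pt = np.asarray(sig_eps_pt, float)
    N_c, N_p = sig_lam.size, sig_pt.size
    # dG/dlambda: scaled identity rows for the lambda priors, zeros for the p priors
    dG_dlam = np.vstack([m1 * np.diag(1.0 / sig_lam),
                         m2 * np.diag(1.0 / sig_eps_lam),
                         np.zeros((2 * N_p, N_c))])
    # dG/dp: zeros for the lambda priors, scaled eigenvector rows v_j^T for the p priors
    dG_dp = np.vstack([np.zeros((2 * N_c, N_p)),
                       (m3 / sig_pt)[:, None] * U_p.T,
                       (m4 / sig_eps_pt)[:, None] * U_p.T])
    return np.hstack([dG_dlam, dG_dp @ dp_ddp])
```

Only `dp_ddp` changes between iterations, since it depends on the current \(\varvec{p}\); the remaining factors can be precomputed once.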


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Roussos, A., Theodorakis, S., Pitsikalis, V., Maragos, P. (2017). Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_8

  • DOI: https://doi.org/10.1007/978-3-319-57021-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science (R0)
