2D Appearance Based Techniques for Tracking the Signer Configuration in Sign Language Video Recordings

Viitaniemi, Ville; Karppa, Matti; Laaksonen, Jorma

doi:10.1007/978-3-319-11755-3_4


Conference paper
First Online: 10 October 2014


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8815)

Abstract

Current linguistic research on sign language is often based on analysing large corpora of video recordings. These videos must be annotated, either manually or automatically, so automatic methods for estimating the signer's body configuration, especially the hand positions and shapes, would be of great practical interest. Methods based on rigorous 3D and 2D modelling of the body parts have been presented. However, because modern linguistic corpora are so large, these methods face insurmountable problems of computational complexity. In this paper we investigate an alternative approach: what can be achieved with straightforward local 2D appearance-based methods, namely template matching-based tracking of local image neighbourhoods and supervised detection of skin blob categories from local appearance features. After describing these techniques, we construct a signer configuration estimation system that uses them, among other techniques, and demonstrate the system on the video material of the Suvi dictionary of Finnish Sign Language.
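The abstract names template matching-based tracking of local image neighbourhoods as one of the two core techniques. As a rough illustration only, the sketch below tracks a single neighbourhood from one frame to the next by exhaustively searching a square window around its previous position and scoring candidates with zero-mean normalised cross-correlation (NCC). The use of NCC, the square search window, the synthetic frames, and all function names (`ncc`, `track_step`, `make_frame`) are assumptions of this example, not details taken from the paper.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def get_patch(img: Image, top: int, left: int, h: int, w: int) -> List[float]:
    """Flatten the h x w neighbourhood whose top-left corner is (top, left)."""
    return [img[r][c] for r in range(top, top + h) for c in range(left, left + w)]


def ncc(a: List[float], b: List[float]) -> float:
    """Zero-mean normalised cross-correlation of two equal-length patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch has no defined correlation; treat it as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def track_step(frame: Image, template: List[float], h: int, w: int,
               prev: Tuple[int, int], radius: int) -> Tuple[Tuple[int, int], float]:
    """Search a window of +/- radius pixels around the previous top-left corner
    and return the best-matching position together with its NCC score."""
    rows, cols = len(frame), len(frame[0])
    best_pos, best_score = prev, -2.0  # NCC lies in [-1, 1], so -2 is below any score
    for top in range(max(0, prev[0] - radius), min(rows - h, prev[0] + radius) + 1):
        for left in range(max(0, prev[1] - radius), min(cols - w, prev[1] + radius) + 1):
            score = ncc(template, get_patch(frame, top, left, h, w))
            if score > best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


def make_frame(rows: int, cols: int, top: int, left: int) -> Image:
    """Hypothetical test frame: a textured 3x3 pattern on a black background."""
    img = [[0.0] * cols for _ in range(rows)]
    pattern = [[1, 5, 2], [7, 9, 3], [4, 6, 8]]
    for r in range(3):
        for c in range(3):
            img[top + r][left + c] = float(pattern[r][c])
    return img


# Usage: take the template from frame 0, then follow it into frame 1,
# where the pattern has moved one row down and two columns right.
frame0 = make_frame(12, 12, 4, 4)
template = get_patch(frame0, 4, 4, 3, 3)
frame1 = make_frame(12, 12, 5, 6)
pos, score = track_step(frame1, template, 3, 3, prev=(4, 4), radius=3)
print(pos, round(score, 6))
```

Exhaustive search is kept here for clarity. A practical tracker would likely add template updating and a score threshold for declaring the track lost, but how the paper handles these aspects is not stated in this abstract.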

This work has been funded by the following grants of the Academy of Finland: 140245, Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL); 251170, Finnish Centre of Excellence in Computational Inference Research (COIN).



Author information

Authors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland
Ville Viitaniemi, Matti Karppa & Jorma Laaksonen


Corresponding author

Correspondence to Ville Viitaniemi.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Porto, Porto, Portugal
Aurélio Campilho
Dept. of Electrical and Computer Eng., University of Waterloo, Waterloo, Ontario, Canada
Mohamed Kamel


About this paper

Cite this paper

Viitaniemi, V., Karppa, M., Laaksonen, J. (2014). 2D Appearance Based Techniques for Tracking the Signer Configuration in Sign Language Video Recordings. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8815. Springer, Cham. https://doi.org/10.1007/978-3-319-11755-3_4


DOI: https://doi.org/10.1007/978-3-319-11755-3_4
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11754-6
Online ISBN: 978-3-319-11755-3
eBook Packages: Computer Science, Computer Science (R0)
