2D Appearance Based Techniques for Tracking the Signer Configuration in Sign Language Video Recordings

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8815)

Abstract

Current linguistic research on sign language is often based on analysing large corpora of video recordings. The videos must be annotated either manually or automatically. Automatic methods for estimating the signer's body configuration, especially the hand positions and shapes, would thus be of great practical interest. Methods based on rigorous 3D and 2D modelling of the body parts have been presented. However, they face insurmountable problems of computational complexity due to the large sizes of modern linguistic corpora. In this paper we look at an alternative approach and investigate what can be achieved with straightforward local 2D appearance-based methods: template-matching-based tracking of local image neighbourhoods and supervised skin blob category detection based on local appearance features. After describing these techniques, we construct a signer configuration estimation system that uses them among others, and demonstrate the system on the video material of the Suvi dictionary of Finnish Sign Language.
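To make the idea of template-matching-based tracking of a local image neighbourhood concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not state their similarity measure or search strategy, so the sum of squared differences (SSD), the exhaustive search in a small window, and the names `ssd` and `track_template` are all assumptions chosen for illustration. Frames are assumed to be grayscale images stored as nested lists.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and a frame patch.

    Assumption: SSD stands in for whatever similarity measure the paper uses.
    """
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def track_template(frame, template, prev_top, prev_left, radius=2):
    """Return the (top, left) position in `frame` that best matches `template`.

    Only positions within `radius` pixels of the previous position are
    examined, and only those where the template fits inside the frame.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best = None
    for top in range(max(0, prev_top - radius), min(fh - th, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius), min(fw - tw, prev_left + radius) + 1):
            score = ssd(frame, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best[1], best[2]
```

Restricting the search to a small window around the previous position keeps the cost per frame low, which matters when the corpus is large; the trade-off is that motion larger than `radius` pixels between frames is missed.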

This work has been funded by the following grants of the Academy of Finland: 140245, Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL); 251170, Finnish Centre of Excellence in Computational Inference Research (COIN).



Author information

Corresponding author

Correspondence to Ville Viitaniemi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Viitaniemi, V., Karppa, M., Laaksonen, J. (2014). 2D Appearance Based Techniques for Tracking the Signer Configuration in Sign Language Video Recordings. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8815. Springer, Cham. https://doi.org/10.1007/978-3-319-11755-3_4

  • DOI: https://doi.org/10.1007/978-3-319-11755-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11754-6

  • Online ISBN: 978-3-319-11755-3

  • eBook Packages: Computer Science, Computer Science (R0)
