Abstract
Crowdsourcing is a promising way to reduce the effort of collecting annotations for training gesture recognition systems. Crowdsourced annotations suffer from "noise" such as mislabeling or inaccurate identification of the start and end times of gesture instances. In this paper we present SegmentedLCSS and WarpingLCSS, two template-matching methods that remain robust when trained with noisy crowdsourced annotations to spot gestures from wearable motion sensors. The methods quantize signals into strings of characters and then apply variations of the longest common subsequence algorithm (LCSS) to spot gestures. We compare the noise robustness of our methods against baselines based on dynamic time warping (DTW) and support vector machines (SVM). The experiments are performed on data sets with various gesture classes (10–17 classes) recorded from accelerometers on the arms, with both real and synthetic crowdsourced annotations. WarpingLCSS performs similarly to or better than the baselines in the absence of noisy annotations. In the presence of 60% mislabeled instances, WarpingLCSS outperformed SVM by 22% F1-score and DTW-based methods by 36% F1-score on average. SegmentedLCSS yields similar performance to WarpingLCSS, but runs one order of magnitude slower. Additionally, we show how to use our methods to filter out the noise in crowdsourced annotations before training a traditional classifier. The filtering increases the performance of SVM by 20% F1-score and of DTW-based methods by 8% F1-score on average on the noisy real crowdsourced annotations.
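The core idea described in the abstract — quantizing a motion signal into a string and spotting gestures via the longest common subsequence — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform amplitude binning in `quantize` and the fixed matching threshold are assumptions (the chapter's quantizer and thresholds are learned from training data).

```python
import numpy as np

def quantize(signal, levels):
    """Map each sample to a character by uniform amplitude binning.
    Hypothetical quantizer; the paper learns its codebook from data."""
    lo, hi = float(signal.min()), float(signal.max())
    edges = np.linspace(lo, hi, levels + 1)[1:-1]  # interior bin edges
    idx = np.digitize(signal, edges)               # bin index per sample
    return "".join(chr(ord("a") + i) for i in idx)

def lcss(s, t):
    """Classic dynamic-programming longest common subsequence length."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def matches_template(window, template, threshold=0.7):
    """Spot a gesture when the template-normalized LCSS score is high enough.
    The 0.7 threshold is illustrative only."""
    score = lcss(window, template) / len(template)
    return score >= threshold
```

Because LCSS counts only matching characters and skips insertions freely, a mislabeled or sloppily segmented training instance distorts the template score less than it distorts a distance-based measure like DTW, which is the intuition behind the robustness results reported above.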
Editors: Isabelle Guyon, Vassilis Athitsos and Sergio Escalera.
Notes
1. The home page for AMT is http://www.mturk.com.
2. The home page for Crowdflower is http://crowdflower.com.
3. The Skoda and Opportunity data sets can be downloaded from http://www.wearable.ethz.ch/resources/Dataset.
Acknowledgements
The authors would like to thank Dr. Daniel Roggen (University of Sussex) for his useful comments. This work has been supported by the Swiss Hasler Foundation project Smart-DAYS.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Nguyen-Dinh, LV., Calatroni, A., Tröster, G. (2017). Robust Online Gesture Recognition with Crowdsourced Annotations. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_18
DOI: https://doi.org/10.1007/978-3-319-57021-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1