ABSTRACT
The ability to create presentation slides and deliver them effectively to an audience is increasingly important for success in both academic and professional careers. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of the content and delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture, hand gestures, and head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slides quality. Several machine learning experiments were performed to predict the two PC scores from the multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model combining verbal and visual features can outperform one using a single modality.
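The abstract outlines a two-stage analysis: reduce the multi-item human rubric scores to two principal components, then train regressors that predict each component from multimodal features. The sketch below is a minimal illustration of that pipeline, not the authors' implementation; the data, feature dimensions, and choice of support vector regression are assumptions for demonstration only.

```python
# Minimal sketch of the two-stage analysis described in the abstract:
# (1) PCA reduces human rubric scores to two component scores,
# (2) a regressor predicts each component from multimodal features.
# All data below are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_presentations = 400

# Hypothetical human ratings on several rubric items per presentation.
human_scores = rng.normal(size=(n_presentations, 6))

# Step 1: two principal components summarize the rubric scores
# (comp1 ~ delivery skills, comp2 ~ slides quality in the paper's analysis).
pca = PCA(n_components=2)
pc_scores = pca.fit_transform(human_scores)

# Hypothetical multimodal features (speech, head pose, posture/gesture,
# slide-based measures) concatenated into one vector per presentation.
multimodal_features = rng.normal(size=(n_presentations, 40))

# Step 2: regress each PC score on the multimodal features; SVR is one of
# several learners that could be compared in such an experiment.
for name, target in [("comp1 (delivery)", pc_scores[:, 0]),
                     ("comp2 (slides)", pc_scores[:, 1])]:
    scores = cross_val_score(SVR(kernel="rbf"), multimodal_features, target,
                             cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")
```

In practice, one would compare single-modality feature sets (e.g., speech only) against the combined verbal-plus-visual set to test the abstract's claim that multimodal models outperform unimodal ones.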