DOI: 10.1145/1322192.1322256

Presentation sensei: a presentation training system using speech and image processing

Published: 12 November 2007

Abstract

In this paper we present a presentation training system that observes a presentation rehearsal and provides the speaker with recommendations for improving the delivery, such as speaking more slowly and looking at the audience. Our system, "Presentation Sensei," is equipped with a microphone and camera and analyzes a presentation by combining speech and image processing techniques. Based on the results of the analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing. It also alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the content at a semantic level, but to improve its delivery by reducing inappropriate basic behavior patterns. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.





    Published In

    ICMI '07: Proceedings of the 9th international conference on Multimodal interfaces
    November 2007
    402 pages
    ISBN: 9781595938176
    DOI: 10.1145/1322192


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. image processing
    2. presentation
    3. sensei
    4. speech processing
    5. training

    Qualifiers

    • Research-article

    Conference

    ICMI '07: International Conference on Multimodal Interfaces
    November 12-15, 2007
    Nagoya, Aichi, Japan

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2024) Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing, 32, 938-947. https://doi.org/10.2197/ipsjjip.32.938
    • (2023) Virtual Reality Campuses as New Educational Metaverses. IEICE Transactions on Information and Systems, E106.D(2), 93-100. https://doi.org/10.1587/transinf.2022ETI0001
    • (2023) ‘Um, so like, is this how I speak?’: design implications for automated visual feedback systems on speech. Behaviour & Information Technology, 1-20. https://doi.org/10.1080/0144929X.2023.2271997
    • (2022) Modeling Japanese Praising Behavior by Analyzing Audio and Visual Behaviors. Frontiers in Computer Science, 4. https://doi.org/10.3389/fcomp.2022.815128
    • (2022) Supporting Self-development of Speech Delivery for Education Professionals. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, 251-253. https://doi.org/10.1145/3568444.3570588
    • (2022) Designing for Speech Practice Systems: How Do User-Controlled Voice Manipulation and Model Speakers Impact Self-Perceptions of Voice? Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-14. https://doi.org/10.1145/3491102.3502093
    • (2022) ELIZA: Smart Monitoring and Reporting Toast Master System. 2022 4th International Conference on Advancements in Computing (ICAC), 270-275. https://doi.org/10.1109/ICAC57685.2022.10025147
    • (2022) Potential Applications of Social Robots in Robot-Assisted Interventions for Social Anxiety. International Journal of Social Robotics, 14(5), 1-32. https://doi.org/10.1007/s12369-021-00851-0
    • (2022) Presentation Method for Conveying Nonverbal Information in Online Conference Presentations with a Virtual Stage. Collaboration Technologies and Social Computing, 98-111. https://doi.org/10.1007/978-3-031-20218-6_7
    • (2022) Multimodal Systems for Automated Oral Presentation Feedback: A Comparative Analysis. The Multimodal Learning Analytics Handbook, 53-78. https://doi.org/10.1007/978-3-031-08076-0_3
