DOI: 10.1145/1322192.1322256

Presentation sensei: a presentation training system using speech and image processing

Published: 12 November 2007

Abstract

In this paper we present a presentation training system that observes a presentation rehearsal and provides the speaker with recommendations for improving the delivery, such as speaking more slowly and looking at the audience. Our system, "Presentation Sensei," is equipped with a microphone and camera and analyzes a presentation by combining speech and image processing techniques. Based on the results of the analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing. It also alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the content at a semantic level, but to improve its delivery by reducing inappropriate basic behavior patterns. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.





    Published In

    ICMI '07: Proceedings of the 9th international conference on Multimodal interfaces
    November 2007
    402 pages
    ISBN: 9781595938176
    DOI: 10.1145/1322192


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. image processing
    2. presentation
    3. sensei
    4. speech processing
    5. training

    Qualifiers

    • Research-article

    Conference

    ICMI '07: International Conference on Multimodal Interfaces
    November 12-15, 2007
    Nagoya, Aichi, Japan

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2024) Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing, 32, 938-947. https://doi.org/10.2197/ipsjjip.32.938
    • (2023) Virtual Reality Campuses as New Educational Metaverses. IEICE Transactions on Information and Systems, E106.D(2), 93-100. https://doi.org/10.1587/transinf.2022ETI0001
    • (2023) ‘Um, so like, is this how I speak?’: design implications for automated visual feedback systems on speech. Behaviour & Information Technology, 1-20. https://doi.org/10.1080/0144929X.2023.2271997
    • (2022) Modeling Japanese Praising Behavior by Analyzing Audio and Visual Behaviors. Frontiers in Computer Science, 4. https://doi.org/10.3389/fcomp.2022.815128
    • (2022) Supporting Self-development of Speech Delivery for Education Professionals. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, 251-253. https://doi.org/10.1145/3568444.3570588
    • (2022) Designing for Speech Practice Systems: How Do User-Controlled Voice Manipulation and Model Speakers Impact Self-Perceptions of Voice? Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-14. https://doi.org/10.1145/3491102.3502093
    • (2022) ELIZA: Smart Monitoring and Reporting Toast Master System. 2022 4th International Conference on Advancements in Computing (ICAC), 270-275. https://doi.org/10.1109/ICAC57685.2022.10025147
    • (2022) Potential Applications of Social Robots in Robot-Assisted Interventions for Social Anxiety. International Journal of Social Robotics, 14(5), 1-32. https://doi.org/10.1007/s12369-021-00851-0
    • (2022) Presentation Method for Conveying Nonverbal Information in Online Conference Presentations with a Virtual Stage. Collaboration Technologies and Social Computing, 98-111. https://doi.org/10.1007/978-3-031-20218-6_7
    • (2022) Multimodal Systems for Automated Oral Presentation Feedback: A Comparative Analysis. The Multimodal Learning Analytics Handbook, 53-78. https://doi.org/10.1007/978-3-031-08076-0_3
