DOI: 10.1145/3236112.3236171

Designing speech interaction for the Sony Xperia Ear and Oakley Radar Pace smartglasses

Published: 03 September 2018

Abstract

Speech synthesis is a key enabling technology for wearable devices. We discuss the design challenges of customising speech synthesis for the Sony Xperia Ear and the Oakley Radar Pace smartglasses. Supporting speech interaction designers working on novel, eyes-free interactive mobile devices requires specific functionality, including: flexibility in performance, memory footprint, and disk requirements; server or local configurations; methods for personification and branding; architectures for fast, reactive interfaces; and customisation for content, genres, and speech styles. We describe implementations of this functionality, how it can be made available to engineers and designers working on third-party devices, and the impact it can have on user experience. To conclude, we discuss why some customers are reluctant to depend on speech services from well-known providers such as Google and Amazon, and we consider the barriers to entry for custom-built personal digital advisors.
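One of the requirements the abstract names is support for both server and local synthesis configurations behind a fast, reactive interface. The sketch below illustrates one way such a fallback could be structured. It is a minimal hypothetical example: the names (`ServerEngine`, `LocalEngine`, `Synthesiser`) are illustrative inventions, not the actual CereVoice SDK API, and the "engines" here only stub out synthesis with tagged byte strings.

```python
# Hypothetical sketch only: these class names and methods are illustrative,
# not the real CereVoice SDK API.

class LocalEngine:
    """Small-footprint on-device voice: lower quality, no network latency."""
    def synthesise(self, text: str) -> bytes:
        return f"local:{text}".encode()  # stub for on-device audio generation

class ServerEngine:
    """Full-quality server voice: may be unreachable on a mobile link."""
    def __init__(self, reachable: bool = True):
        self.reachable = reachable
    def synthesise(self, text: str) -> bytes:
        if not self.reachable:
            raise ConnectionError("speech server unreachable")
        return f"server:{text}".encode()  # stub for server-rendered audio

class Synthesiser:
    """Prefer the server voice, but fall back to the local voice so an
    eyes-free interface stays responsive when connectivity drops."""
    def __init__(self, server: ServerEngine, local: LocalEngine):
        self.server = server
        self.local = local
    def speak(self, text: str) -> bytes:
        try:
            return self.server.synthesise(text)
        except ConnectionError:
            return self.local.synthesise(text)
```

Under this sketch, `Synthesiser(ServerEngine(reachable=False), LocalEngine()).speak("Turn left")` degrades to the local voice rather than blocking the interaction, which is the kind of behaviour a reactive wearable interface needs.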


Cited By

  • (2021) Building and Designing Expressive Speech Synthesis. The Handbook on Socially Interactive Agents, 173-212. DOI: 10.1145/3477322.3477329. Online publication date: 10-Sep-2021.




Published In

cover image ACM Conferences
MobileHCI '18: Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct
September 2018
445 pages
ISBN:9781450359412
DOI:10.1145/3236112
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. audio interaction
  2. mobile
  3. speech synthesis

Qualifiers

  • Extended-abstract

Conference

MobileHCI '18

Acceptance Rates

Overall Acceptance Rate 202 of 906 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0

Reflects downloads up to 25 Feb 2025

