Multimedia Corpus of In-Car Speech Communication

Kawaguchi, Nobuo; Takeda, Kazuya; Itakura, Fumitada

doi:10.1023/B:VLSI.0000015094.60008.dc



Abstract

This paper reports an ongoing project to construct a multimedia corpus of dialogues recorded under driving conditions. More than 500 subjects have been enrolled in the corpus development, and more than 2 gigabytes of signals have been collected during approximately 60 minutes of driving per subject. Twelve microphones and three video cameras are installed in a car to obtain audio and video data. In addition, five signals regarding car control and the location of the car provided by the Global Positioning System (GPS) are recorded. All signals are recorded simultaneously and directly onto the hard disks of PCs on board the specially designed data collection vehicle (DCV). The in-car dialogues are initiated by a human operator, an automatic speech recognition (ASR) system, and a Wizard of Oz (WOZ) system so as to collect as many speech disfluencies as possible.

In addition to the details of data collection, this paper describes preliminary results on intermedia signal conversion as an example of corpus-based in-car speech signal processing research.


Author information

Authors and Affiliations

Center for Integrated Acoustic Information Research, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
Nobuo Kawaguchi, Kazuya Takeda & Fumitada Itakura


About this article

Cite this article

Kawaguchi, N., Takeda, K. & Itakura, F. Multimedia Corpus of In-Car Speech Communication. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 153–159 (2004). https://doi.org/10.1023/B:VLSI.0000015094.60008.dc

Published: 01 February 2004
Issue Date: February 2004
DOI: https://doi.org/10.1023/B:VLSI.0000015094.60008.dc
