DOI: 10.1145/3126858.3131630
short-paper

Use of Automatic Speech Recognition Systems for Multimedia Applications

Published: 17 October 2017

Abstract

The need to retrieve information from multimedia content increases the demand for systems that use automatic speech recognition (ASR). A speech recognition system enables a computer to interpret audio signals and generate approximate textual transcriptions. These systems rely on probabilistic models that aim to represent human speech robustly and accurately. This paper presents the architecture of a speech recognition system and describes its basic components: the acoustic model, the language model, the lexicon, and the decoder. The training process for the acoustic and language models is also presented. Finally, the paper shows how these systems can be used in several applications.
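As a rough illustration of how the four components named in the abstract fit together, the sketch below has a decoder choose, from a fixed list of candidate word sequences, the one that maximizes the sum of an acoustic log-score and a language-model log-score, with the lexicon mapping words to phonemes. This is a toy under stated assumptions, not the system described in the paper: every name (`LEXICON`, `BIGRAM`, `acoustic_log_prob`, `language_log_prob`, `decode`) and every probability is hypothetical, and a real recognizer searches a much larger hypothesis space over audio features rather than phoneme labels.

```python
import math

# Hypothetical lexicon: word -> phoneme sequence (illustrative values only).
LEXICON = {
    "two": ("t", "uw"),
    "too": ("t", "uw"),
    "cats": ("k", "ae", "t", "s"),
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "two"): 0.6,
    ("<s>", "too"): 0.4,
    ("two", "cats"): 0.5,
    ("too", "cats"): 0.01,
}


def acoustic_log_prob(observed, words):
    """Toy acoustic model: log P(observed | words).

    Assumes each observed phoneme matches the expected one with
    probability 0.9 and mismatches with probability 0.1.
    """
    expected = [p for w in words for p in LEXICON[w]]
    if len(expected) != len(observed):
        return float("-inf")
    return sum(math.log(0.9 if e == o else 0.1)
               for e, o in zip(expected, observed))


def language_log_prob(words):
    """Toy language model: log P(words) under the bigram table."""
    prev, total = "<s>", 0.0
    for w in words:
        # Unseen bigrams get a small floor probability (an assumption).
        total += math.log(BIGRAM.get((prev, w), 1e-6))
        prev = w
    return total


def decode(observed, candidates):
    """Toy decoder: pick the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda ws: acoustic_log_prob(observed, ws) + language_log_prob(ws),
    )


observed = ["t", "uw", "k", "ae", "t", "s"]
candidates = [("two", "cats"), ("too", "cats")]
best = decode(observed, candidates)
print(best)
```

In this toy, "two" and "too" share a pronunciation, so the acoustic score cannot separate the two candidates and the language model decides between them.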


Published In

WebMedia '17: Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web
October 2017
522 pages
ISBN: 9781450350969
DOI: 10.1145/3126858
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

  • SBC: Brazilian Computer Society
  • CNPq: Conselho Nacional de Desenvolvimento Cientifico e Tecn
  • CGIBR: Comite Gestor da Internet no Brazil
  • CAPES: Brazilian Higher Education Funding Council

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. acoustic model
  2. asr
  3. automatic speech recognition
  4. kaldi
  5. language model
  6. multimedia applications

Qualifiers

  • Short-paper

Funding Sources

  • Rede Nacional de Ensino e Pesquisa

Conference

Webmedia '17
Webmedia '17: Brazilian Symposium on Multimedia and the Web
October 17 - 20, 2017
Gramado, RS, Brazil

Acceptance Rates

WebMedia '17 Paper Acceptance Rate 38 of 138 submissions, 28%;
Overall Acceptance Rate 270 of 873 submissions, 31%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 118
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Jan 2025
