Content-based search in multilingual audiovisual documents using the International Phonetic Alphabet

Quénot, Georges; Tan, Tien Ping; Le, Viet Bac; Ayache, Stéphane; Besacier, Laurent; Mulhem, Philippe

doi:10.1007/s11042-009-0377-6

Published: 10 October 2009

Volume 48, pages 123–140, (2010)

Multimedia Tools and Applications

Georges Quénot¹,
Tien Ping Tan¹,
Viet Bac Le²,
Stéphane Ayache³,
Laurent Besacier¹ &
…
Philippe Mulhem¹

Abstract

We present in this paper an approach based on the use of the International Phonetic Alphabet (IPA) for content-based indexing and retrieval of multilingual audiovisual documents. The approach works even if the languages of the document are unknown. It has been validated in the context of the “Star Challenge” search engine competition organized by the Agency for Science, Technology and Research (A*STAR) of Singapore. Our approach includes the building of an IPA-based multilingual acoustic model and a dynamic programming based method for searching document segments by “IPA string spotting”. Dynamic programming allows for retrieving the query string in the document string even with a significant transcription error rate at the phone level. The methods that we developed ranked us as first and third on the monolingual (English) search task, as fifth on the multilingual search task and as first on the multimodal (audio and image) search task.
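
The abstract does not give the details of the "IPA string spotting" step, so the following is only a minimal sketch of the general idea, assuming a textbook approximate substring match with unit edit costs rather than the authors' actual scoring. The function name `spot_string` and the phone sequences in the usage note are hypothetical. The recurrence lets a match start at any position of the document string, which is what allows the query to be found even when some phones are substituted, inserted, or deleted in the transcription.

```python
from typing import Sequence, Tuple


def spot_string(query: Sequence[str], document: Sequence[str]) -> Tuple[int, int]:
    """Find the best approximate occurrence of `query` inside `document`.

    Both arguments are sequences of phone symbols (for example IPA strings).
    Returns (cost, end): `cost` is the minimum number of phone substitutions,
    insertions, and deletions, and `end` is the exclusive end index of the
    first best-matching span in `document`.
    Assumption: unit costs for every edit; a real system would likely use
    phone-confusion-dependent costs.
    """
    # prev[i] = cost of aligning query[:i] with a document span ending
    # at the previous document position (initially the empty span).
    prev = list(range(len(query) + 1))
    best_cost, best_end = prev[-1], 0

    for j, doc_phone in enumerate(document, start=1):
        # An empty query prefix costs 0 at every position, so a match
        # may start anywhere in the document.
        curr = [0]
        for i, query_phone in enumerate(query, start=1):
            substitute = prev[i - 1] + (query_phone != doc_phone)
            skip_query_phone = curr[i - 1] + 1  # query phone missing in document
            skip_doc_phone = prev[i] + 1        # extra phone in document
            curr.append(min(substitute, skip_query_phone, skip_doc_phone))
        # Keep the earliest position that reaches the lowest cost.
        if curr[-1] < best_cost:
            best_cost, best_end = curr[-1], j
        prev = curr

    return best_cost, best_end
```

As a usage example, `spot_string(["h", "ɛ", "l", "oʊ"], ["s", "eɪ", "h", "ɛ", "l", "oʊ", "w"])` finds the query with zero errors, while replacing the document's "oʊ" with "uː" still yields a match at a cost of one edit. A retrieval system built on this idea could rank document segments by increasing cost, possibly normalized by query length.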

Acknowledgement

Part of this work has been supported by the Quaero programme.

Author information

Authors and Affiliations

Laboratoire d’Informatique de Grenoble, BP 53, 38041, Grenoble Cedex 9, France
Georges Quénot, Tien Ping Tan, Laurent Besacier & Philippe Mulhem
LIMSI-CNRS, BP 133, 91403, Orsay Cedex, France
Viet Bac Le
Laboratoire d’Informatique Fondamentale de Marseille, 163 avenue de Luminy - Case 901, 13288, Marseille Cedex 9, France
Stéphane Ayache

Corresponding author

Correspondence to Georges Quénot.

About this article

Cite this article

Quénot, G., Tan, T.P., Le, V.B. et al. Content-based search in multilingual audiovisual documents using the International Phonetic Alphabet. Multimed Tools Appl 48, 123–140 (2010). https://doi.org/10.1007/s11042-009-0377-6

Issue Date: May 2010
