Using the Logarithmic Generator Function in the Spoken Term Detection Task

Gosztolya, Gábor

doi:10.1007/978-3-642-41550-0_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8234)

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence

Abstract

Spoken term detection is a task in artificial intelligence in which user-entered keywords are searched for in a large audio database. In one common approach, the recordings are first converted into phoneme sequences, and the actual search is performed in this space. During search, replacing the default multiplication of basic phoneme operation probabilities with a triangular norm can significantly improve system accuracy. We used an application-oriented method for triangular norm representation and tuning, namely the logarithmic generator function. In practice this proved quite successful and led to a relative error reduction of 16%.
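
To make the idea of swapping multiplication for a triangular norm concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of the paper: the abstract does not define the logarithmic generator function, so the sketch uses the Dombi t-norm, whose additive generator is f(x) = ((1 - x)/x)^λ, as a stand-in parametric family. The function names and the example probabilities are hypothetical.

```python
from functools import reduce


def product_tnorm(x, y):
    """Default combination: plain multiplication of probabilities."""
    return x * y


def dombi_tnorm(x, y, lam=1.0):
    """Dombi t-norm with parameter lam > 0 (illustrative stand-in only).

    Built from the additive generator f(x) = ((1 - x) / x) ** lam,
    so that T(x, y) = f^{-1}(f(x) + f(y)).
    """
    if x == 0.0 or y == 0.0:
        return 0.0  # zero is the annihilator of every t-norm
    fx = ((1.0 - x) / x) ** lam
    fy = ((1.0 - y) / y) ** lam
    return 1.0 / (1.0 + (fx + fy) ** (1.0 / lam))


def combine(probs, tnorm):
    """Fold a t-norm over a sequence of probabilities; 1.0 is the identity."""
    return reduce(tnorm, probs, 1.0)


# Hypothetical probabilities of three phoneme-level operations in one match.
op_probs = [0.9, 0.8, 0.95]

product_score = combine(op_probs, product_tnorm)
dombi_score = combine(op_probs, dombi_tnorm)

print(f"product: {product_score:.4f}")
print(f"dombi (lam=1): {dombi_score:.4f}")
```

In a setup like this, the t-norm parameter (here `lam`) would be the quantity tuned on development data; how the paper parametrises and tunes its generator is not stated in the abstract.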

This publication is supported by the European Union and co-funded by the European Social Fund. Project title: Telemedicine-focused research activities in the fields of mathematics, informatics and medical sciences. Project number: TÁMOP-4.2.2.A-11/1/KONV-2012-0073.

Author information

Authors and Affiliations

MTA-SZTE Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720, Szeged, Tisza Lajos krt. 103., Hungary
Gábor Gosztolya

Editor information

Editors and Affiliations

IIIA-CSIC, Campus UAB s/n, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra
Toho Gakuen, 3-1-10, Naka, 186-0004, Kunitachi, Tokyo, Japan
Yasuo Narukawa
Departament d’Enginyeria de la Informació i de les Comunicacions, Universitat Autònoma de Barcelona, 08193, Bellaterra, Catalonia, Spain
Guillermo Navarro-Arribas
Internet Interdisciplinary Institute (IN3); Estudis d’Informàtica, Multimèdia i Telecomunicació, Universitat Oberta de Catalunya, Rambla del Poblenou, 156, 08018, Barcelona, Catalonia, Spain
David Megías

About this paper

Cite this paper

Gosztolya, G. (2013). Using the Logarithmic Generator Function in the Spoken Term Detection Task. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Megías, D. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2013. Lecture Notes in Computer Science, vol 8234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41550-0_9

DOI: https://doi.org/10.1007/978-3-642-41550-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41549-4
Online ISBN: 978-3-642-41550-0
eBook Packages: Computer Science, Computer Science (R0)