
Using the Logarithmic Generator Function in the Spoken Term Detection Task

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8234)


Abstract

Spoken term detection is a task in artificial intelligence in which user-entered keywords are searched for in a large audio database. In one common approach, the recordings are first converted into phoneme sequences, and the actual search is performed in this space. During the search, instead of multiplying the probabilities of the basic phoneme operations, which is the default, applying a triangular norm can significantly improve system accuracy. We used an application-oriented method for representing and tuning triangular norms, namely the logarithmic generator function. In practice, this proved quite successful and led to a relative error reduction of 16%.
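The idea of swapping the product for a triangular norm can be illustrated with a minimal Python sketch. It is not the paper's method: the logarithmic generator function itself is not defined in this preview, so the sketch only shows the standard construction of a strict t-norm from an additive generator, T(x, y) = f⁻¹(f(x) + f(y)). The product corresponds to f(x) = −ln x, and the Dombi family is used here purely as one well-known tunable example. All function names (`make_tnorm`, `dombi`, `combine`) and the sample probabilities are hypothetical.

```python
import math
from functools import reduce


def make_tnorm(f, f_inv):
    """Build a t-norm T(x, y) = f_inv(f(x) + f(y)) from an additive generator f.

    Assumes f is strictly decreasing on (0, 1] with f(1) = 0 (a strict generator).
    """
    def tnorm(x, y):
        # A zero argument forces the result to zero; this also avoids log(0).
        if x <= 0.0 or y <= 0.0:
            return 0.0
        return f_inv(f(x) + f(y))
    return tnorm


# The ordinary product is the t-norm generated by f(x) = -ln(x).
product = make_tnorm(lambda x: -math.log(x), lambda u: math.exp(-u))


def dombi(lam):
    """Dombi t-norm with parameter lam > 0, generated by f(x) = ((1 - x) / x) ** lam."""
    f = lambda x: ((1.0 - x) / x) ** lam
    f_inv = lambda u: 1.0 / (1.0 + u ** (1.0 / lam))
    return make_tnorm(f, f_inv)


def combine(tnorm, probs):
    """Fold a sequence of per-operation probabilities with the given t-norm.

    The fold starts from 1.0, the neutral element of every t-norm.
    """
    return reduce(tnorm, probs, 1.0)


if __name__ == "__main__":
    # Hypothetical probabilities of three phoneme-level operations.
    probs = [0.9, 0.8, 0.5]
    print("product:", combine(product, probs))
    print("Dombi, lam=2:", combine(dombi(2.0), probs))
```

In a setting like the one the abstract describes, the parameter (here `lam`) would be tuned on development data; the sketch makes no claim about which family or parameter value performs best.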

This publication is supported by the European Union and co-funded by the European Social Fund. Project title: Telemedicine-focused research activities in the fields of mathematics, informatics and medical sciences. Project number: TÁMOP-4.2.2.A-11/1/KONV-2012-0073.




© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gosztolya, G. (2013). Using the Logarithmic Generator Function in the Spoken Term Detection Task. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Megías, D. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2013. Lecture Notes in Computer Science, vol 8234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41550-0_9


  • DOI: https://doi.org/10.1007/978-3-642-41550-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41549-4

  • Online ISBN: 978-3-642-41550-0

  • eBook Packages: Computer Science, Computer Science (R0)
