Hard vs. Fuzzy Clustering for Speech Utterance Categorization

Albalate, Amparo; Suendermann, David

doi:10.1007/978-3-540-69369-7_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Included in the following conference series:

International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems

Abstract

To detect and describe categories in a given set of utterances without supervision, one may apply clustering to a vector space in which the utterances are represented. This paper compares hard and fuzzy word clustering approaches applied to ‘almost’ unsupervised utterance categorization for a technical support dialog system. Here, ‘almost’ means that only one sample utterance is given per category, which allows the performance of the clustering techniques to be evaluated objectively. For this purpose, the categorization accuracy of each technique is measured against a manually annotated test corpus of more than 3000 utterances.
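
The contrast between hard and fuzzy assignment, and the evaluation with one sample utterance per category, can be illustrated with a minimal sketch. It is not the method of the paper, which clusters words and is not detailed in this abstract. The sketch assumes plain bag-of-words vectors, cosine similarity to the single seed utterance of each category, and fuzzy memberships obtained by normalizing those similarities. All function names, seed utterances, and test data are hypothetical.

```python
import math
from collections import Counter

def bow(utterance):
    """Bag-of-words count vector for a whitespace-tokenized utterance."""
    return Counter(utterance.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fuzzy_memberships(utterance, seeds):
    """Fuzzy assignment (assumed scheme): similarities to each category's
    seed utterance, normalized so the memberships sum to 1."""
    vec = bow(utterance)
    sims = {cat: cosine(vec, bow(seed)) for cat, seed in seeds.items()}
    total = sum(sims.values())
    if total == 0:
        # No word overlap with any seed: spread membership uniformly.
        return {cat: 1.0 / len(seeds) for cat in seeds}
    return {cat: s / total for cat, s in sims.items()}

def hard_category(utterance, seeds):
    """Hard assignment: keep only the category with the largest membership."""
    memberships = fuzzy_memberships(utterance, seeds)
    return max(memberships, key=memberships.get)

def accuracy(test_set, seeds):
    """Fraction of annotated test utterances whose hard category is correct."""
    correct = sum(hard_category(u, seeds) == label for u, label in test_set)
    return correct / len(test_set)

# Hypothetical data: one seed utterance per category, plus a tiny
# manually labeled test set standing in for the annotated corpus.
SEEDS = {
    "internet": "my internet connection is down",
    "tv": "no picture on my tv",
}
TEST = [
    ("the internet is not working", "internet"),
    ("tv picture is frozen", "tv"),
]

if __name__ == "__main__":
    print(fuzzy_memberships("tv picture is frozen", SEEDS))
    print(accuracy(TEST, SEEDS))
```

A hard assignment discards the membership values and keeps a single label per utterance, whereas the fuzzy variant retains a degree of membership in every category, which matters for utterances that share vocabulary with several categories.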

Author information

Authors and Affiliations

Institute of Information Technology, University of Ulm
Amparo Albalate
SpeechCycle Inc., NY, USA
David Suendermann

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Cite this paper

Albalate, A., Suendermann, D. (2008). Hard vs. Fuzzy Clustering for Speech Utterance Categorization. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_11

DOI: https://doi.org/10.1007/978-3-540-69369-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)
