Automatic Stop List Generation for Clustering Recognition Results of Call Center Recordings

Popova, Svetlana; Krivosheeva, Tatiana; Korenevsky, Maxim

doi:10.1007/978-3-319-11581-8_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The paper deals with the problem of automatic stop list generation for processing recognition results of call center recordings, in particular for the purpose of clustering. We propose and test a supervised, domain-dependent method of automatic stop list generation. The method is based on finding words whose removal increases the dissimilarity between documents in different clusters and decreases the dissimilarity between documents within the same cluster. This approach is shown to be efficient for clustering recognition results of recordings of varying quality, both on datasets that contain the same topics as the training dataset and on datasets containing other topics.
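The selection criterion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cosine distance over raw word counts, the greedy single pass over the vocabulary, and the names `separation` and `generate_stop_list` are all assumptions made for illustration. A word is added to the stop list when removing it raises the mean distance between documents in different clusters minus the mean distance between documents in the same cluster.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """Cosine distance between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def separation(docs, labels, stop):
    """Mean inter-cluster distance minus mean intra-cluster distance."""
    bags = [Counter(w for w in d if w not in stop) for d in docs]
    intra, inter = [], []
    for i, j in combinations(range(len(docs)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(cosine_distance(bags[i], bags[j]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(inter) - mean(intra)


def generate_stop_list(docs, labels):
    """Greedy pass (an assumed strategy): keep a word in the stop list
    only if removing it improves the separation score."""
    stop = set()
    best = separation(docs, labels, stop)
    for word in sorted({w for d in docs for w in d}):
        score = separation(docs, labels, stop | {word})
        if score > best:
            stop.add(word)
            best = score
    return stop


# Hypothetical labeled training data: tokenized transcripts and topic labels.
docs = [
    ["uh", "pay", "bill"],
    ["uh", "bill", "payment"],
    ["uh", "internet", "down"],
    ["uh", "internet", "slow"],
]
labels = ["billing", "billing", "outage", "outage"]
stop_list = generate_stop_list(docs, labels)
```

On this toy corpus the filler word "uh", which occurs in every transcript, ends up in the stop list, while "bill" and "internet", which separate the two topics, are kept. With so few documents the criterion also removes words that occur only once, so a realistic setup would likely need a larger labeled corpus and possibly a restriction on the candidate words.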

Author information

Authors and Affiliations

Saint-Petersburg State University, Saint-Petersburg, Russia
Svetlana Popova
Scrol, Saint-Petersburg, Russia
Svetlana Popova
STC-innovations Ltd., Saint-Petersburg, Russia
Tatiana Krivosheeva & Maxim Korenevsky

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Popova, S., Krivosheeva, T., Korenevsky, M. (2014). Automatic Stop List Generation for Clustering Recognition Results of Call Center Recordings. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_17

DOI: https://doi.org/10.1007/978-3-319-11581-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)
