Automatic Stop List Generation for Clustering Recognition Results of Call Center Recordings

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

This paper addresses automatic stop list generation for processing the recognition results of call center recordings, in particular for clustering. We propose and test a supervised, domain-dependent method of automatic stop list generation. The method finds words whose removal increases the dissimilarity between documents in different clusters and decreases the dissimilarity between documents within the same cluster. The approach is shown to be efficient for clustering recognition results of recordings of varying quality, both on datasets that contain the same topics as the training dataset and on datasets containing other topics.
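
The selection criterion described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page. It is a minimal greedy variant under stated assumptions: documents are bags of words, dissimilarity is cosine distance, and the two goals are combined into one score (mean inter-cluster distance minus mean intra-cluster distance). The function names `cosine_distance`, `separation`, and `build_stop_list`, and the toy transcripts, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def mean(values):
    return sum(values) / len(values) if values else 0.0


def separation(docs, labels, stop):
    """Assumed score: mean inter-cluster minus mean intra-cluster distance."""
    bags = [Counter(w for w in doc if w not in stop) for doc in docs]
    intra, inter = [], []
    for i, j in combinations(range(len(docs)), 2):
        dist = cosine_distance(bags[i], bags[j])
        (intra if labels[i] == labels[j] else inter).append(dist)
    return mean(inter) - mean(intra)


def build_stop_list(docs, labels):
    """Greedily add each word whose removal improves the separation score."""
    stop = set()
    best = separation(docs, labels, stop)
    for word in sorted({w for doc in docs for w in doc}):
        score = separation(docs, labels, stop | {word})
        if score > best:
            stop.add(word)
            best = score
    return stop


# Toy labelled "transcripts" (hypothetical data, two topics).
docs = [
    "hello i want to pay my bill".split(),
    "hello please help me pay the bill".split(),
    "hello my internet connection is down".split(),
    "hello the internet is down please help".split(),
]
labels = ["billing", "billing", "internet", "internet"]

stop = build_stop_list(docs, labels)
print(sorted(stop))
```

On this toy data the greeting "hello", which occurs in every transcript, is selected, while the topic word "bill" is not. A greedy single pass depends on the order in which words are visited, so a realistic implementation would likely need a more careful search and a held-out set to check that the resulting list generalizes.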

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Popova, S., Krivosheeva, T., Korenevsky, M. (2014). Automatic Stop List Generation for Clustering Recognition Results of Call Center Recordings. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_17

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
