Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

Speech analytics suffers from poor automatic transcription quality. One way to tackle this difficulty is to map transcriptions into a space of hidden topics. This abstract representation makes it possible to work around the drawbacks of the automatic speech recognition (ASR) process. The best-known and most commonly used such representation is the topic-based one obtained from Latent Dirichlet Allocation (LDA). During the LDA learning process, the distribution of words within each topic is estimated automatically. Nonetheless, in the context of a classification task, the LDA model does not take the targeted classes into account. The supervised Latent Dirichlet Allocation (sLDA) model overcomes this weakness by considering the class, as a response variable, together with the document content itself. In this paper, we compare these two classical topic-based representations of a dialogue (LDA and sLDA) with a new one based not only on the dialogue content itself (words) but also on the theme related to the dialogue. This original Author-topic Latent Variables (ATLV) representation is based on the Author-topic (AT) model. The effectiveness of the proposed ATLV representation is evaluated on a theme classification task using automatic transcriptions of dialogues from the Paris transportation customer service call center. Experiments confirm that the ATLV approach outperforms the LDA and sLDA approaches by a wide margin, with gains of 7.3 and 5.8 points respectively in terms of correctly labeled conversations.
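
To make the idea of mapping a transcription into a space of hidden topics concrete, the following is a minimal sketch of unsupervised LDA estimated with collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation and does not cover sLDA or ATLV. The function name `lda_gibbs`, the hyperparameter values, and the toy word lists are assumptions made purely for illustration; they are not taken from the paper or its corpus.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns one topic distribution per
    document, i.e. the document's coordinates in the topic space.
    alpha and beta are symmetric Dirichlet priors (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    return [
        [
            (doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
            for k in range(n_topics)
        ]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy "transcriptions" (made-up words, not corpus data).
docs = [
    ["bus", "ticket", "price", "ticket", "monthly", "pass"],
    ["lost", "bag", "station", "lost", "wallet"],
    ["price", "pass", "ticket", "student", "monthly"],
    ["bag", "found", "station", "lost", "umbrella"],
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is a low-dimensional vector that could be passed to a classifier in place of the raw word counts, which is the general role the topic-based representations play in the abstract above.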

Author information

Corresponding author

Correspondence to Mohamed Morchid.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Morchid, M., Dufour, R., Linarès, G., Hamadi, Y. (2015). Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_44

  • DOI: https://doi.org/10.1007/978-3-319-18117-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science, Computer Science (R0)
