Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions

Morchid, Mohamed; Dufour, Richard; Linarès, Georges; Hamadi, Youssef

doi:10.1007/978-3-319-18117-2_44

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

Speech analytics suffers from the poor quality of automatic transcriptions. To tackle this difficulty, one solution is to map transcriptions into a space of hidden topics. This abstract representation makes it possible to work around the drawbacks of the ASR process. The best-known and most commonly used such representation is the topic-based representation from Latent Dirichlet Allocation (LDA). During LDA training, the distribution of words in each topic is estimated automatically. Nonetheless, in the context of a classification task, the LDA model does not take the targeted classes into account. The supervised Latent Dirichlet Allocation (sLDA) model overcomes this weakness by considering the class as a response, in addition to the document content itself. In this paper, we propose to compare these two classical topic-based representations of a dialogue (LDA and sLDA) with a new one based not only on the dialogue content itself (words), but also on the theme related to the dialogue. This original Author-topic Latent Variables (ATLV) representation is based on the Author-topic (AT) model. The effectiveness of the proposed ATLV representation is evaluated on a classification task using automatic transcriptions of dialogues from the Paris Transportation customer service call center. Experiments confirmed that the ATLV approach far outperforms the LDA and sLDA approaches, with substantial gains of 7.3 and 5.8 points, respectively, in terms of correctly labeled conversations.
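To make the contrast between a word-only topic representation and a theme-aware one more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the topic-word matrix `PHI`, the theme-topic mixtures `THEMES`, the theme labels, and all function names are hypothetical toy values. `infer_topic_mixture` folds a dialogue into a fixed LDA-style topic space using its words alone (EM over the document's topic mixture with the topic-word distributions held fixed, a common simplification of LDA inference). `classify_by_theme` instead scores the same words against per-theme topic mixtures, in the spirit of the Author-topic model with each theme playing the role of an author; it is a plain likelihood classifier, not the ATLV representation evaluated in the paper.

```python
import math

# Hypothetical toy topic-word matrix: PHI[z][w] = P(w | z), 2 topics, 4 word ids.
PHI = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 favours words 0 and 1
    [0.05, 0.05, 0.45, 0.45],  # topic 1 favours words 2 and 3
]

# Hypothetical theme-topic mixtures: THEMES[a][z] = P(z | a), as an AT-style model might learn.
THEMES = {
    "theme_A": [0.9, 0.1],
    "theme_B": [0.1, 0.9],
}


def infer_topic_mixture(doc, phi, alpha=0.1, iters=50):
    """Fold a document (list of word ids) into a fixed topic space.

    Runs EM over the per-document mixture theta_d while phi stays fixed;
    alpha is a symmetric pseudo-count that smooths the M-step.
    """
    k = len(phi)
    theta = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for w in doc:
            # E-step: responsibility of each topic for this word occurrence.
            weights = [theta[z] * phi[z][w] for z in range(k)]
            total = sum(weights)
            for z in range(k):
                counts[z] += weights[z] / total
        # M-step: smoothed re-estimate of theta_d (entries sum to 1).
        denom = len(doc) + k * alpha
        theta = [(counts[z] + alpha) / denom for z in range(k)]
    return theta


def theme_log_likelihood(doc, theta_a, phi):
    """log P(doc | a) when every word's topic is drawn from theta_a."""
    k = len(phi)
    return sum(
        math.log(sum(theta_a[z] * phi[z][w] for z in range(k))) for w in doc
    )


def classify_by_theme(doc, themes, phi):
    """Pick the theme with the highest likelihood (uniform prior over themes)."""
    return max(themes, key=lambda a: theme_log_likelihood(doc, themes[a], phi))


if __name__ == "__main__":
    dialogue = [0, 1, 0, 2]  # a toy "transcription" given as word ids
    print(infer_topic_mixture(dialogue, PHI))
    print(classify_by_theme(dialogue, THEMES, PHI))
```

Running the script prints a topic mixture weighted toward topic 0, followed by the label `theme_A`. In a real setting, `PHI` and the theme mixtures would come from a trained LDA or AT model rather than being written by hand, and the resulting vectors would typically be passed to a downstream classifier.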

Author information

Authors and Affiliations

LIA, University of Avignon, Avignon, France
Mohamed Morchid, Richard Dufour & Georges Linarès
Microsoft Research, Cambridge, United Kingdom
Youssef Hamadi

Corresponding author

Correspondence to Mohamed Morchid.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Morchid, M., Dufour, R., Linarès, G., Hamadi, Y. (2015). Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_44

DOI: https://doi.org/10.1007/978-3-319-18117-2_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)
