The Construction-Integration framework: a means to diminish bias in LSA-based call routing

International Journal of Speech Technology

Abstract

Semantic technology is commonly used for two purposes in the field of IVR (Interactive Voice Response). The first is to correct the output of voice recognition devices based on coherence with a context. The second is to perform what is referred to as “call routing”, requiring technology that categorizes utterances and returns a list of the most credible routes. Our paper focuses on the latter, aiming to use the Latent Semantic Analysis (LSA henceforth) computational model (Deerwester et al. in J. Am. Soc. Inf. Sci. 41:391–407, 1990) together with the Construction-Integration model (C-I henceforth), a psycholinguistically motivated algorithm (Kintsch in Int. J. Psychol. 33(6):411–420, 1998), to interpret, manage and successfully route user requests in an efficient and reliable manner. By efficient we mean that training is unnecessary when the destination model is altered, and exhaustive labeling of all utterances is not required, concentrating instead only on some sample destinations. By reliable we mean that the construction-integration algorithm attenuates the risks from intra-destination variability and word saliency. Technical and theoretical aspects are discussed. In addition, some destination assignment methods are tested and debated.

Notes

  1. Folding-In is specified later.

  2. The matrix of contexts is generically known in Information Retrieval as the matrix of documents. To name the context window accurately, we prefer to call the contexts “utterances”.

  3. Strictly speaking, in LSA training these labels do not need to be exhaustive, so they can originate from different means of classification or even different corpora. In extreme cases, a sufficiently large corpus might even be trained without labels. Testing this last hypothesis is one of the aims of the present study.

  4. The space is set up following the method of Cox and Shahshahani (2001), with a matrix built from terms and utterances rather than from terms and grouped categories, as in Chu-Carroll and Carpenter (1999). The rows of the occurrence matrix were terms (plus labels in the labeled condition) and the columns were utterances.

  5. Besides the absence of labels, this decrease in the size of the matrix is due to single-term utterances (for example “credit”) being excluded from training, which is undoubtedly a disadvantage of an LSA model without labels.

  6. We chose such a dimensionalization based on the assumptions made in some previous studies. In those studies it has been suggested that the optimal number of dimensions for specific domain corpora does not have to be extremely low, sometimes even approaching the 300 dimensions recommended by Landauer and Dumais (1997) for general domain corpora (see Jorge-Botana et al. 2010b). Some of the most recent studies simply use 300 dimensions (Wild et al. 2011).

  7. These models must cover all the functionality of the service.

  8. Because call utterances are shorter and simpler than propositions in colloquial language, the algorithm used here is not exactly the original Construction-Integration algorithm. The integration part proposed by Kintsch is a spreading activation algorithm that iterates until the net is stable (that is, until a cycle in which the change in mean activation falls below a parameterized value), whereas our algorithm is a “one-shot” mechanism: the activation of each node is calculated once, from the connections it receives. Another difference from the C-I algorithm as proposed by Kintsch is that we consider only words, not propositions or situations. The original C-I is more complete and fine-grained, but our mechanism is sufficient for our purposes and may be programmed more flexibly, because it uses an OOP (Object Oriented Programming) paradigm, with classes such as net, layer, node, connection, etc., instead of the iterative vector * matrix multiplication of the original (see Kintsch and Welsch 1991 for details of the original conception).

  9. The cosines are calculated using the previously trained semantic space; in other words, each of the terms to be compared is represented by a vector in this space. Any term vector can then be compared with any other term vector using the cosine.
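
The contrast drawn in Notes 8 and 9 can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional term vectors, the function names, the uniform starting activation, the rescaling by the maximum, and the threshold value are all assumptions made for illustration. Connection weights are taken to be cosines between term vectors; `one_shot` computes each node's activation once from the connections it receives, while `iterative` repeats that pass until the change in mean activation falls below a threshold.

```python
import math


def cosine(u, v):
    """Cosine between two equal-length term vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_connections(vectors):
    """Assumed weighting: each pair of distinct word nodes is linked by their cosine."""
    return {
        (src, dst): cosine(vectors[src], vectors[dst])
        for src in vectors
        for dst in vectors
        if src != dst
    }


def one_shot(activation, connections):
    """Single pass: every node sums the weighted activation it receives."""
    return {
        node: sum(
            weight * activation[src]
            for (src, dst), weight in connections.items()
            if dst == node
        )
        for node in activation
    }


def mean(values_by_node):
    return sum(values_by_node.values()) / len(values_by_node)


def iterative(activation, connections, epsilon=1e-4, max_cycles=100):
    """Repeat the pass, rescaling by the peak, until mean activation settles."""
    current = dict(activation)
    for _ in range(max_cycles):
        raw = one_shot(current, connections)
        peak = max(abs(x) for x in raw.values()) or 1.0
        updated = {node: x / peak for node, x in raw.items()}
        change = abs(mean(updated) - mean(current))
        current = updated
        if change < epsilon:
            break
    return current


# Hypothetical term vectors; a real space would come from the trained LSA model.
vectors = {
    "balance": [0.9, 0.1, 0.0],
    "credit": [0.8, 0.2, 0.1],
    "roaming": [0.0, 0.1, 0.9],
}
connections = build_connections(vectors)
start = {word: 1.0 for word in vectors}  # assumed uniform initial activation

single_pass = one_shot(start, connections)
settled = iterative(start, connections)
print(single_pass)
print(settled)
```

In this toy net "roaming" is only weakly connected to the other two words, so it receives the least activation in both variants; the one-shot version simply stops after the first pass.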

References

  • Bellegarda, J. R. (2000). Exploiting latent semantic information in statistical language modeling. Proceedings of the IEEE, 88(8), 1279–1296.

  • Chu-Carroll, J., & Carpenter, B. (1999). Vector-based natural language call routing. Computational Linguistics, 25(3), 361–388.

  • Cox, S., & Shahshahani, B. (2001). A comparison of some different techniques for vector based call-routing. In Proceedings of 7th European conf. on speech communication and technology, Aalborg.

  • Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391–407.

  • Foltz, P. W. (1996). Latent semantic analysis for text-based research. Behavior Research Methods, Instruments, & Computers, 28(2), 197–202.

  • Haley, D. T., Thomas, P., De Roeck, A., & Petre, M. (2005). A research taxonomy for latent semantic analysis based educational applications. Technical Report no. 2005/09, Open University.

  • Haley, D. T., Thomas, P., Petre, M., & De Roeck, A. (2007). Seeing the whole picture: comparing computer assisted assessment systems using LSA-based systems as an example. Technical Report no. 2007/07, Open University.

  • Jones, M. P., & Martin, J. H. (1997). Contextual spelling correction using latent semantic analysis. In Proceedings of the fifth conference on applied natural language processing (pp. 163–176).

  • Jorge-Botana, G., Olmos, R., & León, J. A. (2009). Using LSA and the predication algorithm to improve extraction of meanings from a diagnostic corpus. Spanish Journal of Psychology, 12(2), 424–440.

  • Jorge-Botana, G., León, J. A., Olmos, R., & Escudero, I. (2010a). Latent semantic analysis parameters for essay evaluation using small-scale corpora. Journal of Quantitative Linguistics, 17(1), 1–29.

  • Jorge-Botana, G., León, J. A., Olmos, R., & Hassan-Montero, Y. (2010b). Visualizing polysemy using LSA and the predication algorithm. Journal of the American Society for Information Science and Technology, 61(8), 1706–1724.

  • Jorge-Botana, G., León, J. A., Olmos, R., & Escudero, I. (2011). The representation of polysemy through vectors: some building blocks for constructing models and applications with LSA. International Journal of Continuing Engineering Education and Life-Long Learning, 21(4).

  • Kintsch, W. (1998). The representation of knowledge in minds and machines. International Journal of Psychology, 33(6), 411–420.

  • Kintsch, W. (2000). Metaphor comprehension: a computational theory. Psychonomic Bulletin & Review, 7, 257–266.

  • Kintsch, W. (2001). Predication. Cognitive Science, 25, 173–202.

  • Kintsch, W. (2002). On the notions of theme and topic in psychological process models of text comprehension. In M. Louwerse & W. van Peer (Eds.), Thematics: interdisciplinary studies (pp. 157–170). Amsterdam: Benjamins.

  • Kintsch, W. (2007). Meaning in context. In T. K. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 89–105). Mahwah: Erlbaum.

  • Kintsch, W. (2008). Symbol systems and perceptual representations. In M. de Vega, A. M. Glenberg, & A. C. Graesser (Eds.), Symbols and embodiment: debates on meaning and cognition (pp. 145–164). Oxford: Oxford University Press.

  • Kintsch, W., & Bowles, A. (2002). Metaphor comprehension: what makes a metaphor difficult to understand? Metaphor and Symbol, 17, 249–262.

  • Kintsch, W., & Welsch, D. (1991). The construction-integration model: a framework for studying memory for text. In W. E. Hockley & S. Lewandowsky (Eds.), Relating theory and data: essays on human memory in honor of Bennet B. Murdock (pp. 367–385). Hillsdale: Erlbaum.

  • Kintsch, W., Patel, V., & Ericsson, K. A. (1999). The role of long-term working memory in text comprehension. Psychologia, 42, 186–198.

  • Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.

  • Li, L., & Chou, W. (2002). Improving latent semantic indexing based classifier with information gain. In Proceedings of the 7th international conference on spoken language processing, ICSLP-2002, Denver, Colorado, USA, September 16–20, 2002 (pp. 1141–1144).

  • Lim, B. P., Ma, B., & Li, H. (2005). Using semantic context to improve voice keyword mining. In Proceedings of the international conference on Chinese computing (ICCC 2005), Singapore, 21–23 March 2005.

  • Louwerse, M. M. (2008). Embodied representations are encoded in language. Psychonomic Bulletin & Review, 15, 838–844.

  • Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge: MIT Press.

  • McCauley, L. (unpublished). Using latent semantic analysis to aid speech recognition and understanding. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.3694.

  • Nakov, P., Popova, A., & Mateev, P. (2001). Weight functions impact on LSA performance. In Proceedings of the recent advances in natural language processing conference—RANLP 2001, Tzigov Chark, Bulgaria.

  • Olmos, R., León, J. A., Jorge-Botana, G., & Escudero, I. (2009). New algorithms assessing short summaries in expository texts using latent semantic analysis. Behavior Research Methods, 41(3), 944–950.

  • Quesada, J. (2008). Latent problem solving analysis (LPSA): a computational theory of representation in complex, dynamic problem solving tasks. PhD thesis, Psychology, University of Granada.

  • Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.

  • Serafin, R., & Di Eugenio, B. (2004). FLSA: extending latent semantic analysis with features for dialogue act classification. In Proceedings of ACL04, 42nd annual meeting of the association for computational linguistics, Barcelona, Spain, July.

  • Shi, Y. (2008). An investigation of linguistic information for speech recognition error detection. PhD thesis, University of Maryland, Baltimore County, Baltimore.

  • Tyson, N., & Matula, V. C. (2004). Improved LSI-based natural language call routing using speech recognition confidence scores. In Proceedings of EMNLP.

  • Wild, F., Haley, D., & Bülow, K. (2011). Using latent-semantic analysis and network analysis for monitoring conceptual development. Journal for Language Technology and Computational Linguistics, 26(1), 9–21.

Author information

Corresponding author

Correspondence to Guillermo Jorge-Botana.

Appendix

Destinations: Activate or deactivate a line, Check balance, Voicemail, Coverage, Contract details, Internet tariff, Businesses, Phone bill, Lost or stolen, General information, Internet, Logistics, Migration, Games and Software, Special offers, Call options, Agent, Switching service provider, Top-ups, Complaint, Roaming, Credit Balance, SMS and MMS, Call rates, TV, Phones, Shops, Languages, Incidents.

About this article

Cite this article

Jorge-Botana, G., Olmos, R. & Barroso, A. The Construction-Integration framework: a means to diminish bias in LSA-based call routing. Int J Speech Technol 15, 151–164 (2012). https://doi.org/10.1007/s10772-012-9129-5

  • DOI: https://doi.org/10.1007/s10772-012-9129-5
