Part of the book series: Studies in Computational Intelligence (SCI, volume 427)

Abstract

Text mining (TM) and computational linguistics (CL) are computationally intensive fields in which a growing number of tools are available for studying large text corpora and exploiting them for various purposes. In this chapter we address the problem of building conversational agents, or chatbots, from corpora for domain-specific educational purposes. After addressing some linguistic issues relevant to the development of chatbot tools from corpora, we present a methodology for systematically analyzing large text corpora about a limited knowledge domain. Taking the Artificial Intelligence Markup Language (AIML) as the “assembly language” of artificial intelligence conversational agents, we present a way of using text corpora as a seed from which a set of “source files” can be derived. More specifically, we illustrate how to use corpus data to extract relevant keywords, multiword expressions and text patterns, and to build a glossary, in order to construct an AIML knowledge base that can later be used to build interactive conversational systems. The approach we propose does not require deep understanding techniques for the analysis of text.

As a case study, we show how to build, from a children's story, the knowledge base of an English conversational agent for educational purposes that can answer questions about the characters, facts and episodes of the story. The final part of the chapter discusses the main linguistic and methodological issues and possible further improvements.
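
To make the idea of deriving AIML "source files" from corpus data more concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not the procedure used in the chapter: the stop-word list, the frequency-based keyword selection, the "WHAT IS ..." pattern shape, and the names `extract_keywords`, `to_pattern` and `glossary_to_aiml` are all hypothetical, and the sample text and glossary are invented toy data.

```python
import re
from collections import Counter
from xml.etree import ElementTree as ET

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "who"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stop-word tokens.

    Raw frequency is used here only as a crude stand-in for keyword extraction.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def to_pattern(term):
    """Normalize a term to upper-case words without punctuation,
    the form AIML patterns conventionally take."""
    return " ".join(re.findall(r"[A-Z0-9]+", term.upper()))


def glossary_to_aiml(glossary):
    """Serialize a {term: definition} glossary as AIML, one category per term.

    Each category answers a 'WHAT IS <TERM>' input with the stored definition.
    """
    root = ET.Element("aiml", {"version": "1.0.1"})
    for term, definition in glossary.items():
        category = ET.SubElement(root, "category")
        ET.SubElement(category, "pattern").text = "WHAT IS " + to_pattern(term)
        ET.SubElement(category, "template").text = definition
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented toy text and glossary, for illustration only.
    story = "The wolf met the girl. The wolf ran to the grandmother. The girl walked."
    print(extract_keywords(story, top_n=2))
    print(glossary_to_aiml({"wolf": "The wolf is the villain of the story."}))
```

A real pipeline would add the steps the abstract names but this sketch omits, such as multiword expression detection and text pattern extraction, and would generate several input patterns per glossary entry rather than one.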

Author information

Corresponding author

Correspondence to Giovanni De Gasperis.

Copyright information

© 2013 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

De Gasperis, G., Chiari, I., Florio, N. (2013). AIML Knowledge Base Construction from Text Corpora. In: Yang, XS. (eds) Artificial Intelligence, Evolutionary Computing and Metaheuristics. Studies in Computational Intelligence, vol 427. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29694-9_12

  • DOI: https://doi.org/10.1007/978-3-642-29694-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29693-2

  • Online ISBN: 978-3-642-29694-9

  • eBook Packages: Engineering, Engineering (R0)