Abstract
A script is a structured knowledge representation of a prototypical sequence of real-life events. Learning the commonsense knowledge encoded in scripts can help machines understand natural language and draw commonsense inferences. Script learning is a promising research direction: a trained script learning system can process narrative text, capture the script knowledge it contains, and draw inferences from that knowledge. However, no survey of script learning currently exists, so we provide this comprehensive survey to examine the standard framework and the major research topics in the field. The field comprises three main topics: event representations, script learning models, and evaluation approaches. For each topic, we systematically summarize and categorize the existing script learning systems, and analyze and compare the advantages and disadvantages of representative systems. We also discuss the current state of the research and possible future directions.
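To make these notions concrete, the sketch below (ours, not a system described in the survey) models a script as an ordered chain of predicate-argument events and performs a toy cloze-style inference that ranks candidate events by how often they co-occur with the observed context. The `Event` class, the `RESTAURANT_SCRIPT` instance, and the counting scheme are hypothetical simplifications; real script learning systems typically replace raw co-occurrence counts with learned event embeddings and neural scoring models.

```python
# Illustrative sketch only (not a system from this survey): a script modeled as an
# ordered chain of predicate-argument events, plus a toy cloze-style inference that
# ranks candidate events by how often they co-occur with the observed context.
# All names (Event, RESTAURANT_SCRIPT, predict_missing_event) are hypothetical.
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    predicate: str   # verb lemma, e.g., "order"
    subject: str     # participant filling the subject role, e.g., "customer"
    obj: str = ""    # optional object, e.g., "food"

    def key(self) -> Tuple[str, str, str]:
        return (self.predicate, self.subject, self.obj)


# The classic "visit a restaurant" script as a prototypical event sequence.
RESTAURANT_SCRIPT: List[Event] = [
    Event("enter", "customer", "restaurant"),
    Event("order", "customer", "food"),
    Event("serve", "waiter", "food"),
    Event("eat", "customer", "food"),
    Event("pay", "customer", "bill"),
    Event("leave", "customer", "restaurant"),
]


def cooccurrence_counts(chains: List[List[Event]]) -> Counter:
    """Count how often two events appear together in the same observed chain."""
    counts: Counter = Counter()
    for chain in chains:
        for a, b in combinations(chain, 2):
            counts[(a.key(), b.key())] += 1
            counts[(b.key(), a.key())] += 1
    return counts


def predict_missing_event(context: List[Event], candidates: List[Event],
                          counts: Counter) -> Event:
    """Cloze-style inference: pick the candidate that best fits the observed context."""
    return max(candidates,
               key=lambda cand: sum(counts[(c.key(), cand.key())] for c in context))


if __name__ == "__main__":
    counts = cooccurrence_counts([RESTAURANT_SCRIPT])
    context = [Event("enter", "customer", "restaurant"),
               Event("order", "customer", "food")]
    candidates = [Event("pay", "customer", "bill"),
                  Event("board", "customer", "plane")]
    print(predict_missing_event(context, candidates, counts))  # the "pay" event wins
```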
Author information
Contributions
Yi HAN drafted the manuscript. Linbo QIAO, Jianming ZHENG, and Hefeng WU helped organize the manuscript. Dongsheng LI and Xiangke LIAO led the preparation of the manuscript. Yi HAN and Linbo QIAO revised and finalized the paper.
Corresponding author
Correspondence to Linbo QIAO.
Ethics declarations
Yi HAN, Linbo QIAO, Jianming ZHENG, Hefeng WU, Dongsheng LI, and Xiangke LIAO declare that they have no conflict of interest.
Additional information
Project supported by the National Natural Science Foundation of China (No. 61806216)
Yi HAN, first author of this invited paper, received his BS and MS degrees from the National University of Defense Technology (NUDT), Changsha, China in 2016 and 2018, respectively, and is currently a PhD candidate at the College of Computer Science, NUDT. His research interests include natural language processing, event extraction, and few-shot learning.
Linbo QIAO, corresponding author of this invited paper, received his BS, MS, and PhD degrees in computer science and technology from NUDT, Changsha, China in 2010, 2012, and 2017, respectively. He is now an assistant research fellow at the National Lab for Parallel and Distributed Processing, NUDT. From May to October 2014, he worked as a research assistant at the Chinese University of Hong Kong. His research interests include structured sparse learning, online and distributed optimization, and deep learning for graph and graphical models.
Jianming ZHENG received his BS and MS degrees from NUDT, China in 2016 and 2018, respectively, and is currently a PhD candidate at the School of System Engineering, NUDT. His research interests include semantic representation, few-shot learning, and their applications in information retrieval. He has published several papers in venues such as SIGIR, WWW, COLING, IPM, and Cognitive Computation.
Hefeng WU received his BS degree in computer science and technology and his PhD degree in computer application technology from Sun Yat-sen University, China in 2008 and 2013, respectively. He is currently a full research scientist with the School of Data and Computer Science, Sun Yat-sen University. His research interests include computer vision, multimedia, and machine learning. He has served as a reviewer for many academic journals and conferences, including TIP, TCYB, TSMC, TCSVT, PR, CVPR, ICCV, NeurIPS, and ICML.
Dongsheng LI received his BS degree (with honors) and PhD degree (with honors) in computer science from the College of Computer Science, NUDT, Changsha, China in 1999 and 2005, respectively. He was awarded the prize of National Excellent Doctoral Dissertation by the Ministry of Education of China in 2008, and the National Science Fund for Distinguished Young Scholars in 2020. He is now a full professor at the National Lab for Parallel and Distributed Processing, NUDT. He is a corresponding expert of Frontiers of Information Technology & Electronic Engineering. His research interests include parallel and distributed computing, cloud computing, and large-scale data management.
Xiangke LIAO received his BS degree from the Department of Computer Science and Technology, Tsinghua University, Beijing, China in 1985, and his MS degree from NUDT, Changsha, China in 1988. He is currently a full professor of NUDT, and an academician of the Chinese Academy of Engineering. His research interests include parallel and distributed computing, high-performance computer systems, operating systems, cloud computing, and networked embedded systems.
Cite this article
Han, Y., Qiao, L., Zheng, J. et al. A survey of script learning. Front Inform Technol Electron Eng 22, 341–373 (2021). https://doi.org/10.1631/FITEE.2000347