Abstract
Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STECs, which focused on Referring Expression Generation (REG), have been part of this development since its inception. This chapter looks back on the experience of organising the three TUNA Challenges, which came to an end in 2009. While we discuss the role of the STECs in yielding a substantial body of research on the REG problem, which has opened new avenues for future research, our main focus is on the role of different evaluation methods in assessing the output quality of REG algorithms, and on the relationship between such methods.
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gatt, A., Belz, A. (2010). Introducing Shared Tasks to NLG: The TUNA Shared Task Evaluation Challenges. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_14
DOI: https://doi.org/10.1007/978-3-642-15573-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15572-7
Online ISBN: 978-3-642-15573-4
eBook Packages: Computer Science (R0)