REX-J: Japanese referring expression corpus of situated dialogs

  • Original Paper
Language Resources and Evaluation

Abstract

Identifying objects in conversation is a fundamental human capability necessary to achieve efficient collaboration on any real world task. Hence the deepening of our understanding of human referential behaviour is indispensable for the creation of systems that collaborate with humans in a meaningful way. We present the construction of REX-J, a multi-modal Japanese corpus of referring expressions in situated dialogs, based on the collaborative task of solving the Tangram puzzle. This corpus contains 24 dialogs with over 4 h of recordings and over 1,400 referring expressions. We outline the characteristics of the collected data and point out the important differences from previous corpora. The corpus records extra-linguistic information during the interaction (e.g. the position of pieces, the actions on the pieces) in synchronization with the participants’ utterances. This in turn allows us to discuss the importance of creating a unified model of linguistic and extra-linguistic information from a new perspective. Demonstrating the potential uses of this corpus, we present the analysis of a specific type of referring expression (“action-mentioning expression”) as well as the results of research into the generation of demonstrative pronouns. Furthermore, we discuss some perspectives on potential uses of this corpus as well as our planned future work, underlining how it is a valuable addition to the existing databases in the community for the study and modeling of referring expressions in situated dialog.

Notes

  1. In Japanese culture, there exists the so-called senpai-kouhai relationship (the relationship of senior to junior, or of socially higher to lower placed). Selecting experiment participants differently from the way we did would risk introducing the effects of this social relationship and thus skewing the collected data, for example through overly polite and indirect language or reluctance to correct mistakes. There has been some recent work on dealing with such cultural factors and creating standardized resources (Rehm et al. 2008). In our study, we sought to avoid cultural effects as far as possible.

  2. Since Japanese has no articles marking definiteness, the distinction between definite and indefinite expressions depends on context.

  3. http://www.lat-mpi.eu/tools/elan/.

  4. The REX-J corpus will be distributed through GSK (Language Resources Association in Japan; http://www.gsk.or.jp/index_e.html).

References

  • Anderson, A. H., Bader, M., Bard, E. G., Boyle, E., Doherty, G., Garrod, S., et al. (1991). The HCRC map task corpus. Language and Speech, 34(4), 351–366.

  • Artstein, R., & Poesio, M. (2005). Kappa³ = Alpha (or Beta). Technical Report CSM-437, University of Essex.

  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596.

  • Baran, B., Dogusoy, B., & Cagiltay, K. (2007). How do adults solve digital tangram problems? Analyzing cognitive strategies through eye tracking approach. In HCI International 2007—12th international conference—Part III (pp. 555–563).

  • Bard, E. G., Hill, R., Arai, M., & Foster, M. E. (2009). Accessibility and attention in situated dialogue: Roles and regulations. In Proceedings of the workshop on production of referring expressions Pre-CogSci 2009.

  • Blache, P., Bertrand, R., & Ferré, G. (2009). Creating and exploiting multimodal annotated corpora: The ToMA project. In M. Kipp, J.-C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora (pp. 38–53). Berlin: Springer.

  • Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface. In Proceedings of the 7th annual conference on computer graphics and interactive techniques (SIGGRAPH 1980) (pp. 262–270). ACM.

  • Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 22(6), 1482–1493.

  • Brennan, S. E., Friedman, M. W., & Pollard, C. J. (1987). A centering approach to pronouns. In Proceedings of the 25th annual meeting on association for computational linguistics (pp. 155–162). Morristown, NJ: Association for Computational Linguistics.

  • Buschmeier, H., Bergmann, K., & Kopp, S. (2009). An alignment-capable microplanner for natural language generation. In Proceedings of the 12th European workshop on natural language generation (ENLG 2009) (pp. 82–89), Athens, Greece. Association for Computational Linguistics.

  • Byron, D., Mampilly, T., Sharma, V., & Xu, T. (2005). Utilizing visual attention for cross-modal coreference interpretation. In Modeling and using context—5th international and interdisciplinary conference CONTEXT 2005 (pp. 83–96).

  • Byron, D. K., & Fosler-Lussier, E. (2006). The OSU Quake 2004 corpus of two-party situated problem-solving dialogs. In Proceedings of the 5th Language Resources and Evaluation Conference (LREC 2006).

  • Byron, D. K., & Stoia, L. (2005). An analysis of proximity markers in collaborative dialogs. In Proceedings of the 41st annual meeting of the Chicago Linguistic Society.

  • Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.

  • Cavicchio, F., & Poesio, M. (2009). Multimodal corpora annotation: Validation methods to assess coding scheme reliability. In M. Kipp, J.-C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora (pp. 109–121). Berlin: Springer.

  • Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1–39.

  • Dale, R. (1989). Cooking up referring expressions. In Proceedings of 27th annual meeting of the association for computational linguistics (pp. 68–75).

  • Dale, R., & Reiter, E. (1995). Computational interpretation of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2), 233–263.

  • Dale, R., & Viethen, J. (2009). Referring expression generation through attribute-based heuristics. In Proceedings of the 12th European workshop on natural language generation (ENLG 2009) (pp. 58–65).

  • Di Eugenio, B., Jordan, P. W., Thomason, R. H., & Moore, J. D. (2000). The agreement process: An empirical investigation of human-human computer-mediated collaborative dialogues. International Journal of Human-Computer Studies, 53(6), 1017–1076.

  • Diessel, H. (2006). Demonstratives, joint attention, and the emergence of grammar. Cognitive Linguistics, 17(4), 463–489.

  • Foster, M. E., Bard, E. G., Guhe, M., Hill, R. L., Oberlander, J., & Knoll, A. (2008). The roles of haptic-ostensive referring expressions in cooperative, task-based human-robot dialogue. In Proceedings of 3rd human–robot interaction (pp. 295–302).

  • Foster, M. E., & Oberlander, J. (2007). Corpus-based generation of head and eyebrow motion for an embodied conversational agent. Language Resources and Evaluation, 41(3–4), 305–323.

  • Funakoshi, K., Watanabe, S., & Tokunaga, T. (2006). Group-based generation of referring expressions. In Proceedings of the 4th international natural language generation conference (INLG 2006) (pp. 73–80).

  • Gatt, A., Belz, A., & Kow, E. (2009). The TUNA-REG challenge 2009: Overview and evaluation results. In Proceedings of the 12th European workshop on natural language generation (ENLG 2009) (pp. 174–182).

  • Gatt, A., van der Sluis, I., & van Deemter, K. (2007). Evaluating algorithms for the generation of referring expressions using a balanced corpus. In Proceedings of the 11th European workshop on natural language generation (ENLG 2007) (pp. 49–56).

  • Gergle, D., Rosé, C. P., & Kraut, R. E. (2007). Modeling the impact of shared visual information on collaborative reference. In Proceedings of 25th computer/human interaction conference (pp. 1543–1552).

  • Grishman, R., & Sundheim, B. (1996). Message understanding conference 6: A brief history. In Proceedings of the 16th international conference on computational linguistics (COLING 1996) (pp. 466–471).

  • Grosz, B. J., Joshi, A. K., & Weinstein, S. (1983). Providing a unified account of definite noun phrases in discourse. In Proceedings of the 21st annual meeting of the association for computational linguistics (ACL 1983) (pp. 44–50).

  • Grosz, B. J., Joshi, A. K., & Weinstein, S. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203–225.

  • Gupta, S., & Stent, A. J. (2005). Automatic evaluation of referring expression generation using corpora. In Proceedings of the 1st workshop on using corpora in NLG.

  • Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.

  • Heeman, P. A., & Hirst, G. (1995). Collaborating on referring expressions. Computational Linguistics, 21, 351–382.

  • Hobbs, J. R. (1978). Resolving pronoun references. Lingua, 44, 311–338.

  • Iida, R., Kobayashi, S., & Tokunaga, T. (2010). Incorporating extra-linguistic information into reference resolution in collaborative task dialogue. In Proceedings of 48th annual meeting of the association for computational linguistics (pp. 1259–1267).

  • Janarthanam, S., & Lemon, O. (2009). Learning lexical alignment policies for generating referring expressions for spoken dialogue systems. In Proceedings of the 12th European workshop on natural language generation (ENLG 2009) (pp. 74–81). Association for Computational Linguistics.

  • Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of European conference on machine learning (ECML 1998) (pp. 137–142).

  • Jokinen, K. (2010). Non-verbal signals for turn-taking and feedback. In Proceedings of the 7th conference on international language resources and evaluation (LREC 2010), Valletta, Malta (pp. 2961–2967). European Language Resources Association (ELRA).

  • Jordan, P. W., & Walker, M. A. (2005). Learning content selection rules for generating object descriptions in dialogue. Journal of Artificial Intelligence Research, 24, 157–194.

  • Kameyama, M. (1998). Intrasentential centering. In Centering in discourse (pp. 89–114). Oxford University Press.

  • Kelleher, J., Costello, F., & van Genabith, J. (2005). Dynamically structuring updating and interrelating representations of visual and linguistic discourse. Artificial Intelligence, 167, 62–102.

  • Kiyokawa, S., & Nakazawa, M. (2006). Effects of reflective verbalization on insight problem solving. In Proceedings of 5th international conference of the cognitive science (pp. 137–139).

  • Kranstedt, A., Lücking, A., Pfeiffer, T., Rieser, H., & Wachsmuth, I. (2006). Deixis: How to determine demonstrated objects using a pointing cone. In Gesture in human-computer interaction and simulation (pp. 300–311). Springer.

  • Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Newbury Park, CA: Sage.

  • Kruijff, G.-J. M., Lison, P., Benjamin, T., Jacobsson, H., Zender, H., & Kruijff-Korbayova, I. (2010). Situated dialogue processing for human-robot interaction. In Cognitive systems: Final report of the CoSy project (pp. 311–364). Springer.

  • Kudo, T., Yamamoto, K., & Matsumoto, Y. (2004). Applying conditional random fields to Japanese morphological analysis. In Proceedings of the 2004 conference on empirical methods in natural language processing.

  • Kuriyama, N., Terai, A., Yasuhara, M., Tokunaga, T., Yamagishi, K., & Kusumi, T. (2009). The role of gaze agreement in collaborative problem solving. In Proceedings of the 26th annual conference of the Japanese cognitive science society (pp. 390–391) (in Japanese).

  • Mitkov, R. (2002). Anaphora resolution. London: Longman.

  • Nakatani, C., & Hirschberg, J. (1993). A speech-first model for repair identification and correction. In Proceedings of 31st annual meeting of ACL (pp. 200–207).

  • Noguchi, M., Miyoshi, K., Tokunaga, T., Iida, R., Komachi, M., & Inui, K. (2008). Multiple purpose annotation using SLAT-Segment and link-based annotation tool. In Proceedings of 2nd linguistic annotation workshop (pp. 61–64).

  • Novak, H.-J. (1986). Generating a coherent text describing a traffic scene. In Proceedings of the 11th conference on computational linguistics (pp. 570–575).

  • Piwek, P. L. A. (2007). Modality choice for generation of referring acts. In Proceedings of the workshop on multimodal output generation (MOG 2007) (pp. 129–139).

  • Poesio, M., Cheng, H., Henschel, R., Hitzeman, J. M., Kibble, R., & Stevenson, R. J. (2000). Specifying the parameters of centering theory: A corpus-based evaluation using text from application-oriented domains. In ACL 2000 (pp. 400–407), Hong Kong.

  • Prasov, Z., & Chai, J. Y. (2008). What’s in a gaze?: The role of eye-gaze in reference resolution in multimodal conversational interfaces. In Proceedings of the 13th international conference on intelligent user interfaces (pp. 20–29).

  • Qvarfordt, P., Beymer, D., & Zhai, S. (2005). RealTourist—A study of augmenting human–human and human–computer dialogue with eye-gaze overlay. In M. F. Costabile & F. Paternò (Eds.), Human–computer interaction-INTERACT 2005 (LNCS 3585, pp. 767–780). Springer.

  • Rehm, M., Nakano, Y., Huang, H.-H., Lipi, A. A., Yamaoka, Y., & Gruneberg, F. (2008). Creating a standardized corpus of multimodal interactions for enculturating conversational interfaces. In Workshop on enculturating conversational interfaces by socio-cultural aspects of communication (ECI 2008).

  • Schiel, F., & Mögele, H. (2008). Talking and looking: The SmartWeb multimodal interaction corpus. In European Language Resources Association (ELRA) (Ed.), Proceedings of the 6th international language resources and evaluation (LREC 2008), Marrakech, Morocco.

  • Spanger, P., Yasuhara, M., Iida, R., & Tokunaga, T. (2009a). A Japanese corpus of referring expressions used in a situated collaboration task. In Proceedings of the 12th European workshop on natural language generation (ENLG 2009) (pp. 110–113).

  • Spanger, P., Yasuhara, M., Iida, R., & Tokunaga, T. (2009b). Using extra linguistic information for generating demonstrative pronouns in a situated collaboration task. In Proceedings of PreCogSci 2009: Production of referring expressions: Bridging the gap between computational and empirical approaches to reference.

  • Sternberg, R. J., & Davidson, J. E. (Eds.) (1996). The nature of insight. Cambridge, MA: The MIT Press.

  • Stoia, L., Shockley, D. M., Byron, D. K., & Fosler-Lussier, E. (2006). Noun phrase generation for situated dialogs. In Proceedings of the 4th international natural language generation conference (INLG 2006) (pp. 81–88).

  • Stoia, L., Shockley, D. M., Byron, D. K., & Fosler-Lussier, E. (2008). SCARE: A situated corpus with annotated referring expressions. In Proceedings of the 6th international conference on language resources and evaluation (LREC 2008) (pp. 28–30).

  • Strassel, S., Przybocki, M., Peterson, K., Song, Z., & Maeda, K. (2008). Linguistic resources and evaluation techniques for evaluation of cross-document automatic content extraction. In Proceedings of the 6th international language resources and evaluation (LREC 2008), Marrakech, Morocco.

  • Suzuki, H., Abe, K., Hiraki, K., & Miyazaki, M. (2001). Cue-readiness in insight problem-solving. In Proceedings of the 23rd annual meeting of the cognitive science society (pp. 1012–1017).

  • Tokunaga, T., Huang, C.-R., & Lee, S.Y.M. (2008). Asian language resources: The state-of-the-art. Language Resources and Evaluation, 42(2), 109–116.

  • Tokunaga, T., Iida, R., Yasuhara, M., Terai, A., Morris, D., & Belz, A. (2010). Construction of bilingual multimodal corpora of referring expressions in collaborative problem solving. In Proceedings of 8th workshop on asian language resources (pp. 38–46).

  • van Deemter, K. (2007). TUNA: Towards a unified algorithm for the generation of referring expressions. Technical report, Aberdeen University. http://www.csd.abdn.ac.uk/research/tuna/pubs/TUNA-final-report.pdf.

  • van Deemter, K., Gatt, A., van Gompel, R., & Krahmer, E. (Eds.). (2009). Production of referring expressions (PRE-CogSci) 2009: Bridging the gap between computational and empirical approaches to reference.

  • van der Sluis, I., Piwek, P., Gatt, A., & Bangerter, A. (2008). Towards a balanced corpus of multimodal referring expressions in dialogue. In Proceedings of the symposium on multimodal output generation (MOG 2008).

  • Vapnik, V. N. (1998). Statistical learning theory, adaptive and learning systems for signal processing communications, and control. New York: Wiley.

  • Viethen, J., & Dale, R. (2008). The use of spatial relations in referring expression generation. In Proceedings of 5th international natural language generation conference (pp. 59–67).

  • Walker, M., Iida, M., & Cote, S. (1994). Japanese discourse and the process of centering. Computational Linguistics, 20(2), 193–232.

Author information

Correspondence to Philipp Spanger.

About this article

Cite this article

Spanger, P., Yasuhara, M., Iida, R. et al. REX-J: Japanese referring expression corpus of situated dialogs. Lang Resources & Evaluation 46, 461–491 (2012). https://doi.org/10.1007/s10579-010-9134-8

  • DOI: https://doi.org/10.1007/s10579-010-9134-8
