MEDIA: a semantically annotated corpus of task oriented dialogs in French

Results of the French media evaluation campaign

Language Resources and Evaluation

Abstract

The aim of the French MEDIA project was to define a protocol for evaluating the speech understanding modules of dialog systems. To this end, a corpus of 1,257 real spoken dialogs on hotel reservation and tourist information was recorded, transcribed and semantically annotated. A semantic attribute-value representation was defined in which each conceptual relationship is represented by the names of the attributes. The approach distinguishes two levels of semantic annotation. At the first level, each utterance is considered separately, and the annotation represents the meaning of the statement without taking the dialog context into account. At the second level, the annotation gives the interpretation of the statement in the dialog context; in this way a semantic representation of the dialog context is defined. This paper discusses the data collection, the detailed definition of both annotation levels, and the annotation scheme. It then comments on the two evaluation campaigns carried out during the project and discusses some results.

Notes

  1. http://www.limsi.fr/Individu/hbm/.

  2. http://www.cs.rochester.edu/research/speech/damsl/RevisedManual/.

  3. http://www.limsi.fr/Individu/hbm/.

  4. Exception: indefinite alterity expressions (e.g. another N) are annotated. In this case, the excluded entity has been annotated instead of the actual referent, which is undetermined. This is observed in turn C 16 of the dialog given in the Appendix.

  5. http://catalog.elra.info/product_info.php?products_id=998&language=fr.

Acknowledgments

Thanks to Christelle Ayache, Frédéric Béchet, Laurence Devillers, Anne Kuhn, Fabrice Lefèvre, Djamel Mostefa, Sophie Rosset and Jeanne Villaneau for their participation in the project.

Correspondence to Hélène Bonneau-Maynard.

Appendix

We give a fully annotated dialog (#1037) from the MEDIA corpus, where W is the wizard and C the client. Below each utterance, the sequence of segments is given with the corresponding contextual annotation. Referring expression annotations point to segments by their numbers (1–85).
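The notation used in the dialog can be read mechanically. The following is a minimal sketch, not the project's own tooling: it assumes each annotation line has the shape `mode/attribute: value`, optionally followed by a `reference="..."` line listing segment numbers, with `;` separating groups. The names `Segment`, `parse_annotation` and `parse_reference` are hypothetical, and glossing the markers `+`, `-` and `?` as affirmative, negative and interrogative is an assumption based on how they are used in this dialog.

```python
import re
from dataclasses import dataclass, field

# Mode markers seen in the appendix. The glosses are an assumption.
MODES = {"+": "affirmative", "-": "negative", "?": "interrogative"}

# mode, slash, attribute name, colon, value (tolerates a space before the colon).
ANNOTATION_RE = re.compile(r"^(?P<mode>[+\-?])/(?P<attribute>[^:]+?)\s*:\s*(?P<value>.+)$")


@dataclass
class Segment:
    number: int          # segment number within the dialog
    words: str           # transcribed words covered by the segment
    mode: str            # "+", "-" or "?"
    attribute: str       # e.g. "command-task"
    value: str           # e.g. "reservation"
    references: list = field(default_factory=list)  # groups of segment numbers


def parse_reference(text):
    """Parse 'reference="13,14; 13,18"' (or a bare '13,14; 13,18') into groups."""
    quoted = re.search(r'"([^"]*)"', text)
    body = quoted.group(1) if quoted else text
    return [[int(n) for n in group.split(",")]
            for group in body.split(";") if group.strip()]


def parse_annotation(number, words, annotation, reference=None):
    """Build a Segment from one annotation line and its optional reference line."""
    match = ANNOTATION_RE.match(annotation.strip())
    if match is None:
        raise ValueError(f"unrecognised annotation: {annotation!r}")
    refs = parse_reference(reference) if reference else []
    return Segment(number, words, match["mode"],
                   match["attribute"].strip(), match["value"].strip(), refs)


# Segment 39 from turn C 12 and segment 24 from turn C 10.
calm = parse_annotation(39, "très calme",
                        "-/location-relativePlace-general-hotel: livelyDistrict")
les = parse_annotation(24, "les", "+/refLink-coRef: plural",
                       'reference="13,14,15,16,17; 13,18,19,20; 13,21,22,23"')
print(MODES[calm.mode], calm.attribute, calm.value)
print(les.references)
```

Keeping the reference groups as lists of segment numbers, rather than resolving them immediately, leaves the literal annotation intact so that the referring expressions can be interpreted in a later pass.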

W 1

...quelle information désirez-vous

...what information would you like

C 2

je voudrais faire une réservation pour le trente et un mai deux jours deux nuits à Paris mais dans un hôtel qui se trouverait près de la place de la Bastille s’ il vous plaît pour six chambres individuelles

I’d like to book for May the 31st 2 days 2 nights in Paris but in a hotel which is near place de la Bastille please for six single rooms

1

je voudrais faire une réservation

+/command-task: reservation

2

pour le trente et un mai

+/time-date: 05/31

3

deux jours deux nuits

+/stay-nbNight-reservation: 2

4

à Paris

+/location-city: paris

5

mais

+/connectProp: addition

6

dans un hôtel

+/DBObject: hotel

7

près de

+/location-relativeDistance-hotel: near

8

la place de la Bastille

+/location-street-hotel: bastille

9

pour six

+/number-room-reservation: 6

10

chambres individuelles

+/room-type: single

W 3

vous souhaitez faire une réservation à Paris

You’d like to book in Paris

C 4

oui {yes}

11

oui

+/response: yes

W 5

près de la place de la Bastille

near the place de la Bastille

C 6

oui madame {yes}

12

oui

+/response: yes

W 7

veuillez patienter je recherche vos informations

please wait I’m looking for your information

C 8

merci bien {thanks}

W 9

à Paris je vous propose trois hôtels le Méridien Bastille la chambre est à soixante euros l’ athanor hôtel la chambre est à quatre-vingt-cinq euros l’ hôtel Richard Lenoir la chambre est à cinquante-cinq euros voulez-vous réserver dans l’ un de ces hôtels ou obtenir plus d’ informations

in Paris I can offer you 3 hotels the Méridien Bastille the room is 60 euros the Athanor hotel the room is 85 euros the Richard Lenoir hotel the room is 55 euros do you want to book in one of those hotels or ask for more information

13

à Paris

+/location-city-hotel: paris

14

le Méridien

+/hotel-trademark: Méridien

15

Bastille

+/name-hotel: bastille

16

soixante

+/payment-amount-integer-room: 60

17

euros

+/payment-unit: euro

18

l’ athanor hôtel

+/name-hotel: athanor

19

quatre-vingt-cinq

+/payment-amount-integer-room: 85

20

euros

+/payment-unit: euro

21

l’ hôtel Richard Lenoir

+/name-hotel: richard lenoir

22

cinquante-cinq

+/payment-amount-integer-room: 55

23

euros

+/payment-unit: euro

C 10

je veux dire je voudrais savoir si les chambres que je vais réserver les chambres six chambres individuelles donnent sur une cour et est-ce qu’ il y a un parking privé

I mean I’d like to know if the rooms I’m going to book the rooms six single rooms overlook a courtyard and if there is a private parking

24

les

+/refLink-coRef: plural

reference="13,14,15,16,17; 13,18,19,20; 13,21,22,23"

25

chambres

+/object: room

26

que je vais réserver

+/command-task: reservation

27

les

+/refLink-coRef: plural

reference="13,14,15,16,17; 13,18,19,20; 13,21,22,23"

28

chambres

+/object: room

29

six

+/number-room-reservation: 6

30

chambres individuelles

+/room-type: single

31

donnent sur

?/location-relativeDistance-hotel: near

32

une cour

?/location-relativePlace-general-hotel: unknown

33

et

+/connectProp: addition

34

un parking privé

?/hotel-parking: private

W 11

veuillez patienter je recherche cette information je vous propose l’ hôtel Richard Lenoir cet hôtel se situe dans un endroit calme près de la place de la Bastille l’ hôtel est équipé d’ un parking privé surveillé souhaitez-vous faire une réservation dans cet hôtel

please wait I’m looking for this information I propose the Richard Lenoir hotel this hotel is located in a quiet place near the place de la Bastille the hotel has a guarded private parking do you want to book in this hotel

C 12

euh j(e) il y a le parking privé mais c’est un hôtel vous me dites qui est très calme donc il ne donne pas sur une cour il donne sur un boulevard ou pouvez-vous me le situer s’ il vous plaît

euh I there is a private parking but you tell me it is a very quiet hotel so it does not overlook a courtyard it overlooks a boulevard or can you locate it for me please

35

le parking privé

+/hotel-parking: private

36

mais

+/connectProp: opposition

37

c’est

+/refLink-coRef: singular

reference="13,21"

38

un hôtel

+/DBObject: hotel

39

très calme

-/location-relativePlace-general-hotel: livelyDistrict

40

donc

+/connectProp: implies

41

il

+/refLink-coRef: singular

reference="13,21"

42

donne pas sur

-/location-relativeDistance-hotel: near

43

une cour

-/location-relativePlace-general-hotel: unknown

44

il

+/refLink-coRef: singular

reference="13,21"

45

donne sur

?/location-relativeDistance-hotel: near

46

un boulevard

?/location-relativePlace-general-hotel: unknown

47

le

+/refLink-coRef: singular

reference="13,21"

48

situer

?/object: location-hotel

W 13

je suis désolée je n’ ai pas ce type d’ informations

Sorry I don’t have that kind of information

C 14

bon ben écoutez je vais réserver dans cet hôtel hôtel Richard Lenoir donc six chambres individuelles pour le trente et un mai deux jours et deux nuits hein

well listen I’ll book in this hotel hotel Richard Lenoir so 6 single rooms for the 31st of May 2 days and 2 nights OK

49

je vais réserver

+/command-task: reservation

50

dans cet hôtel hôtel Richard Lenoir

+/name-hotel: richard lenoir

51

six

+/number-room-reservation: 6

52

chambres individuelles

+/room-type: single

53

pour le trente et un mai

+/time-date-reservation: 05/31

54

deux jours et deux nuits

+/stay-nbNight-reservation: 2

W 15

merci de patienter je vérifie les disponibilités cet hôtel est complet il n’ y a plus de chambres libres correspondant à vos critères souhaitez-vous changer de dates ou réserver dans un autre hôtel

please wait I’m checking the availability this hotel is full there are no more free rooms matching your criteria do you wish to change the dates or book in another hotel

C 16

alors je réserve dans un autre hôtel qui a les mêmes critères hein

so I book in another hotel with the same conditions OK

55

je réserve

+/command-task: reservation

56

un

+/number-hotel: 1

57

autre

+/refLink-coDom-exclusion: singular

reference="13,21"

58

hôtel

+/DBObject: hotel

59

les mêmes critères

+/object: undetermined

W 17

merci de patienter je vous propose le Méridien Bastille la chambre est à soixante euros souhaitez-vous faire une réservation dans cet hôtel

please wait I propose the Méridien Bastille the room is 60 euros do you wish to book in this hotel

C 18

mais écoutez je vais faire la réservation dans cet hôtel il y a bien un parking privé et ça donne s() est-ce que ça donne sur une cour ou sur une rue tranquille

but listen I will book in this hotel there is indeed a private parking and it overlooks does it overlook a courtyard or a quiet road

60

je vais faire la réservation

+/command-task: reservation

61

cet

+/refLink-coRef: singular

reference="13,14,15"

62

hôtel

+/DBObject: hotel

63

il y a bien

+/command-dial: confirmation-request

64

un parking privé

?/hotel-parking: private

65

et

+/connectProp: addition

66

donne sur

?/location-relativeDistance-hotel: near

67

une cour

?/location-relativePlace-general-hotel: unknown

68

ou

+/connectProp: alternative

69

sur

?/location-relativeDistance-hotel: near

70

une rue tranquille

?/location-relativePlace-general-hotel: livelyDistrict

W 19

cet hôtel se situe dans un endroit calme près de la place de la Bastille l’ hôtel est équipé d’ un parking privé souhaitez-vous faire une réservation dans cet hôtel

this hotel is located in a quiet place near the place de la Bastille the hotel has a private parking do you want to book in this hotel

C 20

ben écoutez je vais faire une réservation dans cet hôtel pour six chambres individuelles hein

well listen I’m going to book in this hotel for 6 single rooms OK

71

je vais faire une réservation

+/command-task: reservation

72

cet

+/refLink-coRef: singular

reference="13,14,15"

73

hôtel

+/DBObject: hotel

74

pour six

+/number-room-reservation: 6

75

chambres individuelles

+/room-type: single

W 21

j’ effectue votre réservation le montant de votre séjour s’ élève à sept cent vingt euros le numéro de dossier correspondant est le zéro soixante-neuf cent quatre-vingts désirez-vous une autre information

I’m making your reservation the amount for your stay comes to 720 euros the file number is 069180 would you like any other information

C 22

oui euh j’ aimerais savoir est-ce que le petit déjeuner est compris dans la réservation enfin de la réservation dans le prix de la chambre

euh yes I’d like to know if breakfast is included in the reservation well the reservation the price for the room

76

oui

+/response: oui

77

le petit déjeuner est compris

?/hotel-services: breakfastInclude

78

dans la réservation

+/command-task: reservation

79

le

+/refLink-coRef: singular

reference="16,17"

80

prix

+/object: payment-amount-reservation-room

81

la

+/refLink-coRef: singular

reference="13,14,15,16,17,10"

82

chambre

?/object: room

W 23

il vous sera demandé cinq euros supplémentaires pour une formule petit déjeuner

breakfast is 5 euros more

C 24

bon ben écoutez je vous remercie de tous ces renseignements donc je confirme et je réserve

well listen I thank you for this information so I confirm and I book

83

je confirme

+/command-dial: confirmation-notice

84

et

+/connectProp: addition

85

je réserve

+/command-task: reservation

W 25

merci d’ avoir utilisé le serveur vocal MEDIA au revoir

thank you for using the MEDIA voice server goodbye

C 26

au revoir madame et à bientôt au revoir

goodbye madam and see you soon goodbye
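The reference annotations in this dialog can be followed programmatically. The sketch below is illustrative only and is not the authors' resolution method: it copies segments 13 to 23 from turn W 9, reads the three groups listed in the reference of segment 24, and then applies the exclusion recorded at segment 57 ("autre", `refLink-coDom-exclusion`, reference="13,21"). Treating the groups of segment 24 as the candidate domain for "un autre hôtel" is an assumption made for the example, and the helper names `parse_reference`, `describe` and `exclude` are hypothetical.

```python
# Segments 13-23 from wizard turn W 9, copied from the appendix:
# segment number -> (attribute, value).
SEGMENTS = {
    13: ("location-city-hotel", "paris"),
    14: ("hotel-trademark", "Méridien"),
    15: ("name-hotel", "bastille"),
    16: ("payment-amount-integer-room", "60"),
    17: ("payment-unit", "euro"),
    18: ("name-hotel", "athanor"),
    19: ("payment-amount-integer-room", "85"),
    20: ("payment-unit", "euro"),
    21: ("name-hotel", "richard lenoir"),
    22: ("payment-amount-integer-room", "55"),
    23: ("payment-unit", "euro"),
}


def parse_reference(text):
    """Turn '13,21' or '13,14; 13,18' into a list of segment-number groups."""
    return [[int(n) for n in group.split(",")]
            for group in text.split(";") if group.strip()]


def describe(group):
    """Collect the attribute-value pairs of the segments describing one entity."""
    return {SEGMENTS[n][0]: SEGMENTS[n][1] for n in group}


def exclude(domain, excluded):
    """Keep the entities of `domain` that do not contain all excluded segments."""
    banned = set(excluded)
    return [group for group in domain if not banned <= set(group)]


# Reference of segment 24: one group of segments per hotel offer in W 9.
offers = parse_reference("13,14,15,16,17; 13,18,19,20; 13,21,22,23")
# Reference of segment 57: the excluded entity (the Richard Lenoir hotel).
excluded = parse_reference("13,21")[0]

remaining = exclude(offers, excluded)
names = [describe(group)["name-hotel"] for group in remaining]
print(names)
```

Under these assumptions the two remaining candidates are the offers named "bastille" and "athanor", which is consistent with the wizard proposing the Méridien Bastille in turn W 17.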

Cite this article

Bonneau-Maynard, H., Quignard, M. & Denis, A. MEDIA: a semantically annotated corpus of task oriented dialogs in French. Lang Resources & Evaluation 43, 329–354 (2009). https://doi.org/10.1007/s10579-009-9103-2
