Question-Answering Dialog System for Large Audiovisual Archives

  • Conference paper
Text, Speech, and Dialogue (TSD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)

Abstract

In this paper, we present our spoken dialog system that serves as a search interface to the MALACH archive. The voice interface and natural language input allow users to retrieve information contained in large audiovisual archives more comfortably. In particular, finding answers to more structured questions should be easier than with typical search input options. The dialog system is built on top of a system that automatically annotates and indexes the archive using automatic speech recognition. Until now, these indexes were searchable only through full-text search with an arbitrary text query. Our proposed approach improves this system and leverages named entity recognition to create a knowledge base of semantic information contained in the recognized utterances. We describe the design of the dialog system, the automatic knowledge base generation, and the approach to creating queries from spoken natural language input.
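To make the idea of a knowledge base built from recognized named entities more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `Mention` record (interview identifier, time offset, entity type, entity value) standing in for the output of named entity recognition over ASR transcripts. The sample data, the function names `build_knowledge_base` and `find_segments`, and the in-memory dictionary index are all illustrative assumptions. The abstract does not specify the actual storage backend or query language.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One named-entity occurrence in a recognized utterance (hypothetical schema)."""
    interview_id: str
    start_sec: float
    entity_type: str  # e.g. "PERSON" or "LOCATION"
    value: str


def build_knowledge_base(mentions):
    """Index mentions by (entity type, lower-cased entity value)."""
    kb = defaultdict(list)
    for m in mentions:
        kb[(m.entity_type, m.value.lower())].append(m)
    return kb


def find_segments(kb, entity_type, value):
    """Return sorted (interview_id, start_sec) pairs where the entity was recognized."""
    hits = kb.get((entity_type, value.lower()), [])
    return sorted((m.interview_id, m.start_sec) for m in hits)


# Made-up sample data standing in for NER output over ASR transcripts.
MENTIONS = [
    Mention("interview_001", 12.5, "LOCATION", "Prague"),
    Mention("interview_001", 340.0, "PERSON", "Anna"),
    Mention("interview_002", 75.2, "LOCATION", "Prague"),
    Mention("interview_002", 410.8, "LOCATION", "Vienna"),
]

if __name__ == "__main__":
    kb = build_knowledge_base(MENTIONS)
    # A structured question such as "Which interviews mention Prague?"
    # reduces to a lookup by entity type and value.
    print(find_segments(kb, "LOCATION", "prague"))
```

In this sketch, a structured question is answered by a typed lookup rather than a full-text match, which is the kind of query a plain text index cannot distinguish (for example, a person named "Prague" versus the city).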

Notes

  1. https://malach.umiacs.umd.edu/

  2. https://sfi.usc.edu/

Acknowledgement

This work was supported by the European Regional Development Fund under the project Robotics for Industry 4.0 (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000470), by the Technology Agency of the Czech Republic, project No. TE01020197 and by the grant of the University of West Bohemia, project No. SGS-2019-027.

Author information

Corresponding author

Correspondence to Adam Chýlek.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chýlek, A., Šmídl, L., Švec, J. (2019). Question-Answering Dialog System for Large Audiovisual Archives. In: Ekštein, K. (eds) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_33

  • DOI: https://doi.org/10.1007/978-3-030-27947-9_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)
