Abstract
In this paper, we present our spoken dialog system, which serves as a search interface to the MALACH archive. The voice interface and natural-language input let users retrieve information contained in large audiovisual archives more comfortably. In particular, finding answers to more structured questions should be easier than with typical search input options. The dialog system is built on top of a system that automatically annotates and indexes the archive using automatic speech recognition. Until now, these indexes could be searched only through full-text search with arbitrary text queries. Our proposed approach extends this system and leverages named entity recognition to create a knowledge base of the semantic information contained in the recognized utterances. We describe the design of the dialog system, the automatic generation of the knowledge base, and the approach to creating queries from spoken natural-language input.
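The abstract does not specify the knowledge base schema or the query language, so the following is only a minimal sketch of the general idea, under stated assumptions: named entities are taken as already detected in each recognized utterance (the upstream ASR and NER steps are not modeled), the knowledge base is a simple in-memory index of where each entity is mentioned and which entities occur together, and spoken questions (already transcribed to text) are mapped to queries with a couple of regular-expression templates. All names here (`Entity`, `Utterance`, `KnowledgeBase`, `TEMPLATES`, `answer`), the entity labels, and the example data are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # normalized surface form, e.g. "Prague" (hypothetical)
    label: str  # hypothetical entity type, e.g. "PERSON" or "LOCATION"


@dataclass
class Utterance:
    interview_id: str
    start_sec: float
    text: str       # ASR transcript of the utterance
    entities: list  # list[Entity], assumed output of an upstream NER step


class KnowledgeBase:
    """Hypothetical in-memory index built from NER-annotated utterances."""

    def __init__(self):
        self.mentions = defaultdict(list)  # Entity -> [(interview_id, start_sec)]
        self.cooccurs = defaultdict(set)   # Entity -> {entities in the same utterance}

    def add(self, utt):
        for ent in utt.entities:
            self.mentions[ent].append((utt.interview_id, utt.start_sec))
            for other in utt.entities:
                if other != ent:
                    self.cooccurs[ent].add(other)

    def find(self, text):
        """Return all known entities whose text matches case-insensitively."""
        return [e for e in self.mentions if e.text.lower() == text.lower()]


# Hypothetical question templates, applied to the transcribed spoken question.
TEMPLATES = [
    (re.compile(r"where is (?P<x>.+?) mentioned\??$", re.I), "mentions"),
    (re.compile(r"which places are mentioned with (?P<x>.+?)\??$", re.I), "places_with"),
]


def answer(kb, question):
    """Map a question to a knowledge base lookup; None means no template matched."""
    for pattern, kind in TEMPLATES:
        m = pattern.match(question.strip())
        if not m:
            continue
        entities = kb.find(m.group("x"))
        if kind == "mentions":
            return sorted(hit for e in entities for hit in kb.mentions[e])
        if kind == "places_with":
            return sorted({o.text for e in entities for o in kb.cooccurs[e]
                           if o.label == "LOCATION"})
    return None  # e.g. fall back to the existing full-text search


# Usage with made-up data.
kb = KnowledgeBase()
kb.add(Utterance("int-001", 125.0, "we moved to Prague with my father Karel",
                 [Entity("Prague", "LOCATION"), Entity("Karel", "PERSON")]))
kb.add(Utterance("int-002", 40.5, "Karel was born in Brno",
                 [Entity("Karel", "PERSON"), Entity("Brno", "LOCATION")]))
print(answer(kb, "Where is Karel mentioned?"))               # [('int-001', 125.0), ('int-002', 40.5)]
print(answer(kb, "Which places are mentioned with Karel?"))  # ['Brno', 'Prague']
```

Returning `None` when no template matches reflects the abstract's framing of the structured-question path as an addition to, not a replacement for, full-text search; a real system would also need to cope with ASR errors in the question and in the indexed utterances, which this sketch ignores.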
Acknowledgement
This work was supported by the European Regional Development Fund under the project Robotics for Industry 4.0 (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000470), by the Technology Agency of the Czech Republic, project No. TE01020197, and by a grant of the University of West Bohemia, project No. SGS-2019-027.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chýlek, A., Šmídl, L., Švec, J. (2019). Question-Answering Dialog System for Large Audiovisual Archives. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol. 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_33
DOI: https://doi.org/10.1007/978-3-030-27947-9_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science, Computer Science (R0)