Abstract
Ground truth is a crucial resource for creating effective question-answering (Q-A) systems. When no appropriate ground truth is available, as is often the case for domain-specific Q-A systems (e.g., customer support, tourism) or for languages other than English, new ground truth can be created by human annotation. The annotation process in which a human annotator looks up the corresponding answer label for each question from an answer catalog (\(\textsc{Sequential}\) approach), however, is usually time-consuming and costly. In this paper, we propose a new approach in which the annotator first manually groups questions that share the intent of a candidate question and then labels the entire group in one step (\(\textsc{Group}\text{-}\textsc{Wise}\) approach). To retrieve same-intent questions effectively, we evaluate various unsupervised semantic similarity methods from recent literature and implement the most effective one in our annotation approach. We then compare the \(\textsc{Group}\text{-}\textsc{Wise}\) approach with the \(\textsc{Sequential}\) approach with respect to answer look-ups, annotation time, and label quality. Based on 500 German customer-support questions, we show that the \(\textsc{Group}\text{-}\textsc{Wise}\) approach requires 51% fewer answer look-ups, is 41% more time-efficient, and retains the same label quality as the \(\textsc{Sequential}\) approach. Note that the described approach is limited to Q-A systems in which frequently asked questions occur.
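The group-wise labeling loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper evaluates several unsupervised semantic similarity methods and uses the most effective one, whereas here a plain bag-of-words cosine similarity stands in for the retrieval step, and the function name, grouping logic, and threshold value are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def bow(text: str) -> Counter:
    # bag-of-words vector: token -> count (stand-in for a real similarity model)
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # cosine similarity between two sparse bag-of-words vectors
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_wise_annotate(questions: list[str], threshold: float = 0.5) -> dict[str, int]:
    """Group-wise annotation sketch: pick an unlabeled candidate question,
    retrieve similar (presumably same-intent) questions, and assign one
    label to the whole group, so one answer look-up covers many questions."""
    vectors = {q: bow(q) for q in questions}
    labels: dict[str, int] = {}
    group_id = 0
    for q in questions:
        if q in labels:
            continue
        # retrieve still-unlabeled questions similar to the candidate;
        # in the paper this retrieval is checked manually by the annotator
        group = [q] + [
            o for o in questions
            if o != q and o not in labels
            and cosine(vectors[q], vectors[o]) >= threshold
        ]
        for member in group:          # one look-up labels the entire group
            labels[member] = group_id
        group_id += 1
    return labels
```

For example, `group_wise_annotate(["how do i reset my password", "reset my password how", "what are your opening hours"])` places the two password questions in one group and the opening-hours question in its own group, so two answer look-ups suffice for three questions.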
Notes
- 1.
- 2. Questions are translated from German.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zlabinger, M., Rekabsaz, N., Zlabinger, S., Hanbury, A. (2019). Efficient Answer-Annotation for Frequent Questions. In: Crestani, F., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)