Efficient Answer-Annotation for Frequent Questions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Abstract

Ground truth is a crucial resource for the creation of effective question-answering (Q-A) systems. When no appropriate ground truth is available, as is often the case in domain-specific Q-A systems (e.g., customer support, tourism) or in languages other than English, new ground truth can be created by human annotation. The annotation process in which a human annotator looks up the corresponding answer label for each question from an answer catalog (Sequential approach), however, is usually time-consuming and costly. In this paper, we propose a new approach in which the annotator first manually groups questions that have the same intent as a candidate question and then labels the entire group in one step (Group-Wise approach). To retrieve same-intent questions effectively, we evaluate various unsupervised semantic similarity methods from the recent literature and implement the most effective one in our annotation approach. We then compare the Group-Wise approach with the Sequential approach with respect to answer look-ups, annotation time, and label quality. Based on 500 German customer-support questions, we show that the Group-Wise approach requires 51% fewer answer look-ups, is 41% more time-efficient, and retains the same label quality as the Sequential approach. Note that the described approach is limited to Q-A systems in which frequently asked questions occur.
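To make the Group-Wise procedure concrete, the sketch below illustrates one possible annotation loop: every question is embedded once, a candidate question is chosen, the remaining unlabeled questions are ranked by cosine similarity to the candidate, the annotator confirms which of the top-ranked questions share the candidate's intent, and the confirmed group receives a single answer label with one catalog look-up. This is an illustrative sketch only, not the authors' implementation; the `embed` callback stands in for whichever unsupervised sentence-similarity method is chosen (footnote 1 points to sent2vec), and `confirm_same_intent` and `look_up_answer` are hypothetical stand-ins for the human annotation steps.

```python
# Illustrative sketch of a Group-Wise annotation loop (not the authors' code).
# `embed`, `confirm_same_intent`, and `look_up_answer` are hypothetical callbacks:
# any unsupervised sentence-embedding method (e.g. sent2vec, see footnote 1) and
# the human annotator's decisions, respectively.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def group_wise_annotate(questions, embed, confirm_same_intent, look_up_answer, top_k=20):
    """Label questions group-wise: one answer look-up per confirmed intent group."""
    vectors = [embed(q) for q in questions]           # embed every question once
    unlabeled = set(range(len(questions)))
    labels = {}
    while unlabeled:
        cand = next(iter(unlabeled))                  # pick an unlabeled candidate question
        ranked = sorted(
            (i for i in unlabeled if i != cand),
            key=lambda i: cosine(vectors[cand], vectors[i]),
            reverse=True,
        )[:top_k]
        # The annotator confirms which retrieved questions share the candidate's intent.
        group = [cand] + [i for i in ranked if confirm_same_intent(questions[cand], questions[i])]
        answer = look_up_answer(questions[cand])      # single look-up in the answer catalog
        for i in group:
            labels[i] = answer
            unlabeled.discard(i)
    return labels
```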

Notes

  1. https://github.com/epfml/sent2vec.

  2. Questions are translated from German.


Author information

Correspondence to Markus Zlabinger.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zlabinger, M., Rekabsaz, N., Zlabinger, S., Hanbury, A. (2019). Efficient Answer-Annotation for Frequent Questions. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol. 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_8

  • DOI: https://doi.org/10.1007/978-3-030-28577-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28576-0

  • Online ISBN: 978-3-030-28577-7

  • eBook Packages: Computer Science, Computer Science (R0)
