Semantic Definition Ranking

Hao, Zehui; Wang, Zhongyuan; Meng, Xiaofeng; Yan, Jun; Wang, Qiuyue

doi:10.1007/978-3-319-55699-4_10

Conference paper
First Online: 22 March 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Abstract

Question answering has attracted much attention from both academia and industry, and search engines already try to provide direct answers for question-like queries. Among these queries, "What" questions form one of the largest segments. Because answers excerpted from Wikipedia often suffer from limited coverage, some approaches instead rank definitions extracted from web documents, including Ranking SVM and the Maximum Entropy Context Model. However, these models rely only on syntactic features and cannot understand definitions semantically. In this paper, we propose a language model that incorporates knowledge bases to learn the regularities behind good definitions. It combines a recurrent neural network based language model with a process that maps words to context-appropriate concepts. Using the knowledge learnt by the neural network, we define two semantic features to evaluate definitions, one of which is confirmed to be effective by experiments. Results show that our model substantially improves precision. Our approach has been applied in production.
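The idea of mapping words to context-appropriate concepts before language modelling can be illustrated with a minimal sketch. It is not the authors' implementation: the `IS_A` table is a hypothetical stand-in for a knowledge base, the disambiguation rule is an assumed heuristic, and a smoothed bigram model is swapped in for the recurrent neural network language model purely to keep the example short and runnable.

```python
import math
from collections import Counter

# Hypothetical isA table (word -> {concept: prior}); not real knowledge-base data.
IS_A = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "banana": {"fruit": 1.0},
    "microsoft": {"company": 1.0},
}

def conceptualize(tokens):
    """Replace each known word with the concept best supported by its context."""
    result = []
    for i, tok in enumerate(tokens):
        candidates = IS_A.get(tok)
        if candidates is None:
            result.append(tok)  # unknown word: keep it unchanged
            continue
        # Context support: summed priors of the concepts of all other tokens.
        votes = Counter()
        for j, other in enumerate(tokens):
            if j != i:
                for concept, prior in IS_A.get(other, {}).items():
                    votes[concept] += prior
        # Assumed heuristic: prior weighted by (1 + context support).
        best = max(candidates, key=lambda c: candidates[c] * (1.0 + votes[c]))
        result.append(best)
    return result

class BigramLM:
    """Add-one smoothed bigram model; a simple stand-in for an RNN language model."""

    def __init__(self, sequences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + seq + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def log_prob(self, seq):
        """Average per-transition log-probability of a token sequence."""
        padded = ["<s>"] + seq + ["</s>"]
        v = len(self.vocab) + 1  # reserve one slot for unseen tokens
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            total += math.log((self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v))
        return total / (len(padded) - 1)

# Train on conceptualized example definitions, then score candidates.
training = [conceptualize(s.split()) for s in
            ["banana is a fruit", "microsoft is a company"]]
lm = BigramLM(training)
good = conceptualize("apple is a fruit".split())
bad = conceptualize("fruit a is apple".split())
```

Under these assumptions, candidate definitions would be ranked by `lm.log_prob` over their concept sequences, so a well-formed candidate such as `good` scores higher than the scrambled `bad`. Working at the concept level lets the model generalise from "banana" to "apple" even though "apple" never appears in the training sentences.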

This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000603, 2016YFB1000602); the Natural Science Foundation of China (No. 61532010, 61379050, 91646203, 61532016); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130004130001); and the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University (No. 11XNL010).

Notes

1. Angle brackets denote a word together with its concept.
2. If annotators disagree, they are asked to re-annotate the definition. If disagreement persists, two additional annotators take part, and we adopt the label given by the majority of annotators.
3. https://code.google.com/archive/p/word2vec/
4. http://www.rnnlm.org
5. https://dumps.wikimedia.org/
6. Probase data is available at http://probase.msra.cn/dataset.aspx

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, China
Zehui Hao, Xiaofeng Meng & Qiuyue Wang
Microsoft Research Asia, Beijing, China
Zhongyuan Wang & Jun Yan

Corresponding author

Correspondence to Xiaofeng Meng.

Editor information

Editors and Affiliations

Arizona State University, Tempe - Phoenix, Arizona, USA
Selçuk Candan
Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Hao, Z., Wang, Z., Meng, X., Yan, J., Wang, Q. (2017). Semantic Definition Ranking. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_10

DOI: https://doi.org/10.1007/978-3-319-55699-4_10
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer Science, Computer Science (R0)