Abstract
Many applications handle short texts, and enabling machines to understand short texts is a big challenge. For example, in ad selection, it is difficult to evaluate the semantic similarity between a search query and an ad. Clearly, edit-distance-based string similarity does not work. Moreover, statistical methods that find latent topic models from text also fall short because ads and search queries are too short to provide enough statistical signal.
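To make the point about edit distance concrete, here is a minimal sketch using the standard Levenshtein distance. The query and the two ad strings are invented for illustration and do not come from any real ad corpus.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical examples, chosen only to illustrate the mismatch.
query = "cheap flights"
related_ad = "discount airfare"   # similar intent, little character overlap
unrelated_ad = "cheap lights"     # different intent, almost the same string

print(edit_distance(query, related_ad))
print(edit_distance(query, unrelated_ad))
```

Under this measure the unrelated ad sits closer to the query than the related one, which is the opposite of what ad selection needs.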
In this tutorial, I will talk about a knowledge-empowered approach for text understanding. When the input is sparse, noisy, and ambiguous, knowledge is needed to fill the gap in understanding. I will introduce the Probase project at Microsoft Research Asia, whose goal is to enable machines to understand human communications. Probase is a universal, probabilistic taxonomy more comprehensive than any current taxonomy. It contains more than 2 million concepts, harnessed automatically from a corpus of 1.68 billion web pages and two years' worth of search-log data. It enables probabilistic interpretations of search queries, document titles, ad keywords, etc. The probabilistic nature also enables it to incorporate heterogeneous information naturally. I will explain how the core taxonomy, which contains hypernym-hyponym relationships, is constructed and how it models knowledge’s inherent uncertainty, ambiguity, and inconsistency.
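As a rough illustration of what a probabilistic interpretation over hypernym-hyponym pairs might look like, the sketch below estimates P(concept | instance) from pair counts and combines the terms of a short text by multiplying those probabilities. The counts, the names `ISA_COUNTS`, `concept_given_instance`, and `interpret`, and the multiplicative combination rule are all assumptions made for this example; they are not Probase data or the Probase inference method.

```python
# Hypothetical (instance, concept) -> count table; the numbers are invented.
ISA_COUNTS = {
    ("apple", "fruit"): 60,
    ("apple", "company"): 40,
    ("microsoft", "company"): 100,
    ("pear", "fruit"): 50,
}


def concept_given_instance(instance):
    """Estimate P(concept | instance) by normalizing the pair counts."""
    counts = {c: n for (i, c), n in ISA_COUNTS.items() if i == instance}
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def interpret(instances, floor=1e-6):
    """Score each concept by the product of P(concept | instance).

    Unseen pairs get a small floor probability (an assumption of this
    sketch), and the scores are normalized to sum to 1.
    """
    concepts = {c for (_, c) in ISA_COUNTS}
    scores = {}
    for c in concepts:
        score = 1.0
        for inst in instances:
            score *= concept_given_instance(inst).get(c, floor)
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


print(concept_given_instance("apple"))
print(interpret(["apple", "microsoft"]))
print(interpret(["apple", "pear"]))
```

In this toy table "apple" alone is ambiguous between the two concepts, while its co-occurring term shifts the interpretation toward "company" or "fruit".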
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H. (2013). Understanding Short Texts. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_1
DOI: https://doi.org/10.1007/978-3-642-37401-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37400-5
Online ISBN: 978-3-642-37401-2
eBook Packages: Computer Science, Computer Science (R0)