Abstract
With the exponential growth of text on the Internet, text search has become a crucial problem in many fields. Most traditional text search approaches rely on a “bag of words” text representation built from frequency statistics. However, these approaches ignore the semantic correlation between words in the text, which may lead to inaccurate ranking of the search results. In this paper, we propose a new Wikipedia-based similar text search approach in which the words in the texts and in the query text can be semantically correlated through Wikipedia. We propose a new text representation model and a new text similarity metric. Finally, experiments on a real dataset demonstrate the high precision, recall and efficiency of our approach.
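The limitation described above can be illustrated with a minimal sketch. It is not the representation model or similarity metric proposed in the paper, which the abstract does not specify. It contrasts plain bag-of-words cosine similarity with a soft cosine measure that weights term pairs by a correlation score. The `CORRELATION` table, its values, and all function names are hypothetical; in a Wikipedia-based setting such scores would presumably be derived from Wikipedia.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Term-frequency representation: word order and meaning are discarded."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Plain cosine similarity: only exactly matching terms contribute."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term-to-term correlation scores (illustrative values only).
CORRELATION = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("repair", "maintenance")): 0.8,
}

def correlation(u, v):
    """Correlation of two terms: 1.0 if identical, table lookup otherwise."""
    if u == v:
        return 1.0
    return CORRELATION.get(frozenset((u, v)), 0.0)

def soft_cosine(a, b):
    """Soft cosine similarity: correlated but different terms also contribute."""
    def inner(x, y):
        return sum(x[u] * y[v] * correlation(u, v) for u in x for v in y)
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom else 0.0

query = bag_of_words("car repair")
doc = bag_of_words("automobile maintenance")

plain = cosine(query, doc)      # no shared terms, so the score is zero
soft = soft_cosine(query, doc)  # correlated terms raise the score
print(plain, soft)
```

Under these assumed correlation values, the two texts share no terms, so the plain measure treats them as unrelated, while the correlation-aware measure ranks the document as close to the query.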
This work is partially supported by the National Natural Science Foundation of China (Nos. 61322208, 61272178, 61129002), the Doctoral Fund of Ministry of Education of China (No. 20110042110028), and the Fundamental Research Funds for the Central Universities (Nos. N120504001, N110804002).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, J., Wang, B., Yang, X. (2014). A Correlation-Based Semantic Model for Text Search. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_75
DOI: https://doi.org/10.1007/978-3-319-08010-9_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)