A Correlation-Based Semantic Model for Text Search

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Abstract

With the exponential growth of text on the Internet, text search is considered a crucial problem in many fields. Most traditional text search approaches rely on a “bag of words” text representation built from frequency statistics. However, these approaches ignore the semantic correlation between words in the text, which may lead to inaccurate ranking of the search results. In this paper, we propose a new Wikipedia-based similar-text search approach in which the words in the texts and in the query text can be semantically correlated through Wikipedia. We propose a new text representation model and a new text similarity metric. Finally, experiments on a real dataset demonstrate the high precision, recall, and efficiency of our approach.

This work is partially supported by the National Natural Science Foundation of China (Nos. 61322208, 61272178, 61129002), the Doctoral Fund of the Ministry of Education of China (No. 20110042110028), and the Fundamental Research Funds for the Central Universities (Nos. N120504001, N110804002).

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sun, J., Wang, B., Yang, X. (2014). A Correlation-Based Semantic Model for Text Search. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_75

  • DOI: https://doi.org/10.1007/978-3-319-08010-9_75

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08009-3

  • Online ISBN: 978-3-319-08010-9

  • eBook Packages: Computer Science, Computer Science (R0)
