DOI: 10.1145/1352793.1352854

Categorizing and ranking search engine's results by semantic similarity

Published: 31 January 2008

Abstract

This paper proposes an automatic method for categorizing and ranking search engine results by semantic similarity. We first extract nouns and verbs from the snippets returned by a search engine using Named Entity Recognition and part-of-speech tagging. A semantic similarity algorithm based on WordNet then calculates the similarity of each snippet to each of the pre-defined categories. To rank the snippets, we propose a balanced similarity ranking method that combines this similarity with Google's rank and the timeliness of the pages. Preliminary experiments with 500 labeled questions from TREC03 show that 72.7% are correctly categorized.
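The abstract does not give the similarity formula or the ranking weights, so the following is only a rough sketch of how such a pipeline might look, not the authors' implementation. It assumes a tiny hand-made is-a taxonomy in place of WordNet, uses Wu-Palmer similarity as one common WordNet-style measure, and combines the three signals with a weighted sum. The taxonomy, the function names, the weights (0.6, 0.3, 0.1), the reciprocal rank term and the one-year half-life are all hypothetical choices made for illustration.

```python
from datetime import date

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "car": "artifact",
}

def path_to_root(word):
    """Return the chain [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def depth(word):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(word))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    if a not in PARENT or b not in PARENT:
        return 0.0  # unknown words contribute nothing
    ancestors_b = set(path_to_root(b))
    # First node on a's path that is also an ancestor of b is the deepest one.
    lcs = next(node for node in path_to_root(a) if node in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def snippet_similarity(snippet_words, category_words):
    """Average, over snippet words, of the best match against the category."""
    if not snippet_words or not category_words:
        return 0.0
    best = [max(wu_palmer(w, c) for c in category_words) for w in snippet_words]
    return sum(best) / len(best)

def categorize(snippet_words, categories):
    """Pick the category name with the highest snippet similarity."""
    return max(categories, key=lambda name: snippet_similarity(snippet_words, categories[name]))

def balanced_score(similarity, rank, published, today,
                   weights=(0.6, 0.3, 0.1), half_life_days=365):
    """Weighted sum of similarity, reciprocal search rank, and timeliness.

    The weights and the exponential decay are assumptions, not from the paper.
    """
    w_sim, w_rank, w_time = weights
    rank_score = 1.0 / rank                      # rank 1 is the top result
    age_days = max((today - published).days, 0)
    timeliness = 0.5 ** (age_days / half_life_days)
    return w_sim * similarity + w_rank * rank_score + w_time * timeliness

if __name__ == "__main__":
    categories = {"animals": ["animal"], "vehicles": ["car"]}
    words = ["dog", "cat"]                       # nouns extracted from a snippet
    label = categorize(words, categories)
    sim = snippet_similarity(words, categories[label])
    print(label, balanced_score(sim, 2, date(2007, 6, 1), date(2008, 1, 31)))
```

In this sketch, nouns and verbs are assumed to have been extracted already; a real system would obtain them with a part-of-speech tagger and look up senses in WordNet rather than in a fixed dictionary.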

Cited By

  • (2014) Clustering clinical trials with similar eligibility criteria features. Journal of Biomedical Informatics 52:C (112-120). DOI: 10.1016/j.jbi.2014.01.009. Online publication date: 1-Dec-2014
  • (2012) Semantic-Based Composite Document Ranking. Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing (126-129). DOI: 10.1109/ICSC.2012.28. Online publication date: 19-Sep-2012
  • (2008) Wiki trust metrics based on phrasal analysis. Proceedings of the 4th International Symposium on Wikis (1-10). DOI: 10.1145/1822258.1822291. Online publication date: 8-Sep-2008

Published In

ICUIMC '08: Proceedings of the 2nd international conference on Ubiquitous information management and communication
January 2008
604 pages
ISBN:9781595939937
DOI:10.1145/1352793

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. categorizing
  2. ranking
  3. search engine
  4. semantic similarity

Qualifiers

  • Research-article

Conference

ICUIMC08

Acceptance Rates

Overall Acceptance Rate 251 of 941 submissions, 27%
