ABSTRACT
This paper analyzes word ambiguity in an Excite search engine query log of 52,167 queries. Each query is examined term by term, and also for whether the interaction of terms within the query reduces ambiguity. The data support the conjecture that merely adding terms to a short query (five or fewer terms) does not significantly reduce the ambiguity of the terms being searched for. Specifically, regardless of the number of terms, typically one to five words per query, the search remains ambiguous. The average query length is 2.21 words, and two-word queries are shown to give the least ambiguous results. In addition, a search with at least one unambiguous word tends to produce unambiguous search results, while the opposite tends not to be true: merely adding terms does not help reduce ambiguity.
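The kind of per-length tally described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the sense-count table, the rule that a term is ambiguous when it has more than one listed sense, the sample queries, and the names `SENSE_COUNTS`, `is_ambiguous`, and `summarize` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sense counts per term (illustrative values, not from the paper).
# A real analysis might take these from a lexical resource such as WordNet.
SENSE_COUNTS = {
    "java": 3, "jaguar": 3, "bank": 2, "python": 2,
    "photosynthesis": 1, "mortgage": 1,
}


def is_ambiguous(term, sense_counts):
    """Assumed rule: a term is ambiguous when it has more than one listed sense.

    Terms missing from the table are treated as unambiguous.
    """
    return sense_counts.get(term.lower(), 1) > 1


def summarize(queries, sense_counts):
    """Return (mean query length, {length: share of queries with no unambiguous term})."""
    counts = defaultdict(lambda: [0, 0])  # length -> [all-ambiguous queries, total queries]
    total_terms = 0
    total_queries = 0
    for query in queries:
        terms = query.split()
        if not terms:
            continue  # skip empty log lines
        total_terms += len(terms)
        total_queries += 1
        bucket = counts[len(terms)]
        bucket[1] += 1
        if all(is_ambiguous(t, sense_counts) for t in terms):
            bucket[0] += 1
    mean_length = total_terms / total_queries if total_queries else 0.0
    shares = {n: amb / tot for n, (amb, tot) in sorted(counts.items())}
    return mean_length, shares


# Made-up sample queries standing in for a query log.
SAMPLE_QUERIES = [
    "java", "jaguar bank", "python mortgage",
    "bank", "photosynthesis", "java python jaguar",
]

mean_length, shares = summarize(SAMPLE_QUERIES, SENSE_COUNTS)
print(f"mean query length: {mean_length:.2f}")
for length, share in shares.items():
    print(f"{length}-term queries with no unambiguous term: {share:.2f}")
```

Grouping by query length makes it possible to compare one-term, two-term, and longer queries directly, and the "no unambiguous term" share reflects the abstract's observation that a single unambiguous word tends to anchor a search.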
Index Terms
- Work in progress: effects of multiple words on ambiguity in information retrieval