ABSTRACT
Although web search remains an active research area, interest in enterprise search has not kept pace with the information requirements of the contemporary workforce. To address this gap, this research develops, implements, and studies the query expansion techniques most effective at improving relevancy in enterprise search. The case-study instrument was a custom Apache Solr-based search application deployed at a medium-sized manufacturing company. It was hypothesized that a composition of techniques tailored to enterprise content and information needs would increase relevancy evaluation scores. Query expansion techniques leveraging entity recognition, alphanumeric term identification, and intent classification were implemented and studied using real enterprise content and query logs. They were evaluated against a set of test queries derived from relevance survey results, using standard relevancy metrics such as normalized discounted cumulative gain (nDCG). Each of these modules produced meaningful and statistically significant improvements in relevancy.
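For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how nDCG at a cutoff k can be computed for a single query. It is not taken from the paper: it assumes the common linear-gain formulation (relevance divided by log2 of rank plus one), and the function names `dcg` and `ndcg` and the graded relevance labels are hypothetical. The ideal ranking is derived only from the judged results passed in, which is a simplifying assumption.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG of the ranking as returned, divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    # A query with no relevant results has no meaningful ideal; return 0.0.
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0 = not relevant, 3 = highly relevant),
# listed in the order the search engine returned the documents.
baseline_ranking = [1, 0, 3, 2]
expanded_ranking = [3, 2, 1, 0]

baseline_score = ndcg(baseline_ranking, k=4)
expanded_score = ndcg(expanded_ranking, k=4)
```

In this sketch, a ranking that places the most relevant documents first scores 1.0, so comparing `baseline_score` with `expanded_score` per query is one way a query expansion module's effect could be measured.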