research-article

Improving Term Weighting for Community Question Answering Search Using Syntactic Analysis

Authors:
David Carmel (Yahoo Labs, Haifa, Israel)
Avihai Mejer (Yahoo Labs, Haifa, Israel)
Yuval Pinter (Yahoo Labs, Haifa, Israel)
Idan Szpektor (Yahoo Labs, Haifa, Israel)

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, November 2014, Pages 351–360
https://doi.org/10.1145/2661829.2661901

Published: 03 November 2014

ABSTRACT

Query term weighting is a fundamental task in information retrieval, and the most popular term weighting schemes are based primarily on statistical analysis of term occurrences within the document collection. In this work we study how term weighting may benefit from syntactic analysis of the corpus. Focusing on community question answering (CQA) sites, we take into account the syntactic function of terms within CQA texts as an important factor affecting their relative importance for retrieval. We analyze a large log of web queries that landed on the Yahoo Answers site and show that the tendency of a document word to appear in a landing (click-through) query varies strongly with its syntactic function. Building on this observation, we propose a novel term weighting method that uses the syntactic information available for each query term occurrence in the document, on top of term occurrence statistics. The relative importance of each feature is learned via a learning-to-rank algorithm that uses a click-through query log. We evaluate the new weighting scheme manually, using editorial data, and automatically, over the query log. Our experimental results show consistent improvement in retrieval when syntactic information is taken into account.
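The general idea of the abstract, weighting each occurrence of a query term by its syntactic function on top of occurrence statistics, can be illustrated with a minimal sketch. This is not the authors' model: the role labels, the hand-set `ROLE_WEIGHTS`, the smoothed IDF formula, and the toy documents are all assumptions made for illustration. In the setting the abstract describes, such weights would be learned from a click-through log by a learning-to-rank algorithm.

```python
import math

# Hypothetical per-role weights (assumption). The abstract describes learning
# feature importance from click-through data; here they are set by hand.
ROLE_WEIGHTS = {"nsubj": 1.5, "dobj": 1.3, "root": 1.2, "nn": 0.7}
DEFAULT_ROLE_WEIGHT = 1.0


def idf(term, docs):
    """Smoothed inverse document frequency over a list of tagged documents."""
    df = sum(1 for doc in docs if any(tok == term for tok, _ in doc))
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def score(query_terms, doc, docs, role_weights=ROLE_WEIGHTS):
    """Sum, over query terms, of IDF times a role-weighted term frequency."""
    total = 0.0
    for term in query_terms:
        # Each occurrence contributes the weight of its syntactic role
        # instead of a flat count of 1.
        weighted_tf = sum(
            role_weights.get(role, DEFAULT_ROLE_WEIGHT)
            for tok, role in doc
            if tok == term
        )
        total += idf(term, docs) * weighted_tf
    return total


# Toy documents as (token, dependency role) pairs; the parses are made up.
docs = [
    [("laptop", "nsubj"), ("overheats", "root"), ("quickly", "advmod")],
    [("cheap", "amod"), ("laptop", "nn"), ("bag", "dobj")],
]

# Rank document indices by descending score for the query "laptop".
ranked = sorted(
    range(len(docs)),
    key=lambda i: score(["laptop"], docs[i], docs),
    reverse=True,
)
```

Both toy documents contain "laptop" exactly once, so a purely frequency-based score would tie them. Under the assumed weights, the document in which the term is the subject ranks above the one in which it is a noun modifier.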

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN: 9781450325981
DOI: 10.1145/2661829
General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)
Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community question answering
dependency parsing
learning to rank
part-of-speech tagging
term weighting
Acceptance Rates
CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%