DOI: 10.1145/2661829.2661901
research-article

Improving Term Weighting for Community Question Answering Search Using Syntactic Analysis

Published: 03 November 2014

ABSTRACT

Query term weighting is a fundamental task in information retrieval, and most popular term weighting schemes are based primarily on statistical analysis of term occurrences within the document collection. In this work we study how term weighting may benefit from syntactic analysis of the corpus. Focusing on community question answering (CQA) sites, we treat the syntactic function of terms within CQA texts as an important factor affecting their relative importance for retrieval. We analyze a large log of web queries that landed on the Yahoo Answers site and show that the tendency of a document word to appear in a landing (click-through) query varies strongly with its syntactic function. Building on this observation, we propose a novel term weighting method that uses the syntactic information available for each query term occurrence in the document, on top of term occurrence statistics. The relative importance of each feature is learned via a learning-to-rank algorithm that utilizes a click-through query log. We evaluate the new weighting scheme manually, using editorial data, and automatically, over the query log. Our experimental results show consistent improvement in retrieval when syntactic information is taken into account.
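As a rough illustration of the idea in the abstract, the sketch below combines a statistical signal (a smoothed IDF) with a weight attached to the syntactic role of each term occurrence. It is not the paper's method: the role names, the `ROLE_WEIGHT` values, and the functions `idf` and `score` are all hypothetical, and the weights are fixed by hand here, whereas the abstract says the relative importance of each feature is learned from a click-through log.

```python
import math

# Hypothetical per-role weights, chosen by hand for illustration only.
# The abstract describes learning such importances with learning to rank.
ROLE_WEIGHT = {"subject": 1.5, "object": 1.2, "modifier": 0.8}
DEFAULT_ROLE_WEIGHT = 1.0


def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency (the statistical signal)."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0


def score(query_terms, doc_occurrences, doc_freq, num_docs):
    """Score one document for a query.

    doc_occurrences maps a term to the list of syntactic roles it takes
    in the document, one entry per occurrence. Each occurrence adds
    IDF(term) times the weight of its role to the score.
    """
    total = 0.0
    for term in query_terms:
        roles = doc_occurrences.get(term, [])
        role_mass = sum(ROLE_WEIGHT.get(r, DEFAULT_ROLE_WEIGHT) for r in roles)
        total += idf(term, doc_freq, num_docs) * role_mass
    return total
```

Under these assumed weights, a document in which a query term occurs as a subject scores higher than one in which the same term occurs once as a modifier, even though the term statistics are identical.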

Published in

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN: 9781450325981
DOI: 10.1145/2661829

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '14 paper acceptance rate: 175 of 838 submissions, 21%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
