
Expert Systems with Applications

Volume 40, Issue 17, 1 December 2013, Pages 7060-7068

Evolutionary optimization for ranking how-to questions based on user-generated contents

https://doi.org/10.1016/j.eswa.2013.06.017

Highlights

  • We propose an evolutionary optimization model for ranking how-to questions from the web.

  • The approach combines evolutionary computation techniques and clustering methods.

  • Experiments show that evolutionary optimization is promising for generating correct how-to answers.

Abstract

In this work, a new evolutionary model is proposed for ranking answers to non-factoid (how-to) questions in community question-answering platforms. The approach combines evolutionary computation techniques and clustering methods to effectively rate the best answers from web-based user-generated contents, so as to generate new rankings of answers. Discovered clusters contain semantically related triplets representing question–answer pairs in terms of subject–verb–object, which is hypothesized to improve the ranking of candidate answers. Experiments were conducted using our evolutionary model and concept clustering operating on large-scale data extracted from Yahoo! Answers. Results show that the approach effectively discovers semantically similar questions and improves the ranking compared to state-of-the-art methods.

Introduction

Traditionally, Question Answering (QA) systems aim at automatically answering natural language questions by extracting short text fragments from documents contained in a target collection (e.g., the web). Community QA (CQA) services, on the other hand, are composed of members who contribute new questions that can be answered by other members. By making good use of this synergy, members share their knowledge so as to build a valuable massive archive (i.e., fora) of questions and answers. This rapidly growing repository yields insights and solutions to many common questions and daily problems that people may face. This collaborative paradigm has proven attractive for providing answers on topics that are hard to address with standard search engines; for instance, some users may be looking for opinions or assistance from others, which is barely feasible with conventional QA systems.

Unlike traditional QA systems, CQA platforms have become popular as a great opportunity to obtain answers generated by other users, rather than lists of text snippets or documents. Typically, CQA services (e.g., Yahoo! Answers) are organized into categories, which are selected by members when submitting a question. These categories can be used for finding content on topics of interest. Thus, this kind of platform provides a mechanism by which posted questions can receive several responses from multiple users, who contribute good questions and answers, rate answer quality (e.g., positive/negative votes, thumbs-up/thumbs-down), and post comments.

Broadly speaking, research on automatic QA systems has been conducted under the umbrella of the QA track of the Text REtrieval Conference (TREC), in which systems are targeted at news articles and some specific, restricted classes of questions (Voorhees, 2005). Despite their success, these systems differ significantly from CQA platforms. News collections are normally clean documents, whereas CQA services are built on top of noisy user-generated content. Moreover, CQA platforms aim at finding members who can provide quick and short answers to askers, a social ability that traditional QA systems lack. Furthermore, CQA platforms rely on the voluntary involvement of their members, and have proven more attractive for answering complex questions (Liu, Bian, & Agichtein, 2008), especially how-to questions (e.g., “How to make beer?”), which appear to attract much attention (Harper, Raban, Rafaeli, & Konstan, 2008).

Current QA systems differ from CQA services in several ways (Blooma & Kurian, 2011):

  • (1) QA systems started by processing single-sentence factual questions (e.g., “Where is London?”), but the focus later shifted to interactive questions (Dang, Lin, & Kelly, 2006). CQA services, in turn, are rich in multiple-sentence questions (e.g., “I am traveling to Bellevue this summer. What are tourist attractions there?”).

  • (2) QA systems extract answers from documents, whereas in CQA platforms answers are generated by users.

  • (3) QA systems typically operate on clean and valid documents, so answer quality is often very high, whereas in CQA services quality is uncertain, as it depends on the content contributed by members.

  • (4) CQA platforms are rich in meta-data (e.g., best answers selected by askers).

  • (5) Responses in traditional QA systems are provided immediately, whereas in CQA services response times depend on the availability of members.

As a consequence, the quality of posted answers in CQA services is also variable, ranging from excellent to insulting responses. Another major problem is the low participation rate of members, which causes a small number of users to answer a large number of questions. Every day, many questions remain unanswered or are answered with delay. For example, in Yahoo! Answers nearly 15% of all questions submitted in English go unresolved, are poorly answered, or are never satisfactorily resolved (Shtok, Dror, Maarek, & Szpektor, 2012). This drawback might be due to users who simply do not want to answer questions (or are not experts), or to system saturation that leaves members unaware of new questions of interest to them. In order to address these issues, informing askers that posted questions are unlikely to be resolved, offering suggestions or past answers, or forwarding questions to experts may become practical solutions, as recent research has reported that at least 78% of best answers are reusable (Liu et al., 2008).

An effective CQA platform should be capable of detecting similar past questions, retrieving relevant answers, and recommending potential answerers. However, the lexical gap between new and past questions must be narrowed. Unfortunately, a question’s body provides both relevant and irrelevant content, and the ill-formed language and heterogeneous styles of answers significantly affect answer quality in CQA services. Hence, ranking answers to complex questions plays a key role in CQA platforms. While state-of-the-art QA approaches handle factoid questions reasonably well, they fail to provide effective answers to complex questions, especially given that manner (how-to) questions are mainstream in CQA platforms (Liu et al., 2008).

This research addresses non-factoid question answering, in particular, discovering answers to procedural (how-to) questions. To this end, a QA model combining evolutionary computation techniques and clustering methods is proposed to effectively search for the best answers from web-based user-generated contents, so as to generate new rankings of answers. The main claim of this work is that combining evolutionary optimization techniques with question–answer clustering is effective at finding semantically similar questions and thereby improving the ranking of candidate answers. Specifically, genetically discovering semantic relationships through concept clusters that contain answers may significantly improve the performance of a baseline QA ranking.
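To make this idea concrete, the following sketch re-ranks candidate answers by the affinity of their question triplets to the cluster that best matches the new question. It is a minimal illustration only: the Jaccard-based triplet similarity and the additive scoring below are our own assumptions for exposition, not the exact formulation proposed in this work.

    # Hypothetical sketch: re-rank candidate answers by triplet-cluster affinity.
    # Triplets are (subject, verb, object) tuples; clusters are lists of triplets.

    def jaccard(a, b):
        """Jaccard similarity between two sets of terms."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def triplet_terms(triplet):
        """Flatten a (subject, verb, object) triplet into a set of terms."""
        return {term for slot in triplet for term in slot.split()}

    def rerank(candidates, question_triplet, clusters):
        """candidates: list of (answer_text, triplet) pairs. Prefer answers
        whose triplet falls in the cluster closest to the question."""
        q_terms = triplet_terms(question_triplet)
        # Pick the cluster whose members are, on average, most similar to the question.
        best = max(clusters, key=lambda c: sum(jaccard(q_terms, triplet_terms(t))
                                               for t in c) / len(c))
        members = set(best)
        scored = [(ans, jaccard(q_terms, triplet_terms(t)) + (1.0 if t in members else 0.0))
                  for ans, t in candidates]
        return [ans for ans, _ in sorted(scored, key=lambda x: -x[1])]

For example, given the question triplet ("i", "make", "beer"), answers attached to triplets such as ("i", "brew", "beer") lying in the same cluster would be promoted over lexically closer but semantically unrelated candidates.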

Accordingly, this paper is organized as follows: Section 2 discusses concepts and foundations for this work; Section 3 describes a novel evolutionary optimization model that uses clustering methods for ranking answers to how-to questions; Section 4 discusses the experiments, evaluations, and results obtained with our approach; and finally, Section 5 highlights the main conclusions of this work.

Section snippets

Related work

User opinions are important inputs for CQA platforms. Many features of typical CQA services, such as the best answer to a question, depend on ratings cast by the community. For example, in a popular CQA platform such as Yahoo! Answers, members vote for the best answers to their questions and can also thumb up or down each individual answer. In terms of designing CQA platforms for these scenarios, there are four key topics: question processing (Blooma et al., 2011, Blooma and Kurian, …

An evolutionary model for ranking how-to answers

By using a large web-based knowledge source of related questions and answers, such as Yahoo! Answers, a novel QA model based on evolutionary computation optimization and clustering techniques was designed for re-ranking answers to how-to questions. Each question is represented by a Shallow Subject-Verb-Object (SSVO) triplet, which is an entity grouping a set of semantically similar questions. In order to generate new rankings of candidate answers, clusters of similar triplets are automatically generated …
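The snippet above does not show how SSVO triplets are obtained. As a hedged illustration only, one plausible shallow extraction keeps each verb together with its nominal subject and direct object from a dependency parse; the nsubj/dobj heuristic and the use of spaCy here are assumptions, not the extraction rules of the paper:

    # Illustrative shallow subject-verb-object extraction via dependency parsing.
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def shallow_svo(text):
        """Return (subject, verb, object) triplets found in the text."""
        triplets = []
        for token in nlp(text):
            if token.pos_ == "VERB":
                subjects = [c.lemma_ for c in token.children if c.dep_ == "nsubj"]
                objects = [c.lemma_ for c in token.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        triplets.append((s, token.lemma_, o))
        return triplets

    # e.g. shallow_svo("How do I make beer at home?") should yield [("I", "make", "beer")]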

Experiments and results

In order to assess the effectiveness of our evolutionary model for ranking how-to answers, a web-based computer prototype was implemented. Experiments were then conducted to investigate the extent to which our combined triplet clustering and evolutionary optimization can indeed create rich underlying semantic associations between similar questions so as to generate a high-quality ranking of candidate answers.
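The evaluation measures are not visible in this snippet. As a generic illustration (our assumption, not necessarily the measure reported in the paper), ranking quality against asker-selected best answers is commonly summarized with mean reciprocal rank (MRR):

    # Mean reciprocal rank over a set of questions: 1/rank of the known best
    # answer in each produced ranking, averaged across questions.

    def mean_reciprocal_rank(runs):
        """runs: list of (ranked_answer_ids, best_answer_id) pairs, one per question."""
        total = 0.0
        for ranked, best in runs:
            if best in ranked:
                total += 1.0 / (ranked.index(best) + 1)
        return total / len(runs)

    # e.g. mean_reciprocal_rank([(["a2", "a1", "a3"], "a1")]) == 0.5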

Conclusions and future work

In this work, a novel evolutionary computation QA model that uses clustering methods for generating the best answers to how-to questions from user-generated contents is proposed. A new representation scheme is proposed to map question–answer pairs into shallow SVO triplets, which are provided to an evolutionary optimization method that iteratively searches for the best configurations of clusters along a population of candidate answers. Specifically, a genetic algorithm (GA) based optimization …
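To sketch what such a GA-based search over cluster configurations can look like (the genome encoding, fitness function, and operators below are illustrative assumptions; the paper's exact design is truncated in this snippet), each genome assigns every triplet to a cluster, and fitness rewards intra-cluster similarity:

    # Hypothetical GA sketch: genomes assign each triplet to one of n_clusters
    # clusters; fitness is the mean similarity of triplet pairs sharing a cluster.
    import random

    def fitness(genome, sim):
        """genome[i] is the cluster id of triplet i; sim[i][j] is a precomputed
        similarity between triplets i and j."""
        n, score, pairs = len(genome), 0.0, 0
        for i in range(n):
            for j in range(i + 1, n):
                if genome[i] == genome[j]:
                    score += sim[i][j]
                    pairs += 1
        return score / pairs if pairs else 0.0

    def evolve(sim, n_clusters, pop_size=30, generations=100, p_mut=0.1):
        """Evolve cluster assignments for len(sim) triplets (len(sim) >= 2)."""
        n = len(sim)
        pop = [[random.randrange(n_clusters) for _ in range(n)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda g: fitness(g, sim), reverse=True)
            survivors = pop[: pop_size // 2]  # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, n)  # one-point crossover
                child = a[:cut] + b[cut:]
                child = [random.randrange(n_clusters) if random.random() < p_mut else g
                         for g in child]      # per-gene mutation
                children.append(child)
            pop = survivors + children
        return max(pop, key=lambda g: fitness(g, sim))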

References (50)

  • E. Agichtein et al. Finding high-quality content in social media.
  • E. Alba et al. Evolutionary algorithms (2005).
  • Bian, J., Liu, Y., Agichtein, E., & Zha, H. (2008). Finding the right facts in the crowd: Factoid question answering...
  • M.J. Blooma et al. Quadripartite graph-based clustering of questions.
  • Blooma, M. J., & Kurian, J. C. (2011). Research issues in community based question answering. In Proceedings of the 2011...
  • Blooma, M. J., & Kurian, J. C. (2012). Clustering similar questions in social question answering systems. In...
  • Y. Cao et al. Recommending questions using the MDL-based tree cut model.
  • L. Chen et al. Understanding user intent in community question answering.
  • Dang, H. T., Lin, J., & Kelly, D. (2006). Overview of the TREC 2006 question answering track. In Proceedings of the...
  • Derczynski, L., Wang, J., Gaizauskas, R., & Greenwood, M. (2008). A data driven approach to query expansion in question...
  • W. Fan et al. The effects of fitness functions on genetic programming-based ranking discovery for web search. Journal of the American Society for Information Science and Technology (2004).
  • Figueroa, A., & Neumann, G. (2013). Learning to rank effective paraphrases from query logs for community question...
  • Florez-Revuelta, F. (2007). Specific crossover and mutation operators for a grouping problem based on interaction data...
  • F.M. Harper et al. Facts or friends?: Distinguishing informational and conversational questions in social Q&A sites.
  • F.M. Harper et al. Predictors of answer quality in online Q&A sites.
  • J. Jeon et al. A framework to predict the quality of answers with non-textual features.
  • B.M. John et al. What makes a high-quality user-generated answer? IEEE Internet Computing (2011).
  • Karimzadehgan, M., Li, W., Ruofei, Z., & Mao, J. (2011). A stochastic learning-to-rank algorithm and its application to...
  • Ko, J., Wang, J., Mitamura, T., & Nyberg, E. (2007). Language-independent probabilistic answer ranking for question...
  • W.B. Langdon. Elementary bit string mutation landscapes.
  • Li, B., Liu, Y., Ram, A., Garcia, E., & Agichtein, E. (2008). Exploring question subjectivity prediction in community...
  • Li, B., Lyu, M., & King, I. (2012). Communities of Yahoo! Answers and Baidu Zhidao: Complementing or competing? In The...
  • Q. Liu et al. Modeling answerer behavior in collaborative question answering systems.
  • X. Liu et al. Finding experts in community-based question-answering services.
  • Y. Liu et al. Predicting information seeker satisfaction in community question answering.

Cited by (9)

    • Leveraging linguistic traits and semi-supervised learning to single out informational content across how-to community question-answering archives

      2017, Information Sciences
      Citation Excerpt:

      In short, they discovered that the existence of a user picture greatly contributes to the ranking task, and that most top contributors are only good at one or two question categories. In the same spirit as [39,40], [1] proposed an answer re-ranking technique based on subject-verb-object models to improve the ranking of answer candidates to how-to questions. All in all, these approaches do not take into account the potential of different natural language processing tools, such as sentiment analysis and dependency parsing, which have proven to be instrumental in other classification tasks [7,34].

    • Genetic programming-based feature learning for question answering

      2016, Information Processing and Management
      Citation Excerpt:

      We obtained a threshold for the number of retrieved documents that is efficient with respect to accuracy and processing time. Atkinson et al. (2013) proposed an evolutionary model for ranking answers to how-to questions in a community QA system. Their approach combines evolutionary computation techniques and clustering methods to effectively rate the best answers from web-based user-generated contents.

    • Knowledge-based question answering using the semantic embedding space

      2015, Expert Systems with Applications
      Citation Excerpt:

      As conventional QA approaches, IR-based QA systems [14,18,30,32], when given a question statement, try to retrieve, extract, and assemble answer information for the question from a large volume of unstructured data such as Wikipedia before generating an answer statement. Similarly, community-based question answering (CQA) systems [1,15] aim to find answers from past question–answer pairs in online social sites such as Yahoo! Answers. Recently, researchers have developed open-domain systems based on large-scale KBs such as Freebase.

    • Category-specific models for ranking effective paraphrases in community Question Answering

      2014, Expert Systems with Applications
      Citation Excerpt:

      Conversely, types have no impact on the number of answers. In general, advice questions appear to attract the most and best attention from cQA members, causing the emergence of new methods targeted exclusively at this particular type of question (Atkinson, Figueroa, & Andrade, 2013; Surdeanu, Ciaramita, & Zaragoza, 2008, 2011; Zhou, Lan, Niu, & Lu, 2012). The idea behind Shtok et al. (2012) is to reuse resolved questions for estimating the probability that new questions can be answered by past best answers.

    • Text analytics: An introduction to the science and applications of unstructured information analysis

      2022, Text Analytics: An Introduction to the Science and Applications of Unstructured Information Analysis

    This research was partially supported by FONDECYT, Chile under Grant number 1130035: “An Evolutionary Computation Approach to Natural-Language Chunking for Biological Text Mining Applications”.
