A knowledge-based question answering system for B2C eCommerce

doi:10.1016/j.knosys.2008.04.005

Knowledge-Based Systems

Volume 21, Issue 8, December 2008, Pages 946-950

Abstract

Business-to-Consumer (B2C) eCommerce has evolved through several generations. The latest models of B2C eCommerce are comparative shopping systems that connect to multiple vendors’ databases and collect the information requested by the user. The comparison results are then displayed in tabular form in the user’s browser. Although this scenario is much better than comparing multiple sites manually, the user still faces inconsistent user interfaces when linked from the comparison site to the actual purchasing site, and therefore has to learn the logic of each site’s user interface. In this paper, we propose a question answering system based on natural language processing techniques for retail (B2C) eCommerce. The system accepts a question in natural language, decomposes it into keywords, and extracts its constraints automatically. The corresponding answers are then retrieved from the vendors’ Web sites by exploiting the question constraints.

Introduction

eCommerce began with the introduction of electronic data interchange (EDI) between companies and of ATMs for banking [1], [2]. The introduction of Web browsers opened up a new age by combining the open Internet with easy-to-use interfaces [1], [2].

B2C ordinarily refers to online trading and auctions, for example online stock trading markets and online auctions for computers and other goods. B2C eCommerce refers to the emerging commerce model in which businesses and consumers interact electronically or digitally in some way. One of the best examples of B2C eCommerce is Amazon.com, an online bookstore that launched its site in 1995. In B2C eCommerce, the focus is on enticing prospects, converting them into customers, retaining them, and sharing the value created during the process. The ultimate goal is to convert shoppers into buyers as aggressively and consistently as possible.

In a typical B2C setting, information flows between business and consumer through the medium of the Internet. This flow includes product orders and service requests from customers, and product information, specifications, and the provision of services by the business.

B2C eCommerce is the predominant commercial experience of Web users. A typical scenario involves a user visiting one or several online shops, browsing their offers, and selecting and ordering products. Ideally, a user would collect information about price, terms, and conditions (such as availability) from all, or at least all major, online shops and then select the best offer. But manual browsing is too time-consuming to be conducted on this scale. Typically a user will visit one or a very few online stores before making a decision.

B2C eCommerce has nevertheless evolved through several generations. The latest models are comparative shopping catalogs, such as pricescan.com [3], that visit several shops, extract product and price information, and compile a market overview. The comparison results are then displayed in tabular form in the user’s browser. This approach suffers from several drawbacks. First, these models must be granted access by vendors before they can retrieve any information from the vendors’ databases. Since some vendors may not grant access to their databases, their product information will not appear in the results. We have proposed a knowledge-based approach to resolve this problem in [15]. In this approach, product and price information is understood and extracted from the Web pages of vendors’ sites to build a virtual catalog directly. Second, the user still faces inconsistent user interfaces when linked from the comparison site to the actual purchasing site, and therefore has to learn the logic of each site’s user interface. For example, the user has to break a question down into keywords according to the logic of the user interface and give them to the system. The user thus cannot ask a question in natural language (such as English) and get an answer. Using keywords dictated by the logic of third-generation systems’ user interfaces is not a good way to establish a relationship between user and system [4]: users are often not interested in extracting keywords from their questions, or may be unable to do so, and a few keywords usually cannot cover the complete meaning of a question. In most cases, users are looking for clear answers to their questions, whereas third-generation systems return a collection of results related to the question that only probably contains the correct answer.

In recent years, Question Answering (QA) systems have evolved out of the field of Information Retrieval to better meet the needs of information seekers. Unlike simple keyword-based information retrieval systems, they aim to communicate directly with users in natural language. They accept natural language questions and return exact answers, eliminating the burden of formulating queries and reading many irrelevant documents to find the answer. Open-domain QA systems deal with unrestricted questions over large-scale text corpora, typically by means of statistical approaches, whereas restricted-domain systems concentrate on a controlled domain of interest (e.g. weather forecasts or UNIX technical manuals). MELISA [5] is a good example of a restricted-domain QA system.

In this paper, we propose a QA system for B2C eCommerce. At present, the system answers questions in the digital camera domain, but it can be extended to any retail domain. The system exploits an initial knowledge base, which gives it some advantages over open-domain QA systems (i.e. systems that do not have any specific domain knowledge [6], [7], [8]).

We present the details of our approach in the remainder of the paper as follows. After a short overview of the related work in Section 2, Section 3 describes the system architecture. Section 4 explains how we define the concepts, relations, and instances of the initial knowledge. In Section 5, we describe our approach to analyzing natural language (NL) questions. In Section 6, we report the experiments we conducted involving digital camera advertisements on the Web. Finally, Section 7 presents the conclusion of this work.

Section snippets

Related work

Halo [9] is one of the most ambitious recent investments in knowledge-based question answering systems, “a staged, long-term research and development initiative toward the development of a ‘Digital Aristotle’ capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines.” In the pilot phase of the project the state of the art in knowledge representation and reasoning was applied for a limited syllabus in chemistry with promising results [10].

System architecture

In the proposed system, an agent makes natural language negotiation with the user possible. This agent analyses the user’s NL questions and extracts their keywords and conditions. In the next step, the extracted keywords are given to another agent, the web crawler, which searches for and retrieves the related pages that contain the same keywords. The retrieved pages are then passed to an information extraction agent, which extracts the exact answers using the questions’ keywords and conditions.
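The three-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the "less/more than" price pattern, the in-memory page list, and all function names are assumptions made purely for illustration, and the crawler is replaced by a stub over made-up listings.

```python
import re
from dataclasses import dataclass

# Assumed stop-word list; the paper does not specify one.
STOP_WORDS = {"which", "what", "is", "are", "a", "an", "the", "with",
              "and", "of", "has", "have", "cost", "costs", "price"}

@dataclass
class Condition:
    attribute: str
    operator: str  # "<" or ">"
    value: float

# Assumed pattern: "less than $300" or "more than 300 dollars" constrains the price.
PRICE_PATTERN = re.compile(r"(less|more) than \$?(\d+(?:\.\d+)?)( dollars)?")

def analyze_question(question):
    """User negotiation agent: split a question into keywords and conditions."""
    text = question.lower()
    conditions = []
    for match in PRICE_PATTERN.finditer(text):
        operator = "<" if match.group(1) == "less" else ">"
        conditions.append(Condition("price", operator, float(match.group(2))))
    # Remove the condition phrases so they do not leak into the keywords.
    text = PRICE_PATTERN.sub(" ", text)
    keywords = [w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS]
    return keywords, conditions

# Stand-in for the Web: made-up listings, not real vendor data.
PAGES = [
    {"url": "shop-a.example/cam1", "text": "canon digital camera", "price": 249.0},
    {"url": "shop-b.example/cam2", "text": "canon digital camera", "price": 399.0},
    {"url": "shop-c.example/tv1", "text": "lcd television", "price": 199.0},
]

def crawl(keywords):
    """Web crawler agent (stub): keep pages that contain every keyword."""
    return [p for p in PAGES if all(k in p["text"].split() for k in keywords)]

def extract_answers(pages, conditions):
    """Information extraction agent: keep pages that satisfy every condition."""
    def satisfies(page, cond):
        value = page[cond.attribute]
        return value < cond.value if cond.operator == "<" else value > cond.value
    return [p["url"] for p in pages if all(satisfies(p, c) for c in conditions)]

def answer(question):
    keywords, conditions = analyze_question(question)
    return extract_answers(crawl(keywords), conditions)

print(answer("Which Canon digital camera costs less than $300?"))
```

Separating the keywords (used for retrieval) from the conditions (used for filtering) mirrors the division of work between the crawler and the information extraction agent; a real system would need far richer question patterns than the single price rule assumed here.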

Knowledge extraction

Knowledge is defined as the concepts, their relationships, and the concept instances of a specific domain. Concepts and relationships are identified and defined by domain experts. When we apply the knowledge to a Web page, the objects and relationships on the page are identified and associated with the concepts and relationships in the knowledge’s conceptual model. Thus the strings on a Web page are recognized and understood in terms of the answers.

Fig. 2 shows partial knowledge extraction for digital cameras.
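As a rough illustration of how concepts, relationships, and instances might be applied to the strings on a Web page, consider the following Python sketch. The concept names, the relationship names, the regular expressions, and the sample listing are all hypothetical; the paper's actual conceptual model is the one shown in its Fig. 2, which is not reproduced here.

```python
import re

# Hypothetical concepts, each with a recognizer for its instances.
CONCEPTS = {
    "Brand": re.compile(r"\b(Canon|Nikon|Sony)\b"),
    "Resolution": re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:MP|megapixels?)\b", re.IGNORECASE),
    "Price": re.compile(r"\$(\d+(?:\.\d+)?)"),
}

# Hypothetical relationships linking the central concept to its attributes.
RELATIONSHIPS = [
    ("DigitalCamera", "hasBrand", "Brand"),
    ("DigitalCamera", "hasResolution", "Resolution"),
    ("DigitalCamera", "hasPrice", "Price"),
]

def recognize(page_text):
    """Associate strings on a page with concepts, keyed by relationship name."""
    instance = {}
    for _subject, relation, concept in RELATIONSHIPS:
        match = CONCEPTS[concept].search(page_text)
        if match:
            instance[relation] = match.group(1)
    return instance

# Made-up advertisement text used only to exercise the sketch.
listing = "Canon PowerShot, 7.1 megapixels, now only $249.99"
print(recognize(listing))
```

In this sketch the domain expert's work corresponds to writing `CONCEPTS` and `RELATIONSHIPS`; the recognizer then only has to find instances, which is what lets page strings be interpreted as candidate answers.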

Question analysis

In the proposed system, the user can ask questions about sellers and products in natural language (i.e. English). Fig. 3 shows some typical user questions about digital cameras.

The user negotiation agent must make the NL questions machine-understandable. It uses a question analyzer component for this job. This component analyses the user’s NL questions and extracts their keywords and conditions. Extracted keywords are given to another agent called web

Preliminary experiments

This section explains our experiments conducted to verify the validity of our approach. First we describe the process in which the underlying knowledge was created and implemented. Then we present the evaluation of our proposed approach.

Conclusion

In this paper we reported on a knowledge-based, domain-specific question answering system for B2C eCommerce. Although the problem has been studied by several researchers, existing techniques are limited to specific heuristics and databases. An effective method is proposed to decompose the user’s NL questions and extract their keywords and conditions automatically. As a next step, we will work on extending our system to cover all formats of users’ questions.

We believe as the

References (17)

C.S. Lee et al.
Automated ontology construction for unstructured text documents
Journal of Data & Knowledge Engineering
(2007)
EDI Forum, 2006. Available from:...
R. Kalakota et al.
Electronic Commerce, A Manager’s Guide
(1997)
Product Comparison Shopping in PriceSCAN.com, 2006. Available from:...
E. Darrudi, F. Oroumchian, M. Rahgozar, M.S. Mirian, K. Neshatian, B.R. Ofoghi, TeLQAS: a realization of humanlike...
J.M. Abasolo, M. Gómez, MELISA: an ontology based agent for information retrieval in medicine, ECDL 2000 Workshop on the...
D. Moldovan, S. Harabagiu, R. Gîrju, P. Morărescu, F. Lăcătuşu, A. Novischi, A. Bădulescu, O. Bolohan, LCC tools for...
Hui Yang, Tat-Seng Chua, Shuguang Wang, Modeling web knowledge for answering event-based questions, in: 12th...

There are more references available in the full text version of this article.
