Elsevier

Knowledge-Based Systems

Volume 21, Issue 8, December 2008, Pages 946-950

A knowledge-based question answering system for B2C eCommerce

https://doi.org/10.1016/j.knosys.2008.04.005

Abstract

Business-to-Consumer (B2C) eCommerce has evolved through several generations. The latest models of B2C eCommerce are comparative shopping systems that connect to multiple vendors’ databases and collect the information requested by the user. The comparative result is then displayed in tabular form in the user’s browser. Although this scenario is much better than comparing multiple sites manually, the user still faces inconsistent user interfaces when linked from the comparison site to the actual purchasing site, and therefore has to learn the logic of each site’s user interface. In this paper, we propose a question answering system based on natural language processing techniques for retail (B2C) eCommerce. The system accepts a question in natural language, decomposes it into keywords, and extracts its constraints automatically. The corresponding answers are then retrieved from the vendors’ Web sites by exploiting the question constraints.

Introduction

eCommerce began with the introduction of EDI between companies and of ATMs for banking [1], [2]. The introduction of Web browsers opened up a new age by combining the open Internet with easy-to-use interfaces [1], [2].

B2C ordinarily refers to online trading and auctions, for example, online stock trading markets and online auctions for computers and other goods. B2C eCommerce refers to the emerging commerce model in which businesses and consumers interact electronically or digitally in some way. One of the best examples of B2C eCommerce is Amazon.com, an online bookstore that launched its site in 1995. In B2C eCommerce the focus is on enticing prospects and converting them into customers, retaining them, and sharing the value created during the process. The ultimate goal is to convert shoppers into buyers as aggressively and consistently as possible.

In a typical B2C setting, information flows between business and consumer through the Internet. This flow includes product orders and service requests from customers, and product information, specifications, and services provided by the business.

B2C eCommerce is the predominant commercial experience of Web users. A typical scenario involves a user visiting one or several online shops, browsing their offers, and selecting and ordering products. Ideally, a user would collect information about the price, terms, and conditions (such as availability) of all, or at least all major, online shops and then select the best offer. But manual browsing is too time-consuming to be conducted on this scale. Typically a user will visit one or very few online stores before making a decision.

B2C eCommerce, however, has evolved through several generations. The latest models are comparative shopping catalogs, such as pricescan.com [3], that visit several shops, extract product and price information, and compile a market overview. The comparative result is then displayed in tabular form in the user’s browser. This approach suffers from several drawbacks. First, these models must be granted access by vendors before they can query the vendors’ databases. Since some vendors may not grant such access, their product information will not appear in the results. We have proposed a knowledge-based approach to resolve this problem in [15]. In that approach, product and price information is understood and extracted from the Web pages of vendors’ sites to build a virtual catalog directly. Second, the user still faces inconsistent user interfaces when linked from the comparison site to the actual purchasing site, and therefore has to learn the logic of each site’s user interface. For example, the user has to break a question down into keywords that fit the logic of the user interface and submit them to the system. In other words, the user cannot ask a question in natural language (such as English) and get an answer. Using keywords based on the logic of third-generation systems’ user interfaces is not a good way to establish a relationship between user and system [4]: a user is often not interested in extracting keywords from a question, or may be unable to do so, and a few keywords usually cannot cover the complete meaning of the question. In most cases, users are looking for clear responses to their questions, whereas the output of third-generation systems is a collection of answers related to the question that probably contains the correct one.

In recent years, Question Answering (QA) systems have evolved out of the field of Information Retrieval to better meet the needs of information seekers. Unlike simple keyword-based information retrieval systems, they aim to communicate directly with users in natural language. They accept natural language questions and return exact answers, eliminating the burden of query formulation and of reading many irrelevant documents to find the answer. Open-domain QA systems deal with unrestricted questions over large-scale text corpora, typically by means of statistical approaches, whereas restricted-domain systems concentrate on a controlled domain of interest (e.g. weather forecasts or UNIX technical manuals). MELISA [5] is a good example of a restricted-domain QA system.

In this paper, we propose a QA system for B2C eCommerce. At present, the system can answer all questions in the domain of digital cameras, and it can be extended to any retail domain. The system exploits an initial knowledge base, which gives it some advantages over open-domain QA systems (i.e. systems that do not have any specific domain knowledge [6], [7], [8]).

We present the details of our approach in the remainder of the paper as follows. After a short overview of related work in Section 2, Section 3 describes the system architecture. Section 4 explains how we define the concepts, relations, and instances of the initial knowledge base. In Section 5, we describe our approach to analyzing natural language (NL) questions. In Section 6, we report the experiments we conducted on digital camera advertisements on the Web. Finally, Section 7 presents the conclusion of this work.

Section snippets

Related work

Halo [9] is one of the most ambitious recent investments in knowledge-based question answering systems, “a staged, long-term research and development initiative toward the development of a ‘Digital Aristotle’ capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines.” In the pilot phase of the project the state of the art in knowledge representation and reasoning was applied for a limited syllabus in chemistry with promising results [10].

System architecture

In the proposed system, an agent enables natural language negotiation with the user. This agent analyses the user’s NL questions and extracts their keywords and conditions. In the next step, the extracted keywords are given to another agent, called the web crawler, which searches for and retrieves the related pages that contain the same keywords. The retrieved pages are then passed to an information extraction agent, which extracts the user’s exact answers using the questions’ keywords and conditions.
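
The three-agent flow described above can be sketched as a simple chain of functions. This is only an illustrative sketch, not the authors’ implementation: the names `AnalyzedQuestion`, `answer_question`, `toy_analyze`, `toy_crawl`, and `toy_extract`, the in-memory `PAGES` dictionary standing in for the Web, and the example shop URLs are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# All names and data below are hypothetical; none come from the paper.


@dataclass(frozen=True)
class AnalyzedQuestion:
    keywords: frozenset   # words the crawler should look for
    conditions: tuple     # predicates an extracted record must satisfy


def answer_question(question, analyze, crawl, extract):
    """Chain the agents: negotiation -> web crawler -> information extraction."""
    analyzed = analyze(question)
    pages = crawl(analyzed.keywords)
    return extract(pages, analyzed)


# Toy in-memory "Web" so the sketch runs without network access.
PAGES = {
    "shop-a.example/cam1": "Canon A570 digital camera 7.1 megapixel price $180",
    "shop-b.example/cam2": "Nikon L11 digital camera 6 megapixel price $120",
    "shop-b.example/tv": "Plasma television 42 inch price $900",
}


def toy_analyze(question):
    # Hard-wired analysis of a single question; a stand-in for the
    # negotiation agent, which would derive this from the question text.
    return AnalyzedQuestion(
        keywords=frozenset({"digital", "camera"}),
        conditions=(lambda record: record["price"] < 150,),
    )


def toy_crawl(keywords):
    # Keep only the pages that contain every keyword.
    return {
        url: text
        for url, text in PAGES.items()
        if keywords <= set(text.lower().split())
    }


def toy_extract(pages, analyzed):
    # Pull a price out of each page and keep records meeting all conditions.
    records = []
    for url, text in pages.items():
        match = re.search(r"\$(\d+)", text)
        if match is None:
            continue
        record = {"url": url, "price": int(match.group(1))}
        if all(condition(record) for condition in analyzed.conditions):
            records.append(record)
    return records


answers = answer_question(
    "Which digital cameras cost less than $150?",
    toy_analyze, toy_crawl, toy_extract,
)
```

Passing the agents in as functions keeps each stage replaceable, which mirrors the separation of responsibilities in the description; a real crawler or extractor could be substituted without touching `answer_question`.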

Knowledge extraction

Knowledge is defined as the concepts, their relationships, and the concept instances of a specific domain. Concepts and relationships are identified and defined by domain experts. When we apply the knowledge to a Web page, the objects and relationships on the page are identified and associated with the concepts and relationships in the knowledge’s conceptual model. Thus the strings on a Web page are recognized and understood in terms of the answers.

Fig. 2 shows partial knowledge extraction for digital cameras.
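
To make the idea of concepts, relationships, and instances more concrete, the following is a minimal sketch of how such knowledge might be represented and applied to a string from a Web page. It is not the knowledge base of the paper: the concept names (`Brand`, `Resolution`, `Price`), the instance list, the regular expressions, and the functions `recognize` and `describe_camera` are assumptions chosen for a digital camera example.

```python
import re

# Hypothetical domain knowledge for digital cameras (illustrative only).
KNOWLEDGE = {
    "concepts": {
        # A concept may list known instances ...
        "Brand": {"instances": ["Canon", "Nikon", "Sony"]},
        # ... or describe how its values look on a page.
        "Resolution": {"pattern": r"(\d+(?:\.\d+)?)\s*(?:megapixels?|MP)\b"},
        "Price": {"pattern": r"\$\s*(\d+(?:\.\d{2})?)"},
    },
    # (source concept, relationship name, target concept)
    "relationships": [
        ("DigitalCamera", "has", "Brand"),
        ("DigitalCamera", "has", "Resolution"),
        ("DigitalCamera", "has", "Price"),
    ],
}


def recognize(text, knowledge=KNOWLEDGE):
    """Associate strings in `text` with concepts of the knowledge base."""
    found = {}
    for name, spec in knowledge["concepts"].items():
        # First try the explicitly listed instances of the concept.
        for instance in spec.get("instances", []):
            if re.search(rf"\b{re.escape(instance)}\b", text, re.IGNORECASE):
                found[name] = instance
                break
        # Otherwise fall back to the concept's value pattern, if any.
        pattern = spec.get("pattern")
        if pattern and name not in found:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                found[name] = float(match.group(1))
    return found


def describe_camera(text, knowledge=KNOWLEDGE):
    """Keep only values linked to DigitalCamera by a declared relationship."""
    found = recognize(text, knowledge)
    return {
        target: found[target]
        for source, _relation, target in knowledge["relationships"]
        if source == "DigitalCamera" and target in found
    }


page_text = "Canon PowerShot A570 IS, 7.1 megapixels, now only $179.99"
camera = describe_camera(page_text)
```

In this sketch the domain expert’s contribution is the `KNOWLEDGE` dictionary; the recognition code stays the same when concepts or instances are added.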

Question analysis

In the proposed system, the user can ask questions about sellers and products in natural language (i.e. English). Fig. 3 shows some typical user questions about digital cameras.

The user negotiation agent must make the NL questions machine-understandable. It uses a question analyzer component for this job. This component analyses the user’s NL questions and extracts their keywords and conditions. Extracted keywords are given to another agent called web
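
As a rough illustration of what extracting keywords and conditions from a question could look like, the sketch below uses a small stop-word list and a few regular expressions. This is an assumed, simplified technique, not the question analyzer of the paper; `STOPWORDS`, `CONDITION_PATTERNS`, `analyze_question`, and `satisfies` are hypothetical names, and only a handful of English phrasings are covered.

```python
import operator
import re

# Hypothetical stop-word list; words carrying no product meaning.
STOPWORDS = {
    "which", "what", "who", "the", "a", "an", "is", "are", "do", "does",
    "have", "has", "with", "and", "that", "cost", "costs", "sell", "sells",
}

# (regular expression, attribute, comparison) triples; illustrative only.
CONDITION_PATTERNS = [
    (r"(?:less than|under|below)\s*\$\s*(\d+(?:\.\d+)?)", "price", "<"),
    (r"(?:more than|over|above)\s*\$\s*(\d+(?:\.\d+)?)", "price", ">"),
    (r"at least\s*(\d+(?:\.\d+)?)\s*megapixels?", "resolution", ">="),
]

COMPARISONS = {"<": operator.lt, ">": operator.gt, ">=": operator.ge}


def analyze_question(question):
    """Split a question into keywords and (attribute, op, value) conditions."""
    text = question.lower()
    conditions = []
    for pattern, attribute, op in CONDITION_PATTERNS:
        for match in re.finditer(pattern, text):
            conditions.append((attribute, op, float(match.group(1))))
        # Remove the matched phrase so it does not pollute the keywords.
        text = re.sub(pattern, " ", text)
    keywords = [
        token for token in re.findall(r"[a-z0-9]+", text)
        if token not in STOPWORDS
    ]
    return keywords, conditions


def satisfies(record, conditions):
    """Check an extracted product record against every condition."""
    return all(
        attribute in record and COMPARISONS[op](record[attribute], value)
        for attribute, op, value in conditions
    )


keywords, conditions = analyze_question(
    "Which Canon digital cameras cost less than $200 "
    "and have at least 7 megapixels?"
)
```

The keywords would go to the crawler, while the conditions would be applied later, for example `satisfies({"price": 179.99, "resolution": 7.1}, conditions)` when filtering extracted records.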

Preliminary experiments

This section explains our experiments conducted to verify the validity of our approach. First we describe the process in which the underlying knowledge was created and implemented. Then we present the evaluation of our proposed approach.

Conclusion

In this paper we reported on a knowledge-based, domain-specific question answering system for B2C eCommerce. Although the problem has been studied by several researchers, existing techniques are limited to specific heuristics and databases. We proposed an effective method to decompose the user’s NL questions and extract their keywords and conditions automatically. As a next step, we will extend our system to cover all formats of users’ questions.

We believe as the

References (17)

  • C.S. Lee et al., Automated ontology construction for unstructured text documents, Journal of Data & Knowledge Engineering (2007)
  • EDI Forum, 2006. Available from:...
  • R. Kalakota et al., Electronic Commerce, A Manager’s Guide (1997)
  • Product Comparison Shopping in PriceSCAN.com, 2006. Available from:...
  • E. Darrudi, F. Oroumchian, M. Rahgozar, M.S. Mirian, K. Neshatian, B.R. Ofoghi, TeLQAS: a realization of humanlike...
  • J.M. Abasolo, M. Gómez, MELISA: an ontology based agent for information retrieval in medicine, ECDL 2000 Workshop on the...
  • D. Moldovan, S. Harabagiu, R. Gîrju, P. Morărescu, F. Lăcătuşu, A. Novischi, A. Bădulescu, O. Bolohan, LCC tools for...
  • Hui Yang, Tat-Seng Chua, Shuguang Wang, Modeling web knowledge for answering event-based questions, in: 12th...
There are more references available in the full text version of this article.
