An SVD-Entropy and bilinearity based product ranking algorithm using heterogeneous data
Introduction
Product review analysis has emerged as a novel field of research with valuable real-world implications. Many e-commerce and review websites, such as Amazon.com, Angie's List, Google, and Yelp, provide a platform for consumers to share their opinions. In 1995, Amazon became one of the first online stores to allow customers to post product reviews, and it remains one of the most valuable resources for consumers making informed purchasing decisions [1]. Similarly, Yelp is a free review site where consumers can rate businesses on a five-star scale. Unbiased customer reviews give new consumers the confidence to go ahead with a transaction [2]. Reviews can be quantitative, in the form of a five-star rating (0–5), or qualitative, in the form of plain-text comments or textual answers to questions. However, with the ever-growing volume of available review data and the limited time and attention span of customers, researchers are focusing on advanced mining methods to ascertain the quality, authenticity, and usefulness of reviews. The goal is to find the information buried in noisy data so as to help the customer make an informed decision. Existing data mining techniques sometimes lack the novelty and flexibility to work well with new-age applications, and there is a growing need to combine the features of the available algorithms into advanced practical methods.
A five-star rating alone (or a rating on any numerical scale, such as 0–5 or 0–10) does not convey the sentiment of a review the way the opinion text does [3], [4]. Hence, the current trend is to collect more than one type of consumer sentiment in surveys about products and services. However, while quantitative ratings are quick and easy to process, qualitative information is vague, and deciphering and evaluating text reviews is time-consuming because natural language text is unstructured [2]. In fact, this is a separate area of Information Retrieval and Natural Language Processing, in which character-level text similarity and semantic similarity are used.
Reviews remain a primary source of information for consumers, producers, and marketers [5]. However, reviews are subjective opinions and judgments about a product or a service. The availability of search and evaluation tools makes consumers both knowledgeable and comfortable enough to distinguish forged reviews from genuine ones based on their implied reputation [6]. Fortunately, advanced techniques are available to scrutinize reviews for the validity of their information [7], such as relying on reliable, loyal, highly ranked reviewers and factoring in the reviews they have written to authenticate review quality.
There are several approaches to ascertaining the quality, authenticity, and usefulness of qualitative and quantitative reviews: Entropy, Decision trees, SVD, Information Retrieval, Natural Language Processing, Support Vector Machines, Machine Learning, Stochastic probability, etc. These approaches focus on (1) evaluating the helpfulness of the review data, (2) ranking products based on the relevance and ratings from the reviews, and (3) reducing the consumer's search and discovery time. However, no single algorithm can achieve all three. In this paper, we design an improved algorithm based on a combination of mining and classification approaches to analyze the review data and help consumers determine the good and bad features of a product before they buy it.
In this paper, we use heterogeneous Amazon review data for our experiments, the same data as in [8]. In that previous work, however, most of the results were theoretical. Here, we report the results of running the individual and hybrid algorithms on the datasets, highlighting the prediction accuracy of the new algorithm. Because the datasets are large, processing is time-consuming, so we apply SVD to the data before processing. In the interest of efficiency and effectiveness, the data size is reduced while keeping the prediction accuracy intact. We discuss these merits in detail later in the paper.
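To illustrate the kind of SVD-based size reduction described above, here is a minimal sketch, not the authors' implementation. It assumes a dense review-by-feature matrix and uses NumPy; the function name `reduce_with_svd`, the rank `k`, and the toy data are hypothetical.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X onto its top-k right singular vectors."""
    # Economy-size SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k leading components; each row becomes a k-dimensional vector.
    X_reduced = U[:, :k] * s[:k]
    # Fraction of the total squared singular-value mass kept by those components.
    retained = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return X_reduced, retained

# Toy matrix with 6 reviews and 9 features (hypothetical values, not the Amazon data).
rng = np.random.default_rng(0)
X = rng.random((6, 9))
X_reduced, retained = reduce_with_svd(X, k=3)
print(X_reduced.shape)  # (6, 3)
```

The `retained` value gives a rough indication of how much of the original structure survives the reduction, which is one way to choose `k` before checking prediction accuracy.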
The paper is organized as follows: Section 2 covers the background and literature on SVD-Entropy and Bilinear similarity. Section 3 describes our contribution, a hybrid algorithm designed to surpass the performance and accuracy of the individual techniques. Section 4 presents experiments confirming that the hybrid algorithm performs as expected, and Section 5 concludes with a summary.
Preliminaries
Data analysis starts with a set of foundations adapted from several fields, including statistics, mathematics, social sciences, natural sciences, and computer science. There are four stages of data analysis: data collection, data preparation/transformation, application of data analysis algorithms, and putting the algorithm into practice [9].
Here we describe the standard terms used in this work and discussion [10]. Data instance/object is in the form of a vector. We assume all vectors are column vectors. A matrix
Hybrid adaptive algorithm
Our objective is to develop an adaptive hybrid algorithm that combines existing techniques to rate products consistently based on consumer sentiment. We combine many of the approaches surveyed above into a single adaptive hybrid algorithm for ranking products. We use the following heterogeneous information related to a product as inputs to our approach. Once the quantitative rating is computed, ranking reduces to indexing the products by that quantitative measure. The
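The final ranking step, indexing products by their quantitative measure, can be sketched as follows. This is only an illustration: the product ids, the scores, and the tie-breaking rule are hypothetical, and the excerpt does not specify how the score itself is computed.

```python
def rank_products(scores):
    """Return product ids ordered from highest to lowest score."""
    # Sort by descending score; ties are broken by product id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

# Hypothetical product ids and quantitative scores.
scores = {"P1": 3.9, "P2": 4.6, "P3": 4.1}
ranking = rank_products(scores)
print(ranking)  # ['P2', 'P3', 'P1']
```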
Empirical evaluation and discussion
We start by reducing the product review data space with SVD and then calculate the Entropy measure of the transformed product reviews. The SVD transformation reduced the data by a factor of 3, which yielded a 35% improvement in overall running time. Using the Entropy-based classification model explained in Section 2.4, we calculate Score_i for the reviews of each product i. We use k-fold cross-validation, with k = 10, to optimize the model parameters to make the model fit the training data
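For readers unfamiliar with SVD-based entropy, the sketch below computes one common definition: the Shannon entropy of the normalized squared singular values, scaled to lie in [0, 1]. It is an assumption that this matches the measure used here, since the Section 2.4 model is not shown in this excerpt. The function name `svd_entropy` and the toy matrices are hypothetical, and the input is assumed to be a non-zero matrix.

```python
import numpy as np

def svd_entropy(X):
    """Normalized entropy of the squared singular-value spectrum of X."""
    # Singular values only; the singular vectors are not needed here.
    s = np.linalg.svd(X, compute_uv=False)
    if len(s) < 2:
        return 0.0
    # Relative contribution of each singular value (assumes X is not all zeros).
    v = s ** 2 / np.sum(s ** 2)
    # Drop zero contributions, since 0 * log(0) is taken to be 0.
    v = v[v > 0]
    # Divide by log(number of singular values) so the result lies in [0, 1].
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))

# A rank-1 matrix has a single dominant component, so its entropy is near 0.
low = svd_entropy(np.outer([1.0, 2.0, 3.0], [1.0, 1.0]))
# The identity matrix spreads weight evenly over all components, so its entropy is 1.
high = svd_entropy(np.eye(3))
print(low, high)
```

Low entropy indicates that a few components explain most of the data, which is the situation in which an SVD reduction loses little information.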
Conclusion
Several e-commerce websites provide a platform for consumers to share their opinions. In this paper, we developed an algorithm that processes heterogeneous survey data to rank consumer products. The data consists of three diverse representations: five-star ratings, Q&A text, and review text. Our hybrid approach takes the best features of the analysis methods: a two-pronged similarity in terms of Entropy and Bilinear similarity. We have used three diverse categories of Amazon.com data: k-core
References (29)
- et al., Feature selection with SVD Entropy: some modification and extension, Inf. Sci. (2014)
- L. Kolowich, 19 Online Review Sites for Collecting Business & Product Reviews [Online], Hubspot, September 17,...
- et al., Importance of online product reviews from a consumer's perspective, Adv. Econ. Bus. (2013)
- Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
- Sentiment Analysis and Opinion Mining (2012)
- A. Haghighi, How to approach machine learning as a non-technical person [Online], Crunch Network, Apr 2, 2016....
- et al., The effect of on-line consumer reviews on consumer purchasing intention: the moderating role of involvement, Int. J. Electron. Commer. (2007)
- Let's talk about Amazon reviews: how we spot the fakes (2016)
- et al., An entropy based product ranking algorithm using reviews and Q&A data
- L. Getoor (Chair), D. Culler, E. Sturler, D. Ebert, M. Franklin, and H.V. Jagadish on behalf of the CRA Board,...