1 Introduction

With the rapid growth of e-commerce, online users have access to a greater variety of products and an enormous amount of information about them. In particular, product reviews play an important role in users’ online purchasing process [2, 7, 13, 25]. According to statistics in [14], 84% of Americans have used product reviews to make their online purchasing decisions.

Studies of customer behavior indicate that people normally go through three stages of decision making in e-commerce: (1) Stage 1: screening out interesting alternatives for further consideration; (2) Stage 2: evaluating an alternative in detail to decide whether to save it as a purchase candidate (the user iterates between stages 1 and 2 until a set of candidates is located); and (3) Stage 3: comparing purchase candidates to make the final choice [5]. To help users glean information from reviews effectively and efficiently and make better purchasing decisions, some systems have summarized product reviews by extracting attributes and their associated sentiments, and have employed different approaches to conveying this attribute-sentiment information, mainly at the 2nd and 3rd stages (i.e., showing product reviews on the detail page and the comparison page) [3, 4, 15, 27]. However, little work has investigated how to present reviews at the 1st stage to help users screen out interesting alternatives.

Therefore, in this manuscript, we develop a novel review-based screening interface. Specifically, based on how people utilize reviews to screen out items [26], we made two major innovations in the interface design. First, the interface lets users eliminate alternatives by both sentiment attributes (i.e., the attributes extracted from product reviews) and static attributes (i.e., the physical properties of a product). Second, to help users effectively determine the cutoff value of an attribute, we visualized the value distribution of each attribute and the trade-offs among attributes. We then performed a user study to compare our review-based screening interface against a traditional screening interface. The results show that people depended highly on sentiment attributes, which points to the benefit of incorporating them in a screening interface. Moreover, the novel interface achieves more positive user assessments in terms of perceived decision accuracy, cognitive effort, pleasantness to use, and intention to return.

The remainder of the paper is organized as follows. We first introduce related work in two streams: users’ decision-making process during online purchasing and relevant review-based interfaces (Sect. 2). We then describe the details of developing the review-based screening interface (Sect. 3). The setup and results of a user study follow (Sects. 4 and 5). Finally, we conclude the work and discuss the practical implications of our findings (Sect. 6).

2 Related Work

2.1 Three-Stage Decision Making Process of Online Purchasing

From the perspective of customers, online purchasing can be viewed as a decision making process, in which the user is required to choose a suitable product among a huge number of options. In classical decision theory, customers are assumed to process all relevant information and explicitly consider trade-offs among attributes to choose an optimal product with the maximum utility [20].

However, some researchers have demonstrated that in complex decision environments (such as choosing an option from a large number of alternatives with a variety of attributes), individuals are often unable to evaluate all available alternatives in great depth for making decisions [1, 8]. Instead, they are inclined to process the information at different stages: (1) the initial screening of available products to determine which ones are worth considering further, and (2) the in-depth comparison of selected products to make the actual purchase decision [10, 18].

In [5], a precise three-stage decision process was proposed: (1) screening out interesting ones that are worth further consideration, (2) reading detailed information about the product selected in the preceding stage and deciding whether to take it as a purchase candidate, and (3) comparing several candidates to make the final choice. Decision makers basically follow such a linear process, but they cycle between the 1st and 2nd stages until one or more candidates are located (see Fig. 1).

Fig. 1. Three-stage decision-making process of online purchasing [5]

2.2 Review-Based Interface Design

On the detail page of a product (i.e., stage 2), users are inclined to explore the positive/negative sentiments towards one or more attributes. Carenini et al. summarized product reviews in the form of a tree map which visualizes the sentiment and frequency of each attribute via box color and size [3]. In addition, to help users digest reviews in greater detail, Yatani et al. extracted frequently mentioned adjective-noun word pairs from reviews, using the font size and color to represent the occurrence frequency and sentiment [27]. Moreover, Hu et al. and Huang et al. automatically highlighted associated review sentences when users hover over a specific attribute or word pair, to strike a balance between reducing information overload and providing the original review context [11, 12].

At the comparison interface (i.e., stage 3), users tend to perform a side-by-side and feature-by-feature comparison of product reviews on competing candidates. Liu et al. used bar charts to show positive (above x-axis) and negative (below x-axis) sentiments on the attributes of a camera, with the bar’s height representing the number of mentions [15]. Based on Liu et al.’s work, Carenini et al. developed a stacked bar chart to visualize the sentiment of each attribute. Each bar corresponds to a polarity category (from −3 to 3) and its height represents the quantity of that sentiment [4]. Chen et al. developed a comparison interface by combining numerical sentiments (e.g., sentiment score and occurrence frequency) with verbal sentiments (i.e., opinion words/phrases) [6].

To the best of our knowledge, few studies have put forward specific design solutions for presenting reviews to help users screen out interesting alternatives (i.e., stage 1).

3 Review-Based Screening Interface

Studies of consumer decision making state that an effective information display (one that leads to more accurate decisions with less effort) requires an in-depth understanding of users’ decision-making behaviors [18]. In [26], we conducted a formative study to empirically investigate how people utilize reviews to make online purchasing decisions. Users’ decision-making behaviors when screening out interesting alternatives, together with the resulting design implications, are summarized in Table 1.

Table 1. Users’ decision making behaviors and design implications at the 1st stage

The review-based screening interface was therefore designed to address these two design implications. More specifically, the interface was developed in the following three major steps:

Step 1:

To develop the review-based screening interface, we took online hotel booking as the sample domain for two reasons: (1) it is easier to recruit a sufficient number of target users to test its effectiveness, and (2) we can obtain abundant online hotel reviews from commercial sites. A dataset of 100 B&Bs (50 in Beijing and 50 in Rome) was used for the experiment; all hotels’ specifications and reviews were crawled from Tripadvisor.com in September 2014.

Step 2:

In the context of hotel booking, in addition to three static attributes (i.e., district, facility and price), the four most frequently mentioned sentiment attributes (i.e., cleanliness, location, value, and service) are incorporated to help users eliminate alternatives. For “facility” and “district”, users tend to specify multiple discrete values (e.g., choosing hotels with wifi, kitchen, and car parking). Therefore, we used “press-and-stick” toggle buttons as their filters, which support multiple choices and save space [22]. For price and the sentiment attributes, users are inclined to select data points that fall below an upper bound or above a lower bound. Hence, their filters are presented as double sliders that let users adjust the value range (see Fig. 2).

Fig. 2. Screenshot of the review-based screening interface
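The filtering logic described in Step 2 can be sketched as follows. This is only a minimal illustration and not the implementation used in the study: the `Hotel` and `Filters` classes, the field names, and the sample hotels are all hypothetical. It also assumes that toggled facilities are all required and that toggled districts are alternatives, which the text does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    name: str
    district: str
    facilities: set
    price: float
    sentiments: dict  # sentiment attribute -> review-derived score


@dataclass
class Filters:
    districts: set = field(default_factory=set)   # toggled-on districts; empty means "any"
    facilities: set = field(default_factory=set)  # toggled-on facilities; all are required (assumption)
    ranges: dict = field(default_factory=dict)    # attribute -> (low, high) set by a double slider


def attribute_value(hotel, attribute):
    """Look up a slider attribute: price is static, the others are sentiment scores."""
    return hotel.price if attribute == "price" else hotel.sentiments[attribute]


def passes(hotel, filters):
    """Return True if the hotel satisfies every active toggle and slider filter."""
    if filters.districts and hotel.district not in filters.districts:
        return False
    if not filters.facilities <= hotel.facilities:  # subset test: every toggled facility is offered
        return False
    return all(low <= attribute_value(hotel, attr) <= high
               for attr, (low, high) in filters.ranges.items())


def screen(hotels, filters):
    """Keep only the hotels that pass all filters."""
    return [hotel for hotel in hotels if passes(hotel, filters)]


# Made-up sample data, for illustration only.
HOTELS = [
    Hotel("Hotel A", "District 1", {"wifi", "kitchen"}, 60.0,
          {"cleanliness": 4.5, "location": 4.0, "value": 4.2, "service": 4.4}),
    Hotel("Hotel B", "District 2", {"wifi"}, 45.0,
          {"cleanliness": 3.8, "location": 3.5, "value": 4.0, "service": 3.9}),
    Hotel("Hotel C", "District 1", {"wifi", "kitchen", "parking"}, 90.0,
          {"cleanliness": 4.8, "location": 4.6, "value": 3.9, "service": 4.7}),
]

filters = Filters(facilities={"wifi", "kitchen"},
                  ranges={"price": (0, 80), "cleanliness": (4.0, 5.0)})
remaining = [hotel.name for hotel in screen(HOTELS, filters)]
print(remaining)
```

With these sample values, only the first hotel offers both toggled facilities and falls inside both slider ranges.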

Step 3:

Compared with plain sliders, we made two major modifications, presenting the value distribution of each attribute and the trade-offs among attributes, to help users determine the cutoff values more effectively.

For a slider filter, the value distribution of an attribute is represented as a bar chart (see Fig. 3). The height of a bar is proportional to the number of hotels, a form of visualization that may reduce learning time and potential misunderstanding [17]. Moreover, the number of hotels with values in the specified range is shown right above the bars, following the Gestalt principle of proximity [23].
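As a rough sketch of the data behind such a bar chart, the snippet below bins an attribute's values into equal-width bars, scales bar heights in proportion to the hotel counts, and counts the hotels inside the selected range. The function names, the number of bins, the pixel height, and the sample scores are assumptions made for illustration; the binning actually used in the interface is not described in the text.

```python
def histogram(values, low, high, bins):
    """Count values per equal-width bin over [low, high] (values are assumed to lie in that span)."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        index = min(int((value - low) / width), bins - 1)  # a value equal to `high` goes in the last bin
        counts[index] += 1
    return counts


def bar_heights(counts, max_height_px):
    """Scale bar heights so that each is proportional to its hotel count."""
    peak = max(counts) or 1
    return [round(count / peak * max_height_px) for count in counts]


def count_in_range(values, selected_low, selected_high):
    """Number of hotels whose value lies in the slider's selected range (shown above the bars)."""
    return sum(selected_low <= value <= selected_high for value in values)


# Made-up cleanliness scores, for illustration only.
cleanliness = [3.2, 3.8, 4.1, 4.4, 4.5, 4.9, 5.0]
counts = histogram(cleanliness, low=3.0, high=5.0, bins=4)
heights = bar_heights(counts, max_height_px=40)
selected = count_in_range(cleanliness, 4.0, 5.0)
print(counts, heights, selected)
```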

Fig. 3. Screenshot of the review-based screening interface (mouse-over status)

Because real-time change can effectively reflect the relation of values [16], the trade-offs among attributes are visualized via the simultaneous movement of slider knobs. For instance, when the user drags a slider knob to reduce the price, the slider knobs standing for the maximum values of the sentiment attributes move simultaneously (see Fig. 3). However, users may miss the changes because they can only quickly take in information within 1 to 4 degrees of visual angle [24]. To express the trade-offs among attributes clearly, the corresponding knobs of different sliders are connected with lines, which can be powerful for expressing relationships [19].
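One plausible way to derive the linked knob positions is sketched below: after one attribute's range changes, each other slider's knobs move to the minimum and maximum values still attainable among the remaining hotels. The text does not specify how the positions were computed, so this rule, the function name `reachable_ranges`, and the sample data are assumptions.

```python
def reachable_ranges(hotels, attribute, low, high, other_attributes):
    """For hotels whose `attribute` lies in [low, high], return {attr: (min, max)} for the other sliders."""
    remaining = [hotel for hotel in hotels if low <= hotel[attribute] <= high]
    if not remaining:
        return {}
    return {attr: (min(hotel[attr] for hotel in remaining),
                   max(hotel[attr] for hotel in remaining))
            for attr in other_attributes}


# Made-up hotels, for illustration only.
HOTELS = [
    {"price": 120, "cleanliness": 4.8, "service": 4.7},
    {"price": 80, "cleanliness": 4.3, "service": 4.5},
    {"price": 50, "cleanliness": 3.9, "service": 4.0},
]

before = reachable_ranges(HOTELS, "price", 0, 150, ["cleanliness", "service"])
after = reachable_ranges(HOTELS, "price", 0, 90, ["cleanliness", "service"])
print(before)
print(after)
```

In this toy data, lowering the price cap from 150 to 90 removes the most expensive hotel, so the maximum knobs for cleanliness and service drop, which is the kind of trade-off the connected knobs are meant to make visible.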

To make critical information prominent and avoid information overload, we employed “details on demand”, which shows details “hidden behind” specific points [21]. More specifically, the bar chart and the tapered lines connecting relevant slider knobs are shown only when users hover over or drag a slider knob.

4 User Study

4.1 Materials

We performed a user study to test the effectiveness of our innovative review-based screening interface against the traditional screening interface.

In the traditional screening interfaces of e-commerce websites, checkboxes are widely used as the filter form. Specifically, the filter of each attribute is composed of an array of N checkboxes, each of which represents a value range on that dimension. Users can select products with values within a certain range by clicking the corresponding checkbox. For example, the filter for cleanliness is composed of five checkboxes that stand for choosing hotels with ‘above 4.5’, ‘above 4’, ‘above 3.5’, ‘above 3’ and ‘all’ scores. Every checkbox is followed by the number of products with values within the range (shown in brackets). When a set of products is selected, the number following each checkbox changes simultaneously. Figure 4 provides an example: when users click the box labeled ‘Restaurant’ to select the 23 hotels with a restaurant, the distribution of cleanliness scores changes in real time (e.g., none of the 23 hotels’ cleanliness scores is ‘above 4.5’).

Fig. 4. Screenshot of the traditional screening interface
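The count shown next to each checkbox can be sketched as below. The threshold labels follow the cleanliness example in the text; treating ‘above’ as strictly greater than, the function name `checkbox_counts`, and the sample hotels are assumptions made for illustration.

```python
THRESHOLDS = [4.5, 4.0, 3.5, 3.0]  # checkbox cutoffs from the cleanliness example


def checkbox_counts(scores):
    """Return the number shown after each checkbox for the given scores."""
    counts = {f"above {t:g}": sum(score > t for score in scores) for t in THRESHOLDS}
    counts["all"] = len(scores)
    return counts


# Made-up hotels as (facilities, cleanliness score) pairs, for illustration only.
HOTELS = [
    ({"restaurant"}, 4.2),
    ({"restaurant", "wifi"}, 3.6),
    ({"wifi"}, 4.8),
    ({"wifi"}, 3.2),
]

all_counts = checkbox_counts([score for _, score in HOTELS])

# Selecting the 'Restaurant' checkbox narrows the set, so the counts are recomputed.
restaurant_counts = checkbox_counts(
    [score for facilities, score in HOTELS if "restaurant" in facilities])
print(all_counts)
print(restaurant_counts)
```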

4.2 Evaluation Framework

Given that the objective of the experiment is to identify whether our innovative interface could better support users in screening out interesting alternatives and improve their decision-making process, we took measurements from both objective and subjective perspectives.

Objective measures include users’ decision effort and behavior. The decision effort was assessed by users’ task completion time. To understand how users actually behaved, we measured the attributes users adopted for narrowing down alternatives.

In addition to the above objective measures, we assessed users’ perceptions, which mainly concern interface usability. According to ISO, usability is defined as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use. Grounded in this definition, users’ subjective perceptions were assessed by three constructs: decision accuracy, cognitive effort and satisfaction. To measure these subjective perceptions, a set of questions was adopted mostly from existing studies, where they were tested and found to have strong content validity and reliability (see Table 2).

Table 2. Questions to measure users’ subjective perceptions

4.3 Within-Subjects Study

We used a within-subjects design to compare the two interfaces under equal settings. All participants went through both the traditional and innovative interfaces in a random order, finishing a randomly assigned task with each (i.e., imagine that you will have a trip to Beijing/Rome with your friends in the summer holiday, and need to book a hostel online. The top 50 Beijing/Rome Bed and Breakfast are presented. Please choose interesting one(s) for further consideration). The within-subjects design was chosen over a between-subjects design for two reasons [16]. First, fewer participants are needed since each participant is tested on both interfaces. Second, the variability in measurements is more likely due to differences between interfaces than to behavioral differences between participants. To counterbalance carryover effects, we developed four (2 × 2) experimental conditions. The manipulated factors are the interfaces’ order (innovative interface first or traditional interface first) and the tasks’ order (locate hotels in Beijing first or in Rome first). Participants were evenly assigned across the four conditions.
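An even assignment to the four conditions can be sketched as follows. The text does not describe the assignment mechanism, so shuffling the participants and then cycling through the conditions is an assumption, as are the function name, the condition labels, and the seed.

```python
import itertools
import random
from collections import Counter

INTERFACE_ORDERS = ["innovative first", "traditional first"]
TASK_ORDERS = ["Beijing first", "Rome first"]


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then cycle through the 2 x 2 conditions so groups stay balanced."""
    conditions = list(itertools.product(INTERFACE_ORDERS, TASK_ORDERS))
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # random order, reproducible for a given seed
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(1, 61))  # 60 participants, as in the study
group_sizes = Counter(assignment.values())
print(group_sizes)
```

With 60 participants and four conditions, this yields 15 participants per condition.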

4.4 Procedure and Participants

To collect users’ actions and perceptions, we built an online experiment site that included the task description, the evaluated interfaces, and the questionnaires. All users’ actions and answers were automatically recorded in log files. The main procedure of the study consisted of four steps.

  1. Step 1:

    Each participant was given a brief introduction to the experiment’s objective at the beginning, and then required to fill in his/her personal background and e-commerce experiences.

  2. Step 2:

    The user was asked to use a randomly assigned interface (traditional interface or innovative interface) to screen out interesting hotels (in Beijing or in Rome). After finishing the task, he/she was automatically led to a page to give his/her opinions on the interface.

  3. Step 3:

    The user used the other interface (e.g., the innovative interface if s/he had just used the traditional interface at step 2) to locate hotels worth further consideration in the other city (e.g., Beijing if s/he had just searched for hotels in Rome at step 2). Similarly, when the task was done, s/he was required to indicate her/his subjective perceptions of the interface s/he had just used.

  4. Step 4:

    After the user had gone through the two interfaces, s/he was asked to indicate which one s/he preferred and why.

Sixty participants (28 females) were recruited to take part in the user study. They were university staff and students pursuing Bachelor, Master or PhD degrees, with different majors, such as Electronics, Engineering, and Architecture. Table 3 gives their demographic profile. In the pre-study questionnaire, they indicated their frequency of Internet use (on average 4.94 ‘almost daily’), e-shopping experience (on average 3.63 ‘1–3 times a month’), and online hotel booking experience (on average 2.60 ‘a few times every 3 months’).

Table 3. Demographic profile of participants in user study

5 Results

SPSS 22 was used for data analysis. To identify whether the observed differences between the two interfaces are statistically significant, we ran t-tests (at the 95% confidence level) [9].

5.1 Subjective Measures

We first analyzed users’ answers to the questionnaires in order to learn how they subjectively felt about the two interfaces. Table 4 lists participants’ mean responses to each question and the significance analysis. We can observe that users gave more positive scores to our innovative screening interface than to the traditional screening interface on all four questions. The significance analysis further shows that the innovative interface achieves significantly higher scores regarding Q2 “Cognitive effort” (5.46 vs. 5.15 in traditional interface, t = −2.49, p = .016), Q3 “Pleasant to use” (5.63 vs. 5.17 in traditional interface, t = −2.28, p = .027), and Q4 “Return intention” (5.70 vs. 5.27 in traditional interface, t = −2.24, p = .029).

Table 4. Comparison on users’ subjective perceptions with the two interfaces

5.2 Objective Measures

Regarding time consumption, users spent more time narrowing down options with our innovative interface than with the traditional interface, but the difference is not significant (124.23 vs. 105.38 s, t = −1.89, p = .064). Moreover, we recorded which attributes users adopted in both interfaces (see Fig. 5 (left)). For example, 73% and 51% of participants eliminated alternatives by facility in the traditional interface and the innovative interface, respectively. Overall, the average application frequency of the four sentiment attributes (i.e., cleanliness, location, value and service) is higher than that of the three static attributes (i.e., facility, district and price) in both the traditional interface (64.8% vs. 60.3%) and the innovative interface (56.2% vs. 45%) (see Fig. 5 (right)). The high application frequencies of sentiment attributes indicate the necessity of incorporating them in a screening interface to meet individual users’ filtering needs.

Fig. 5. The frequencies of use of static attributes and sentiment attributes

5.3 User Comments

In the post-study questionnaire, users were asked to choose the interface they preferred. Of the 60 users, 65% (39) favored the innovative interface, whereas the other 35% (21) preferred the traditional checkbox interface. A Chi-square test shows that the difference is significant (χ² = 5.40, p < .05). Further analysis of users’ comments clarified why the innovative interface was subjectively preferred by the majority of participants.
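The reported statistic can be reproduced with a short calculation, assuming the test is a goodness-of-fit test against an equal split of preferences (30 expected per interface). That null hypothesis is an assumption here, although it does yield the reported value of 5.40. The p-value uses the closed form for one degree of freedom; the function names are illustrative.

```python
import math


def chi_square_goodness_of_fit(observed):
    """Chi-square statistic against equal expected counts in every category."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


def p_value_df1(chi2):
    """Upper-tail p-value of the chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# 39 participants preferred the innovative interface, 21 the traditional one.
chi2 = chi_square_goodness_of_fit([39, 21])
p = p_value_df1(chi2)
print(round(chi2, 2), round(p, 3))
```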

In total, 47 users gave comments (34 preferred the innovative interface and 13 preferred the traditional checkbox interface). Nineteen participants felt the innovative interface is more intuitive: “The sliders show the filter information directly to me so that I can see all the details without additional click, whereas the checkbox requires more clicks”. The second reason is that the innovative interface lets users specify more precise cut-off values (18 participants): “I can choose a more precise cut-off value (e.g., higher than ‘3.7’ for location) with slider compared to the cut-off options (e.g., ‘good’, ‘very good’) with checkbox”. Besides, 13 participants felt that it is easier for them to learn the value distribution of an attribute in the innovative interface: “The bar chart form makes it easier for me to pick up the ‘main stream’ zone”. Last but not least, the trade-offs among attributes are more accessible to users in the innovative interface (13 participants): “The available region of different attributes are related, showing how the change of one preference will affect the others. This eases the procedure of making trade-offs between attributes”.

As to the strong points of the traditional interface, 8 of the 13 users felt that it is more familiar: “I am more familiar with the checkbox form, which are broadly employed in commercial websites”. In addition, 6 of the 13 participants indicated that the traditional interface is simpler and easier to understand, while the innovative interface is a little complicated and overloaded: “In checkbox interface the filter is simple and clear, while the 2nd interface contains some distracting visual information”.

6 Discussion

In this paper, we investigated how to present reviews at the 1st stage to help users screen out interesting alternatives for further consideration.

Grounded in our prior findings on how users utilize reviews to make online purchasing decisions, we developed a review-based screening interface as an improvement on the traditional screening interface. We then conducted a user study to test the effectiveness of our design solution. The results show that our innovative interface performed better than the traditional screening interface regarding users’ perceived decision accuracy, effort and satisfaction. Besides, people actively used sentiment attributes to filter products in both interfaces.

As for practical implications, we believe our findings can inform researchers who are developing review-based interfaces. For e-commerce websites, our results provide insights into how to incorporate product reviews into the screening interface to help users screen out interesting alternatives. In fact, the satisfying user experience with our innovative interface suggests that it could be directly employed by commercial sites to serve their online users.