Opinion mining from online hotel reviews – A text summarization approach

https://doi.org/10.1016/j.ipm.2016.12.002

Highlights

  • A text summarization technique can extract essential information from online reviews.

  • Our method can identify top-k most informative sentences from online hotel reviews.

  • We jointly considered author, review time, usefulness, and opinion factors.

  • Online hotel reviews were collected from TripAdvisor for the experimental evaluation.

  • The results show that our approach provides more comprehensive hotel information.

Abstract

Online travel forums and social networks have become the most popular platforms for sharing travel information, with enormous numbers of reviews posted daily. Automatically generated hotel summaries could help travelers select hotels. This study proposes a novel multi-text summarization technique for identifying the top-k most informative sentences of hotel reviews. Previous studies on review summarization have primarily examined content analysis, disregarding critical factors such as author credibility and conflicting opinions. We considered such factors and developed a new sentence importance metric. Both content and sentiment similarities were used to determine the similarity of two sentences. To identify the top-k sentences, the k-medoids clustering algorithm was used to partition sentences into k groups, and the medoids of these groups were selected as the final summarization results. To evaluate the performance of the proposed method, we collected two sets of reviews, one for each of two hotels, posted on TripAdvisor.com. A total of 20 subjects were invited to review the text summarization results produced for the two hotels by the proposed approach and two conventional approaches. The results indicate that the proposed approach outperforms the other two, and most of the subjects believed that the proposed approach can provide more comprehensive hotel information.

Introduction

Advances in Internet technology and the vigorous development of Web 2.0 applications have caused substantial change in the tourism industry (Litvin, Goldsmith, & Pan, 2008). The rise of online tourism forums has made the Internet the primary means of seeking travel information (Chung and Koo, 2015; Jannach et al., 2014; Li et al., 2015; Liu and Park, 2015). Travelers communicate with each other and share their perspectives and experiences on social networking sites, generating numerous reviews daily (Cantallops and Salvi, 2014; Chung et al., 2015; Ye et al., 2011). For example, TripAdvisor.com, one of the most widely used travel sites, provides a platform for sharing reviews and opinions on various hotels and restaurants. In addition, members can rate reviews according to their perceived usefulness. According to a survey conducted by Ady and Quadri-Felitti (2015), nearly 95% of travelers read online hotel reviews before making their booking decision, and more than one-third of travelers believe that the opinions expressed in reviews are the most critical factor in selecting a hotel online.

A single hotel can easily accumulate thousands of reviews on social media, which causes the problem of information overload (Harer and Kadam, 2014; Liu et al., 2012; Peetz et al., 2016). Summarized review content can give readers the most essential information about a hotel and also save time during their purchasing decision process (Ady & Quadri-Felitti, 2015). Papathanassis and Knolle (2011) indicated that in the era of Web 2.0, a novel content management process is required to control and maintain the user-generated content (UGC) on tourism websites. Therefore, how to automatically summarize online hotel reviews is an appealing research topic.

Text summarization is used to extract the most essential information from an original text and generate a simplified version of the text for users (Gambhir and Gupta, 2016; Gupta and Lehal, 2010; Kar et al., 2015; Mani and Maybury, 1999). In the past decade, text summarization has been applied in various domains (Abdi et al., 2015; Liu et al., 2012; Liu et al., 2015; Ly et al., 2011; Mehta, 2013; Sankarasubramaniam et al., 2014). For example, given a user query, Google can offer dozens of website links as well as a short paragraph of summarized text regarding the content of each website, which helps users judge how interesting or useful each website is. As another example, the well-known application Summly can automatically retrieve relevant news articles and then show a digest of each article according to the news categories selected by the user.

Generally, text summarization can be divided into single-text and multi-text summarization (Gupta & Lehal, 2010). Single-text summarization addresses the problem of text summarization for a single document only. Because the author (or group of authors) of a document completed it according to a common consensus, the content does not exhibit inconsistency problems. In other words, the opinions in a single text do not conflict. Furthermore, because a single document is issued at a specified time point, single-text summarization does not need to consider the effect of content novelty. In contrast, multi-text summarization simultaneously processes multiple documents on the same subject (Heu, Qasim, & Lee, 2015). When handling multi-text summarization, the problem of conflicting opinions raised by different authors should be resolved and the consistency of semantic expression in the summarization results must be ensured.

This paper proposes a novel multi-text summarization technique specifically designed for hotel review summarization. Previous studies on review summarization have primarily focused on using text processing techniques, such as bag-of-words and semantic approaches (Atkinson and Munoz, 2013; Jeong et al., 2016; Meng and Wang, 2009; Wang et al., 2013), disregarding other useful information that could be extracted from online social media, such as (a) author reliability, which implies that the reviews written by a famous author or blogger have high credibility; (b) review time, which implies that more recent reviews generally provide users with more up-to-date information; (c) review usefulness, which implies that reviews that receive higher ratings from other users typically provide more information; and (d) conflicting opinions, which implies that different reviewers have their own preferences and might not agree with one another. Thus, reviews expressing different sentiment polarities (i.e., either positive or negative comments) might contain information raised by different reviewers. To the best of our knowledge, no studies have considered these four factors collectively for hotel review summarization. Accordingly, this study addressed the following research questions:

  • How can the proposed multi-text summarization technique accurately generate a useful summarized review from a set of online hotel reviews?

  • By considering author reliability, review time, review usefulness, and conflicting opinions, can the proposed approach offer better hotel review summarization results than conventional text summarization approaches (i.e., those that consider review text only)?
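As a rough illustration of how the first three factors might be folded into one number, the sketch below scores each review with a weighted sum of min-max normalized signals. The field names, the use of vote counts as proxies for author reliability and usefulness, and the equal weights are all assumptions made for this example; this excerpt does not give the paper's actual importance formula. In this sketch a sentence would simply inherit the score of the review it came from.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    text: str
    author_helpful_votes: int  # hypothetical proxy for author reliability
    posted: date               # review time
    helpful_votes: int         # hypothetical proxy for review usefulness


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def importance_scores(reviews, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return one score in [0, 1] per review.

    The weights are illustrative assumptions, not values from the paper.
    """
    w_author, w_time, w_useful = weights
    author = min_max([r.author_helpful_votes for r in reviews])
    # toordinal() grows with the date, so newer reviews score higher.
    recency = min_max([r.posted.toordinal() for r in reviews])
    useful = min_max([r.helpful_votes for r in reviews])
    return [
        w_author * a + w_time * t + w_useful * u
        for a, t, u in zip(author, recency, useful)
    ]
```

Normalizing each signal before weighting keeps one factor (for example, raw vote counts) from dominating the others purely because of its scale.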

In the experimental evaluation, the hotel reviews were collected from TripAdvisor.com and the abovementioned four factors were jointly considered. In particular, the first three factors were used to calculate the importance score of each sentence. To resolve conflicting opinions, both content and sentiment analyses were performed to determine the similarity between two sentences. Based on the new similarity measure, the sentences can be clustered into k groups and the most representative sentence in each group can be used to form the final text summarization result (i.e., the top-k informative sentences).
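The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: content similarity is approximated by Jaccard overlap of word sets, sentiment similarity by the gap between two polarity scores in [-1, 1], and the two are blended with a hypothetical weight `alpha`. The polarity values are hand-assigned here, whereas a real system would obtain them from a sentiment lexicon or classifier. The k-medoids routine is a plain alternating variant with a deterministic start, and it does not use the importance score.

```python
import re


def tokens(sentence):
    """Lowercased word set; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def content_similarity(a, b):
    """Jaccard overlap of the two token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sentiment_similarity(pa, pb):
    """1 when polarities (each in [-1, 1]) match, 0 when fully opposed."""
    return 1.0 - abs(pa - pb) / 2.0


def distance(i, j, sentences, polarity, alpha=0.5):
    """Distance = 1 - blended similarity; alpha is an assumed mixing weight."""
    sim = (alpha * content_similarity(sentences[i], sentences[j])
           + (1 - alpha) * sentiment_similarity(polarity[i], polarity[j]))
    return 1.0 - sim


def k_medoids(n, k, dist, max_iter=100):
    """Alternating k-medoids over item indices 0..n-1; returns medoid indices.

    Starts from the first k items so the result is deterministic.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster, and every
        # other item joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist(i, m))
            clusters[nearest].append(i)
        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, o) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy data (invented for illustration): two praise and two complaint sentences.
sentences = [
    "The room was clean and quiet",
    "The room was clean and spacious",
    "The breakfast was cold and bland",
    "The breakfast was cold and late",
]
polarity = [0.8, 0.7, -0.6, -0.7]  # hand-assigned polarity scores

medoids = k_medoids(len(sentences), 2,
                    lambda i, j: distance(i, j, sentences, polarity))
summary = [sentences[m] for m in medoids]
```

Because sentiment enters the distance, two sentences about the same topic but with opposite polarity tend to land in different clusters, so both sides of a conflicting opinion can surface in the summary.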

The remainder of this paper is organized as follows: Section 2 reviews both online hotel reviews and previous research on text summarization; Section 3 formally discusses the research method of this study; Section 4 presents the preparation of data, experimental setup, and the evaluation results; and finally, Section 5 concludes the paper.

Section snippets

Online hotel reviews

UGC has become one of the main information sources on the Internet. Different types of electronic word-of-mouth (eWOM) such as online reviews and opinions have been recognized as the most influential communication channel among service providers and consumers as well as among consumers themselves (Casaló, Flavián, & Guinalíu, 2010).

With the rise of online tourism services, the hotel industry is strongly affected by eWOM. Cantallops and Salvi (2014) reviewed the published articles regarding the

Research method

Our research process (as shown in Fig. 1) can be divided into five principal steps: hotel review collection, review preprocessing, sentence importance calculation, sentence similarity calculation, and top-k sentence recommendations. First, hotel reviews were collected from TripAdvisor.com. The collected reviews were then preprocessed by removing all unnecessary information. The importance score of each sentence was then determined using the proposed approach. The conflicting review comments

Data collection

In this study, two hotels, Red Roof Inn and Gansevoort Meatpacking Hotel, were selected from TripAdvisor.com. All of the reviews regarding the two investigated hotels were collected from January 1, 2012 to March 31, 2013. However, the reviews that we analyzed presented the following limitations: (a) TripAdvisor includes reviews in several languages, such as Chinese, Japanese, and French. This study performed analyses of the English reviews only. (b) TripAdvisor is a cross-platform website.

Practical and theoretical implications

A number of practical implications may be derived from this study. First, the developed review summarization technique can serve as a guideline for developing a smart tourism information system (STIS) for travel websites. Specifically, when a user browses the reviews of a hotel on a travel website, the STIS can directly generate a summarized hotel review from thousands of reviews of the target hotel. Because online viewers usually have limited time to handle a large amount of online hotel

Conclusions

This research proposes a novel text summarization technique to determine the top-k most informative sentences from online hotel reviews. Most previous studies on review summarizations have focused on text preprocessing, which disregards critical factors such as the credibility of review authors, review time, review usefulness, and conflicting opinions. In addition to the information (i.e., keywords and key phrases) extracted from the hotel reviews, this study also considered all the

References (61)

  • E. Lloret et al.

    A novel concept-level approach for ultra-concise opinion summarization

    Expert Systems with Applications

    (2015)
  • S.M.C. Loureiro et al.

    Corporate reputation, satisfaction, delight, and loyalty towards rural lodging units in Portugal

    International Journal of Hospitality Management

    (2011)
  • M.P. O'Mahony et al.

    A classification-based review recommender

    Knowledge-Based Systems

    (2010)
  • J.P. Qiang et al.

    Multi-document summarization using closed patterns

    Knowledge-Based Systems

    (2016)
  • A. Papathanassis et al.

    Exploring the adoption and processing of online holiday reviews: A grounded theory approach

    Tourism Management

    (2011)
  • M.H. Peetz et al.

    Estimating reputation polarity on microblog posts

    Information Processing & Management

    (2016)
  • Y. Sankarasubramaniam et al.

    Text summarization using Wikipedia

    Information Processing & Management

    (2014)
  • B.A. Sparks et al.

    The impact of online reviews on hotel booking intentions and perception of trust

    Tourism Management

    (2011)
  • I.E. Vermeulen et al.

    Tried and tested: The impact of online hotel reviews on consumer consideration

    Tourism Management

    (2009)
  • D. Wang et al.

    SumView: A Web-based engine for summarizing product reviews and customer opinions

    Expert Systems with Applications

    (2013)
  • Q. Ye et al.

    The impact of online user reviews on hotel room sales

    International Journal of Hospitality Management

    (2009)
  • Q. Ye et al.

    The influence of user-generated content on traveler behavior: An empirical investigation on the effects of e-word-of-mouth to hotel online bookings

    Computers in Human Behavior

    (2011)
  • M. Ady et al.

    Consumer research identifies how to present travel review content for more bookings

  • A. Balahur et al.

    Challenges and solutions in the opinion summarization of user-generated content

    Journal of Intelligent Information Systems

    (2012)
  • A.S. Cantallops et al.

    New consumer behavior: A review of research on eWOM and hotels

    International Journal of Hospitality Management

    (2014)
  • N. Chung et al.

    Adoption of travel information in user-generated content on social media: The moderating effect of social presence

    Behaviour & Information Technology

    (2015)
  • R.L. Cilibrasi et al.

    The google similarity distance

    IEEE Transactions on Knowledge and Data Engineering

    (2007)
  • H.P. Edmundson

    New methods in automatic extracting

    Journal of the ACM

    (1969)
  • M. Gambhir et al.

    Recent automatic text summarization techniques: a survey

    Artificial Intelligence Review

    (2016)
  • V. Gupta et al.

    A survey of text summarization extractive techniques

    Journal of Emerging Technologies in Web Intelligence

    (2010)