Customer preference identification from hotel online reviews: A neural network based fine-grained sentiment analysis

https://doi.org/10.1016/j.cie.2022.108648

Highlights

  • Customer preferences are identified from online reviews of hotels in this study.

  • Customer preference identification is conducted by fine-grained sentiment analysis.

  • An improved Neural Network approach is proposed in this study.

  • An empirical study of Ji hotel is used to illustrate the proposed approach.

  • The results show that our approach is effective in preference identification.

Abstract

As a kind of user-generated information, online reviews contain customers’ preferences for different aspects of hotels, which not only influence customers’ booking decisions but also help hotel managers improve service quality in a timely manner. The key to deriving customers’ preferences from hotels’ online reviews is to identify fine-grained sentiment towards hotel attributes. In general, fine-grained sentiment analysis involves multiple fundamental tasks such as sentiment element extraction, aspect-opinion pair (i.e., AOP) identification and sentiment orientation analysis. However, existing fine-grained sentiment analysis methods cannot efficiently identify AOPs, especially when dealing with Chinese reviews. To this end, we construct an improved convolutional neural network (i.e., CNN) model, which comprehensively utilizes unstructured and structured features, to improve the performance of AOP identification. We further propose a refined fine-grained sentiment analysis methodology that calculates a customer sentiment intensity value for each evaluated aspect, rather than only a positive or negative label, and integrate it with an aspect term clustering algorithm to identify customers’ specific preferences for different hotel attributes. Finally, to illustrate the soundness and advantages of the proposed methodology, we conduct an empirical study with hotels’ online reviews crawled from Ctrip.com. Empirical results indicate that our proposed method improves the performance of AOP identification and can effectively identify customer preferences from hotels’ online reviews. Furthermore, we find that customers show different preferences for different hotel attributes, and these vary across the types of customers.

Introduction

In past decades, the hotel industry has witnessed substantial growth all over the world due to ever-increasing international and domestic tourism and business activities. A recent report shows that the market size of the global hotel and resort industry increased year by year during 2011–2019 and reached 1.47 trillion U.S. dollars in 2019 (Lock, 2021). Coupled with the rapid development and wide application of information technologies, more and more giant online travel agencies (i.e., OTAs) have emerged and boomed in recent years, e.g., Airbnb, Booking.com, TripAdvisor, Ctrip.com and Qunar.com. Customers are increasingly booking hotel rooms via these OTAs. In particular, about 1,550,000 stays are booked via Booking.com every day (Anjana, 2021). Nonetheless, customers inevitably face some uncertainty about hotel room attributes such as facilities and service quality, because of the physical separation between customers and listed products or services (e.g., hotel rooms) at the time of ordering online (Zhang et al., 2021a). In this case, customers may first read online reviews on OTA platforms before making a decision. A recent survey shows that more than two-thirds of travelers usually read online reviews before booking a hotel room, and 93% of those say online reviews influence their booking decisions (Nilashi et al., 2021b). The underlying intuition is that online reviews are posted by customers who have experienced the products or services, and thus the information conveyed in the reviews is more convincing to potential customers than hotels’ advertisements (Gavilan et al., 2018). Online reviews have thereby been regarded as an important mechanism to mitigate the uncertainty in online transactions (Kwark et al., 2014), and they can significantly affect a company’s profitability by influencing customers’ behaviors. However, in the face of the “information overload” of online reviews, it is difficult to manually extract valuable customer information comprehensively and quickly from massive numbers of reviews. Therefore, it is crucial to develop a methodology to automatically identify customer preferences from hotels’ online reviews.

In recent years, customer preference mining from online reviews has attracted increasing attention. However, current research has mainly focused on extracting the sentiment orientation of each online review at the overall level, which cannot distinguish customer preferences for specific hotel attributes, e.g., price, sanitation and facilities. As observed in online reviews, a user-generated comment may contain several opinions, even diametrically opposite sentiment orientations, on different hotel attributes (Feldman, 2013). For example, the comment “The service quality of the hotel is super great, and the price is relatively low, whereas some facilities need to be replaced” indicates a positive sentiment towards the service quality and the price, but a negative one towards the facilities. This indicates that the customer is generally satisfied with the hotel, but a little dissatisfied with the hotel’s facilities. In this case, to help improve the hotel’s performance, it is necessary to identify customers’ particular preference for each attribute from online reviews via fine-grained sentiment analysis rather than overall sentiment analysis. As such, we focus on fine-grained sentiment analysis of online reviews in this work.

Fine-grained sentiment analysis, namely aspect-based sentiment analysis, generally involves multiple fundamental tasks such as sentiment element extraction (e.g., aspect term extraction, opinion term extraction), aspect-opinion pair (i.e., AOP) identification and sentiment orientation analysis (Çalı and Balaman, 2019, Mao et al., 2021). The first task is generally treated as a sequence labeling task that assigns a label to each token (i.e., English word or Chinese character) in a review. Considerable efforts have been made to improve sentiment element extraction (Ji et al., 2020, Alshammari and Alanazi, 2021). In this study, to obtain the exact sentiments of customers towards each attribute of a hotel, we combine a Bi-LSTM (i.e., bi-directional long short-term memory) model with a CRF (i.e., conditional random fields) model to identify evaluated aspects, sentiment words, and affective modifiers of sentiment words.
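To make the sequence labeling framing concrete, the following minimal sketch shows how per-token labels can be turned back into sentiment elements. It does not implement the Bi-LSTM or CRF models; it assumes their output is already available as BIO-style labels. The label names (`ASP`, `OPI`, `MOD`) and the function `extract_spans` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def extract_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (element_type, text) spans from BIO-style token labels.

    Assumed (hypothetical) label set: B-/I- prefixes with types
    ASP (evaluated aspect), OPI (sentiment word), MOD (affective modifier),
    and O for tokens outside any sentiment element.
    """
    spans: List[Tuple[str, str]] = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new element starts; close the one in progress, if any.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the element in progress.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" label: close the element in progress.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["the", "service", "quality", "is", "super", "great"]
labels = ["O", "B-ASP", "I-ASP", "O", "B-MOD", "B-OPI"]
spans = extract_spans(tokens, labels)
print(spans)
```

For Chinese reviews the tokens would be characters rather than words, but the decoding of labels into elements would follow the same pattern.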

As the second task, the goal of AOP identification is to recognize the sentiment words corresponding to the evaluated aspects. Researchers have long explored how to enhance the performance of AOP identification, but the results are still not satisfactory. Related approaches can be generally grouped into two categories, i.e., knowledge-based methods (Qiu et al., 2011, Cambria et al., 2014, Chang et al., 2021) and learning-based methods. The knowledge-based methods usually employ a lot of linguistic knowledge when building rules for discovering the relationships between sentiment words and evaluated aspects in a review, but the resulting rules are generally specific to a particular product or service and do not transfer to others (Wang et al., 2018). In learning-based methods, the task of identifying AOPs is usually treated as a binary classification task (Mao et al., 2021), that is, the task is accomplished by judging whether each possible combination of evaluated aspect and sentiment word in the review has a matching relationship. The learning-based methods mainly include statistical machine learning methods and deep learning methods. Under statistical machine learning methods, the key task is to extract and filter features, and the performance depends heavily on how carefully the feature set is manually selected. In the extant studies, a number of structured features, e.g., the position feature and the part-of-speech sequence feature, have been demonstrated to be suitable for classification or aspect-opinion pair recognition (Su et al., 2011; Chen and Manning, 2014). Deep learning methods show a distinct advantage in automatically obtaining word vectors (i.e., the numerical representation of text content) from textual information and applying them to classification or prediction (Kim, 2014, Liu et al., 2020). Unfortunately, few efforts have been made to introduce structured features into deep learning models along with unstructured textual content representation, especially regarding AOP identification. In this study, we propose an improved convolutional neural network model that fuses unstructured text content representation and structured features (i.e., dependency parse feature, part-of-speech sequence feature, sentiment element types and position feature). Moreover, we combine these structured features with the textual content representation to construct a new feature (i.e., the comprehensive feature), which is also fed into the convolutional neural network (i.e., CNN) model, for identifying AOPs.
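The binary classification framing described above can be illustrated with a small sketch that enumerates every aspect-opinion combination in a review and attaches a simple position feature to each candidate. This is only the candidate-generation step under stated assumptions; it is not the proposed CNN model. The function `candidate_pairs` and the feature names are hypothetical, and the dependency parse and part-of-speech features are omitted.

```python
from itertools import product
from typing import Dict, List, Tuple

Element = Tuple[str, int]  # (text, index of its first token in the review)


def candidate_pairs(aspects: List[Element], opinions: List[Element]) -> List[Dict]:
    """Enumerate all aspect-opinion combinations with a toy position feature.

    Each candidate would later be judged "matching" or "not matching"
    by a binary classifier; no classifier is included here.
    """
    pairs = []
    for (asp, a_idx), (opi, o_idx) in product(aspects, opinions):
        pairs.append({
            "aspect": asp,
            "opinion": opi,
            "distance": abs(o_idx - a_idx),        # token distance
            "opinion_after_aspect": o_idx > a_idx,  # relative order
        })
    return pairs


tokens = ("the service quality of the hotel is super great "
          "and the price is relatively low").split()
aspects = [("service quality", tokens.index("service")),
           ("price", tokens.index("price"))]
opinions = [("great", tokens.index("great")),
            ("low", tokens.index("low"))]

pairs = candidate_pairs(aspects, opinions)
for pair in pairs:
    print(pair)
```

In this toy review, "price" is equally distant from "great" and "low", which suggests why a distance feature alone cannot settle the pairing and why richer structured and textual features are combined.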

Regarding the third task, sentiment orientation analysis generally aims to distinguish customers’ positive or negative sentiment towards a specific aspect following AOP identification (Zhang et al., 2021a). Because our approach performs well on both sentiment element extraction and AOP identification, we further refine the aspect-level sentiment analysis framework, i.e., we perform sentiment value calculation after sentiment element extraction and AOP identification, to identify an accurate customer sentiment intensity value for each evaluated aspect. Specifically, we first use the proximity principle to match sentiment words and affective modifiers to form opinion phrases, and then construct three types of dictionaries (i.e., sentiment dictionary, degree adverb dictionary and negative adverb dictionary) to calculate sentiment values of opinion phrases corresponding to each evaluated aspect.
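A minimal sketch of this two-stage idea is given below: modifiers are first attached to the nearest sentiment word (the proximity principle), and each opinion phrase is then scored with three small dictionaries. The dictionary entries, the weights, and the scoring rule (degree adverbs multiply the polarity, each negative adverb flips its sign) are assumptions made for illustration only; the paper's actual dictionaries and formula are not given in this section.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionaries, not the ones constructed in the paper.
SENTIMENT = {"great": 1.0, "dirty": -1.0}   # sentiment dictionary
DEGREE = {"super": 2.0, "slightly": 0.5}    # degree adverb dictionary
NEGATION = {"not", "never"}                 # negative adverb dictionary

Element = Tuple[str, int]  # (word, token index in the review)


def attach_modifiers(sentiment_words: List[Element],
                     modifiers: List[Element]) -> Dict[int, List[str]]:
    """Proximity principle: each modifier joins the nearest sentiment word."""
    phrases: Dict[int, List[str]] = {idx: [] for _, idx in sentiment_words}
    for mod, m_idx in modifiers:
        _, nearest_idx = min(sentiment_words, key=lambda s: abs(s[1] - m_idx))
        phrases[nearest_idx].append(mod)
    return phrases


def phrase_value(sentiment_word: str, modifiers: List[str]) -> float:
    """Assumed rule: polarity scaled by degree adverbs, flipped by negations."""
    value = SENTIMENT.get(sentiment_word, 0.0)
    for mod in modifiers:
        if mod in NEGATION:
            value = -value
        else:
            value *= DEGREE.get(mod, 1.0)
    return value


sentiment_words = [("great", 8), ("dirty", 15)]
modifiers = [("super", 7), ("slightly", 14)]
phrases = attach_modifiers(sentiment_words, modifiers)
values = {word: phrase_value(word, phrases[idx]) for word, idx in sentiment_words}
print(values)
```

Under these assumed weights, "super great" scores higher in magnitude than plain "great", and "slightly dirty" is a weaker negative than "dirty", which is the kind of intensity distinction a polarity label alone would lose.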

Furthermore, to identify customers’ specific preferences for different hotel attributes, we incorporate an aspect term clustering algorithm: we apply the K-means algorithm to cluster the evaluated aspects based on the word vectors obtained by word2vec (i.e., a widely used word embedding tool), so as to recognize the attribute categories that customers are particularly concerned about. Finally, we apply our proposed aspect-level sentiment analysis framework to empirically examine customers’ preferences for Ji hotel in China, with hotels’ online reviews crawled from Ctrip.com.
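The clustering step can be sketched with a plain K-means loop over aspect term vectors. The two-dimensional vectors below are made-up stand-ins for word2vec embeddings, the aspect terms are illustrative, and the initial centroids are fixed by hand so that the example is deterministic; a real pipeline would use trained embeddings and a library implementation.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]],
           centroids: List[Sequence[float]],
           iterations: int = 10) -> List[int]:
    """Plain K-means: alternate nearest-centroid assignment and mean update."""
    centers = [list(c) for c in centroids]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Hypothetical aspect terms with toy vectors standing in for word2vec output.
vectors = {
    "staff": (0.9, 0.1),
    "reception": (0.8, 0.2),
    "breakfast": (0.1, 0.9),
    "buffet": (0.2, 0.8),
}
terms = list(vectors)
points = [vectors[t] for t in terms]
labels = kmeans(points, centroids=[points[0], points[2]])

clusters: Dict[int, List[str]] = {}
for term, label in zip(terms, labels):
    clusters.setdefault(label, []).append(term)
print(clusters)
```

Each resulting cluster would then be interpreted as one attribute category (for instance, service-related versus catering-related terms in this toy data).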

Our empirical analysis leads to the following important findings. First, our proposed method, which comprehensively considers structured and unstructured features, improves the performance of AOP identification. Second, we find that hotel customers mainly focus on 11 hotel attributes (i.e., service quality, room type, surrounding, price, transportation, facilities, catering, sanitation, parking, environment and epidemic prevention), and customers show different preferences for different hotel attributes. Notably, we find that service quality is the most critical factor for hotels. Third, customers’ preferences are closely related to their travel purposes. Customers of different types differ in how much they care about each attribute category of the hotel, and they exhibit different positive and negative sentiment polarities for each attribute category. Finally, we further find that both the geographical distribution of the Ji hotels and the COVID-19 pandemic have a slight influence on customers’ preferences.

The rest of this paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the developed methodology. Section 4 applies the proposed methodology to identify customer preferences for the Ji Hotel from online reviews. Some interesting and important findings and insights are obtained from the empirical analysis in Section 4. Section 5 concludes this paper.

Section snippets

Literature review

Our work is closely related to customer preference analysis of hotels’ online reviews and the sentiment analysis of online reviews. In this section, we review the most relevant studies.

Methodology

In this section, we develop an improved fine-grained sentiment analysis methodology, combined with an aspect term clustering algorithm, to identify customers’ preferences for hotels’ attributes (e.g., service quality, prices and facilities) from online reviews. The framework of the proposed approach is depicted in Fig. 1.

As shown in Fig. 1, our approach includes four sequential steps: sentiment element extraction (i.e., aspect term extraction and opinion term extraction), aspect-opinion pair

Experiment results and customer preferences analysis

In this section, we first illustrate the proposed approach by using an experimental study, and then use it to extract customer preferences of Ji hotel in China. In particular, we design two groups of experiments to examine the performance of sentiment element extraction and aspect-opinion pair identification. Dataset description, experimental results and analysis, and customer preference analysis are presented below.

Conclusions

Online reviews, as a kind of user-generated information, contain a wealth of useful and valuable feedback from the customers who have used the products or experienced the service. In recent years, customer preference mining from online reviews, which can help improve a company’s performance, has attracted increasing attention in the literature. In this work, we develop a fine-grained sentiment analysis methodology consisting of three successive steps, i.e., sentiment element extraction, AOP

CRediT authorship contribution statement

Yiwen Bian: Conceptualization, Validation, Visualization, Supervision, Methodology. Rongsheng Ye: Methodology, Investigation, Data curation, Writing – original draft. Jing Zhang: Methodology. Xin Yan: Methodology, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was partly supported by programs granted by the National Natural Science Foundation of China (NSFC) (Nos. 72031004 and 71901053). The authors would like to thank the editor and the anonymous reviewers for their helpful comments and suggestions on earlier versions of the manuscript.

References (73)

  • Z.X. Ji et al. Power entity recognition based on bidirectional long short-term memory and conditional random fields. Global Energy Interconnection (2020)
  • P.J. Lee et al. Assessing the helpfulness of online hotel reviews: A classification-based approach. Telematics and Informatics (2018)
  • F.G. Liu et al. Combining attention-based bidirectional gated recurrent neural network and two-dimensional convolutional neural network for document-level sentiment classification. Neurocomputing (2020)
  • M. Nilashi et al. Travelers decision making through preferences learning: A case on Malaysian spa hotels in TripAdvisor. Computers & Industrial Engineering (2021)
  • S. Park et al. Understanding the dynamics of the quality of airline service attributes: Satisfiers and dissatisfiers. Tourism Management (2020)
  • T. Radojevic et al. The effects of traveling for business on customer satisfaction with hotel services. Tourism Management (2018)
  • M. Schuckert et al. A segmentation of online reviews by language groups: How English and non-English speakers rate hotels differently. International Journal of Hospitality Management (2015)
  • X. Shi et al. Online consumer review and group-buying participation: The mediating effects of consumer beliefs. Telematics and Informatics (2017)
  • Y. Song et al. Does hotel customer satisfaction change during the COVID-19? A perspective from online reviews. Journal of Hospitality and Tourism Management (2022)
  • C.Y. Tsai et al. Improving text summarization of online hotel reviews with review helpfulness and sentiment. Tourism Management (2020)
  • P. Vijayaragavan et al. An optimal support vector machine based classification model for sentimental analysis of online product reviews. Future Generation Computer Systems (2020)
  • J. Wang et al. Using a stacked residual LSTM model for sentiment intensity prediction. Neurocomputing (2018)
  • J.Y. Wang et al. Research on the role of influencing factors on hotel customer satisfaction based on BP neural network and text mining. Information (2021)
  • Z. Xiang et al. What can big data and text analytics tell us about hotel guest experience and satisfaction? International Journal of Hospitality Management (2015)
  • X. Xu et al. Utilizing the platform economy effect through EWOM: Does the platform matter. International Journal of Production Economics (2020)
  • X. Xu et al. The antecedents of customer satisfaction and dissatisfaction toward various types of hotels: A text mining approach. International Journal of Hospitality Management (2016)
  • J. Zhang et al. Deriving customer preferences for hotels based on aspect-level sentiment analysis of online reviews. Electronic Commerce Research and Applications (2021)
  • J. Zhang et al. Customer preferences extraction for air purifiers based on fine-grained sentiment analysis of online reviews. Knowledge-Based Systems (2021)
  • W.H. Zhang et al. Weakness finder: Find product weakness from Chinese reviews by using aspects-based sentiment analysis. Expert Systems with Applications (2012)
  • Anjana. (2021). Online Travel Agencies-A Brief Introduction. Retrieved from...
  • G. Bodet et al. Hotel attributes and consumer satisfaction: A cross-country and cross-hotel study. Journal of Travel & Tourism Marketing (2017)
  • A. Brochado et al. Toward a better understanding of backpackers’ motivations. Review of Applied Management Studies (2013)
  • A. Brochado et al. Exploring backpackers’ perceptions of the hostel service quality. International Journal of Contemporary Hospitality Management (2015)
  • J.D. Burkholder et al. Health and well-being factors associated with international business travel. Journal of Travel Medicine (2010)
  • E. Cambria et al. SenticNet 3: A common and common-sense knowledge base for cognition-driven sentiment analysis. The 28th AAAI Conference on Artificial Intelligence. AAAI (2014)
  • D. Chen et al. A fast and accurate dependency parser using neural networks. The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL (2014)