Elsevier

Expert Systems with Applications

Volume 87, 30 November 2017, Pages 199-208

Designing the Content Analyzer of a Travel Recommender System

https://doi.org/10.1016/j.eswa.2017.06.028

Highlights

  • We designed the Content Analyzer for a travel recommender system.

  • Our system is unsupervised, automatic, and relies only on public data sources.

  • The system effectiveness has been tested against a ground truth made by experts.

  • We developed and published on-line a proof-of-concept implementation.

Abstract

Content-based travel recommender systems suggest touristic attractions based on a best match between users’ preferences and a given set of points of interest, called POIs for short. When designing such systems, a critical aspect is to equip them with a rich enough knowledge base that, for each POI, indicates how relevant the POI is for a set of possible topics of interest, called TOIs for short. This paper focuses on the problem of designing the Content Analyzer of a content-based travel recommender system. The Content Analyzer is a module that receives as input a set of POIs and a set of TOIs and computes the relevance of each POI with respect to each TOI. The proposed approach is unsupervised, fully automatic, and relies on publicly available sources of information. We describe an implementation of the technique in a system called Cicero and present an experimental evaluation of its effectiveness against a ground truth generated by experts.

Introduction

Planning a trip by taking into account personal preferences often becomes a time-consuming and difficult task, given the overwhelming amount of information available on a large variety of digital sources (such as institutional web sites, travel blogs, travel guides, etc.). A travel recommender system guides a tourist through this large space of possible options by matching touristic and leisure attractions (technically called Points of Interest, or POIs for short) with the traveler’s interests. As pointed out in early work by Staab et al. (2002) and in a more recent survey by Borrás, Moreno, and Valls (2014), the most common travel recommender systems use a content-based approach, in which the user expresses her needs by associating values to a set of attributes and the system matches these needs with a given set of available POIs. As depicted in Fig. 1, the architecture of a content-based travel recommender system consists of three main modules: a Content Analyzer, a Profiler, and a Matching Module. The Content Analyzer gathers information from various sources and computes a suitable description of the POIs that is stored in a Knowledge Base. The Profiler constructs users’ profiles by collecting and analyzing data representative of their interests. The Matching Module outputs a travel recommendation, i.e., a ranked list of those POIs that are most suitable for the users’ profiles. Examples of content-based travel recommender systems can be found in (Brilhante, Macedo, Nardini, Perego, & Renso, 2015; Cenamor, de la Rosa, Núñez, & Borrajo, 2017; Győrödi, Győrödi, & Dersidan, 2013; Huang & Bian, 2009; Lee, Chang, & Wang, 2009; Lim, Chan, Leckie, & Karunasekera, 2015; Martínez Santiago, Ariza López, Montejo-Ráez, & Ureña López, 2012; Montejo-Ráez, Perea-Ortega, García-Cumbreras, & Martínez-Santiago, 2011; Ruotsalo, Haav, Stoyanov, Roche, Fani, Deliai, Mäkelä, Kauppinen, & Hyvönen, 2013; Vansteenwegen, Souffriau, Berghe, & Oudheusden, 2011).

We focus on the widely adopted keyword-based vector space model to represent users’ profiles and POIs’ descriptions (de Gemmis, Lops, Musto, Narducci, & Semeraro, 2015). According to this model, a POI’s description and a user’s profile are represented as two vectors in an n-dimensional space, where each dimension corresponds to a Topic of Interest, or TOI for short. A TOI is a keyword that describes a subject on which a traveler can find interesting attractions in the region; examples of TOIs could be History, Religion, Art, and so on. The travel recommendation is then computed by means of a suitable matching technique between these two vectors. For example, if the profile of a tourist in Rome, defined on the TOIs History, Religion, and Art, has its highest score on History, then the Matching Module will return a list of POIs where the Colosseum and the Roman Forum are ranked higher than the National Gallery of Modern and Contemporary Art.
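As an illustration of this matching step, the following minimal sketch ranks POIs by cosine similarity between a user profile and each POI description. Cosine similarity is an assumption made here for illustration, since the text above does not name a specific matching technique; the function names and all relevance scores are hypothetical and are not taken from the system.

```python
import math

# Dimensions of the vector space: one per TOI (order matters).
TOIS = ["History", "Religion", "Art"]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pois(profile, descriptions):
    """Return (POI name, score) pairs sorted from best to worst match."""
    scored = [(name, cosine(profile, vec)) for name, vec in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical POI descriptions: relevance in [0, 1] for each TOI in TOIS.
descriptions = {
    "Colosseum": [0.9, 0.2, 0.5],
    "Roman Forum": [0.9, 0.3, 0.4],
    "National Gallery of Modern and Contemporary Art": [0.2, 0.1, 0.9],
}

# Hypothetical profile of a tourist mostly interested in History.
profile = [0.9, 0.1, 0.3]

ranking = rank_pois(profile, descriptions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up values, the two History-heavy POIs are ranked above the art gallery, which mirrors the Rome example in the text.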

A critical aspect in the design of an effective content-based travel recommender system is to create a rich enough Knowledge Base with a suitable description of a large set of POIs. The description of a POI in the Knowledge Base should encode how relevant this POI is for the set of TOIs presented to the user. Obtaining such a description for each POI often relies on manual classification, which is expensive (see, e.g., Batet, Moreno, Sánchez, Isern, and Valls (2012); Gavalas et al. (2015); Gavalas and Kenteris (2011); Lim et al. (2015); Lucas et al. (2013); Martínez Santiago et al. (2012); Meehan, Lunney, Curran, and McCaughey (2013); Savir, Brafman, and Shani (2013); Umanets, Ferreira, and Leite (2014); Vansteenwegen et al. (2011)). Also, if the list of TOIs changes, a different description for the POIs may be needed. Thus, maintaining these systems is expensive, and it may become particularly critical in those contexts where territorial policy makers want to offer their tourists TOIs that vary based on available seasonal attractions and events. For example, suppose that in a certain period there is a special event in a region, such as a music festival; a regional policy maker can decide to introduce the new TOI Music in the system so that tourists who visit the region because of the festival will find POIs that are particularly interesting for them.

This paper focuses on the design of the Content Analyzer for a travel recommender system. Motivated in part by a research grant of the Umbria Region of Italy (see also Binucci, Didimo, Liotta, Montecchiani, & Sartore (2013)), we study the following problem: Given a geographic region R and a set of TOIs (defined by a policy maker who wants to promote the touristic attractions of a territory), compute the relevance of the POIs in R with respect to the given TOIs. We propose a graph-based algorithmic method specifically tailored for the context of a travel recommender system and we embed this method in a content analyzer called Cicero. The main features of Cicero are as follows.

  • Cicero receives as input a region R and a set of TOIs $T=\{t_1,t_2,\dots,t_n\}$; it retrieves a set of POIs $P=\{p_1,p_2,\dots,p_m\}$ in region R and it computes as output a description for each POI. The description of a POI $p_i \in P$ is an array whose $j$-th element is a numeric value in the range [0, 1] that represents the relevance of $p_i$ with respect to TOI $t_j$.

  • The computation of the descriptions of the POIs is unsupervised and automatic. Since it is unsupervised, it does not require a training set of reliable data, which can be difficult to obtain. The fact that it is automatic allows us to handle large geographic regions with thousands of POIs, for which a manual assignment would be impractical or too expensive.

  • Cicero relies on publicly accessible Web resources and does not use an ontology. Ontologies are often used to represent (and reason about) the tourism domain knowledge (see, e.g., Castillo et al. (2008); Lee et al. (2009); Moreno, Valls, Isern, Marin, and Borrás (2013); Ruotsalo et al. (2013); Wang, Zeng, and Tang (2011)). However, ontologies for travel recommender systems are typically designed ad-hoc and built manually, which can be a time-consuming and labor-intensive task. Instead, the proposed technique computes a description of each POI by executing a shortest path algorithm on a suitable concept network that is extracted by automatically crawling Wikipedia and OpenStreetMap.
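To make the shortest-path idea concrete, the sketch below computes a description array for each POI on a toy concept network. It is not the method of Cicero: this excerpt does not say how path lengths are turned into relevance values, so the mapping 1/(1+d) for a hop distance d, the score 0 for unreachable TOIs, the unweighted symmetric links, and the graph itself are all assumptions chosen for illustration.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def describe_pois(graph, pois, tois):
    """Map each POI to an array with one relevance value in [0, 1] per TOI.

    Assumed scoring (not from the paper): 1 / (1 + d) for hop distance d,
    and 0.0 when the POI cannot be reached from the TOI.
    """
    per_toi = [hop_distances(graph, toi) for toi in tois]
    return {
        poi: [1.0 / (1 + dist[poi]) if poi in dist else 0.0 for dist in per_toi]
        for poi in pois
    }


# Hypothetical concept network, stored as a symmetric adjacency list.
graph = {
    "History": ["Ancient Rome"],
    "Ancient Rome": ["History", "Colosseum"],
    "Colosseum": ["Ancient Rome", "Amphitheatre"],
    "Amphitheatre": ["Colosseum", "Architecture"],
    "Architecture": ["Amphitheatre", "Art"],
    "Art": ["Architecture"],
    "Religion": [],
}

descriptions = describe_pois(graph, ["Colosseum"], ["History", "Religion", "Art"])
print(descriptions)
```

In this toy network the Colosseum is two hops from History, three hops from Art, and unreachable from Religion, so under the assumed scoring its description favors History. A weighted or directed network would call for Dijkstra's algorithm or a directed search instead of plain breadth-first search.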

The rest of this paper is organized as follows. Section 2 critically discusses the main differences and similarities between the approach of Cicero and the existing literature in the areas of travel recommender systems and semantic technologies. In Section 3 we present a reference architecture upon which we designed and implemented our system. In Section 4 we discuss the main principles and methods behind Cicero. We describe how public encyclopedic sources can be used to construct a network of concepts that is then exploited to compute the descriptions of the POIs with respect to the given TOIs. To assess the effectiveness of the proposed techniques, in Section 5 we:

  1. Use Cicero to create a concept network for two Italian touristic cities, namely Rome and Perugia; the network is computed with respect to three popular TOIs that are particularly relevant for the two chosen cities, namely Art, History, and Religion.

  2. Evaluate some structural properties of the created network that affect the effectiveness and the efficiency of our approach.

  3. Use the network to create a Knowledge Base and compare the results against a ground truth defined by professional touristic guides and experts of the two cities. Although the two chosen cities differ in size and in number of available POIs, the experimental analysis suggests that the algorithmic technique behind Cicero can be effective in both scenarios.

  4. Realize a proof-of-concept implementation of a content-based recommender system that uses the computed Knowledge Base.

Finally, conclusions and future research directions are discussed in Section 6.

Section snippets

Related Work

The research in this paper relates to both the literature on (travel) recommender systems and the literature on semantic relatedness analysis. In this section we briefly recall some of the most relevant references in these research areas and discuss in more detail those contributions that adopt an approach similar to ours.

Several travel recommender systems have been described in the literature; as already pointed out in Section 1, the majority of them (see, e.g., 

A Reference Architecture

In this section we discuss the main ideas behind the design of our Content Analyzer. In order to help the reader, we report in Table 1 the main symbols and notations used, which will be defined and explained in the following. The reference architecture upon which we designed and implemented Cicero is given in Fig. 2. The architecture is composed of two main modules described in the following.

Graphic User Interface (GUI). The GUI enables the end-user to select a region and define a set of TOIs.

The Algorithmic Pipeline of Cicero

The Algorithmic Engine of Cicero implements the pipeline shown in Fig. 4. This pipeline consists of the following five steps:

  1. The POIs of the geographic region of interest are retrieved from OpenStreetMap together with the tags associated with them. OpenStreetMap is a collaborative project to create a free editable map of the world that allows users to tag geographical elements including touristic attractions.

  2. A concept network is constructed by crawling the Wikipedia pages (starting from the

Experiments and Implementation

In this section we present experiments and implementations of the techniques of Section 4. We begin by reporting the results of an experiment on the crawling procedure that constructs the concept network. This experiment aims at estimating the number of hops that a crawler needs to perform in order to reach all TOIs (Section 5.1), which is related to both the effectiveness and the efficiency of the step that constructs the concept network. Second, we perform an experimental evaluation to have

Conclusions and Future Work

We studied the problem of designing the Content Analyzer of a content-based travel recommender system. We described and implemented a technique to automatically compute a description of the POIs of one or more geographic regions using publicly available sources of information (OpenStreetMap and Wikipedia). We performed an experimental evaluation against a human-created ground truth. Even though these experiments do not constitute the ultimate aim of the present work, they however provide a good

References (37)

  • A.S. Niaraki et al. Ontology based personalized route planning system using a multi-criteria decision making approach. Expert Systems with Applications (2009)

  • S.P. Ponzetto et al. Knowledge derived from Wikipedia for computing semantic relatedness. J. Artif. Intell. Res. (JAIR) (2007)

  • R. Rada et al. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics (1989)

  • S. Staab et al. Intelligent systems for tourism. IEEE Intelligent Systems (2002)

  • W. Wang et al. Bayesian intelligent semantic mashup for tourism. Concurrency and Computation: Practice and Experience (2011)

  • Z. Wu et al. Verbs semantics and lexical selection. Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (1994)

  • S. Banerjee et al. Clustering short texts using Wikipedia. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’07 (2007)

  • C. Binucci et al. TRART: A system to support territorial policies
