Designing the Content Analyzer of a Travel Recommender System
Introduction
Planning a trip that takes personal preferences into account often becomes a time-consuming and difficult task, given the overwhelming amount of information available on a large variety of digital sources (such as institutional websites, travel blogs, and travel guides). A travel recommender system guides a tourist through this large space of possible options by matching touristic and leisure attractions (technically called Points of Interest, or POIs for short) with the traveler’s interests. As pointed out in an early work of Staab et al. (2002) and in a more recent survey of Borrás, Moreno, and Valls (2014), the most common travel recommender systems use a content-based approach, in which the user expresses her needs by assigning values to a set of attributes and the system matches these needs against a given set of available POIs. As depicted in Fig. 1, the architecture of a content-based travel recommender system consists of three main modules: a Content Analyzer, a Profiler, and a Matching Module. The Content Analyzer gathers information from various sources and computes a suitable description of the POIs, which is stored in a Knowledge Base. The Profiler constructs users’ profiles by collecting and analyzing data representative of their interests. The Matching Module outputs a travel recommendation, i.e., a ranked list of the POIs that best suit the users’ profiles. Examples of content-based travel recommender systems include (Brilhante, Macedo, Nardini, Perego, & Renso, 2015; Cenamor, de la Rosa, Núñez, & Borrajo, 2017; Győrödi, Győrödi, & Dersidan, 2013; Huang & Bian, 2009; Lee, Chang, & Wang, 2009; Lim, Chan, Leckie, & Karunasekera, 2015; Martínez Santiago, Ariza López, Montejo-Ráez, & Ureña López, 2012; Montejo-Ráez, Perea-Ortega, García-Cumbreras, & Martínez-Santiago, 2011; Ruotsalo, Haav, Stoyanov, Roche, Fani, Deliai, Mäkelä, Kauppinen, & Hyvönen, 2013; Vansteenwegen, Souffriau, Berghe, & Oudheusden, 2011).
We focus on the widely adopted keyword-based vector space model to represent users’ profiles and POIs’ descriptions (de Gemmis, Lops, Musto, Narducci, & Semeraro, 2015). In this model, each POI description and each user profile is represented as a vector in an n-dimensional space, where each dimension corresponds to a Topic of Interest, or TOI for short. A TOI is a keyword that describes a subject on which a traveler can find interesting attractions in the region; examples of TOIs are History, Religion, and Art. The travel recommendation is then computed by matching the two vectors with a suitable technique. For example, suppose that the profile of a tourist in Rome, defined on the TOIs History, Religion, and Art, has its highest score on History; the Matching Module will then return a list of POIs in which the Colosseum and the Roman Forum rank higher than the National Gallery of Modern and Contemporary Art.
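The matching step can be illustrated with a minimal sketch. The text above leaves the matching technique unspecified, so this example assumes cosine similarity, a common choice for vector space models; the relevance values and the profile are made up for illustration only.

```python
import math

# Vector positions follow the TOI order of the Rome example: (History, Religion, Art).

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_pois(profile, descriptions):
    """Return POI names sorted by decreasing cosine similarity to the user profile."""
    return sorted(descriptions, key=lambda poi: cosine(profile, descriptions[poi]), reverse=True)

# Made-up relevance values in [0, 1], one per TOI (History, Religion, Art).
descriptions = {
    "Colosseum": (0.9, 0.2, 0.4),
    "Roman Forum": (0.95, 0.3, 0.3),
    "National Gallery of Modern and Contemporary Art": (0.1, 0.0, 0.95),
}
profile = (0.8, 0.1, 0.2)  # a tourist mostly interested in History

for poi in rank_pois(profile, descriptions):
    print(poi)
```

With these illustrative numbers the two History-oriented sites rank above the art gallery; any other similarity measure over the same vectors could be substituted without changing the Knowledge Base.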
A critical aspect in the design of an effective content-based travel recommender system is the creation of a sufficiently rich Knowledge Base that suitably describes a large set of POIs. The description of a POI in the Knowledge Base should encode how relevant the POI is to each of the TOIs presented to the user. Obtaining such a description for each POI often relies on manual classification, which is expensive (see, e.g., Batet, Moreno, Sánchez, Isern, and Valls (2012); Gavalas et al. (2015); Gavalas and Kenteris (2011); Lim et al. (2015); Lucas et al. (2013); Martínez Santiago et al. (2012); Meehan, Lunney, Curran, and McCaughey (2013); Savir, Brafman, and Shani (2013); Umanets, Ferreira, and Leite (2014); Vansteenwegen et al. (2011)). Moreover, whenever the list of TOIs changes, the POIs may need to be described anew. Maintaining such systems is therefore expensive, and this may become particularly critical when territorial policy makers want to offer tourists TOIs that vary with seasonal attractions and events. For example, suppose that a special event, such as a music festival, takes place in a region during a certain period; a regional policy maker can decide to add the new TOI Music to the system, so that tourists who visit the region for the festival find POIs that are particularly interesting to them.
This paper focuses on the design of the Content Analyzer of a travel recommender system. Motivated in part by a research grant from the Umbria Region of Italy (see also Binucci, Didimo, Liotta, Montecchiani, & Sartore, 2013), we study the following problem: given a geographic region and a set of TOIs (defined by a policy maker who wants to promote the touristic attractions of a territory), compute the relevance of the POIs in the region with respect to the given TOIs. We propose a graph-based algorithmic method tailored to travel recommender systems and embed it in a content analyzer called Cicero. The main features of Cicero are as follows.
- Cicero receives as input a region and a set of TOIs; it retrieves a set P of POIs in the region and computes as output a description for each POI. The description of a POI pi ∈ P is an array whose j-th element is a numeric value in the range [0, 1] that represents the relevance of pi with respect to TOI tj.
- The computation of the POI descriptions is unsupervised and automatic. Being unsupervised, it does not require a training set of reliable data, which can be difficult to obtain. Being automatic, it can handle large geographic regions with thousands of POIs, for which a manual assignment would be impractical or too expensive.
- Cicero relies on publicly accessible Web resources and does not use an ontology. Ontologies are often used to represent (and reason about) knowledge of the tourism domain (see, e.g., Castillo et al. (2008); Lee et al. (2009); Moreno, Valls, Isern, Marin, and Borrás (2013); Ruotsalo et al. (2013); Wang, Zeng, and Tang (2011)). However, ontologies for travel recommender systems are typically designed ad hoc and built manually, which is time-consuming and labor-intensive. Instead, the proposed technique computes a description of each POI by executing a shortest path algorithm on a suitable concept network that is extracted by automatically crawling Wikipedia and OpenStreetMap.
The rest of this paper is organized as follows. Section 2 critically discusses the main differences and similarities between the approach of Cicero and the existing literature on travel recommender systems and on semantic technologies. Section 3 presents the reference architecture upon which we designed and implemented our system. Section 4 discusses the main principles and methods behind Cicero: we describe how public encyclopedic sources can be used to construct a network of concepts, which is then exploited to compute the descriptions of the POIs with respect to the given TOIs. To assess the effectiveness of the proposed techniques, in Section 5 we:
1. Use Cicero to create a concept network for two Italian tourist cities, namely Rome and Perugia; the network is computed with respect to three popular TOIs that are particularly relevant for the two cities, namely Art, History, and Religion.
2. Evaluate some structural properties of the created network that affect the effectiveness and the efficiency of our approach.
3. Use the network to create a Knowledge Base and compare the results against a ground truth defined by professional tourist guides and experts on the two cities. Although the two cities differ in size and in the number of available POIs, the experimental analysis suggests that the algorithmic technique behind Cicero can be effective in both scenarios.
4. Build a proof-of-concept implementation of a content-based recommender system that uses the computed Knowledge Base.
Finally, conclusions and future research directions are discussed in Section 6.
Related Work
The research in this paper relates naturally to both the literature on (travel) recommender systems and the literature on semantic relatedness analysis. In this section we briefly recall some of the most relevant references in these research areas and discuss in more detail those contributions that adopt an approach similar to ours.
Several travel recommender systems have been described in the literature; as already pointed out in Section 1, the majority of them (see, e.g.,
A Reference Architecture
In this section we discuss the main ideas behind the design of our Content Analyzer. For the reader’s convenience, Table 1 summarizes the main symbols and notation, which are defined and explained below. The reference architecture upon which we designed and implemented Cicero is given in Fig. 2. The architecture is composed of two main modules, described in the following.
Graphical User Interface (GUI). The GUI enables the end user to select a region and define a set of TOIs.
The Algorithmic Pipeline of Cicero
The Algorithmic Engine of Cicero implements the pipeline shown in Fig. 4. This pipeline consists of the following five steps:
1. The POIs of the geographic region of interest are retrieved from OpenStreetMap, together with their associated tags. OpenStreetMap is a collaborative project to create a free, editable map of the world; its users can tag geographic elements, including touristic attractions.
2. A concept network is constructed by crawling the Wikipedia pages (starting from the
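Step 1 of the pipeline can be illustrated with a minimal sketch based on the public Overpass API for OpenStreetMap. The sketch only builds an Overpass QL query for a bounding box (south, west, north, east) and parses a JSON response; it makes no network call. The tag filters in `POI_FILTERS` are a hypothetical choice of which OSM tags mark touristic POIs, and the sample response is made up.

```python
import json

# ASSUMPTION: hypothetical selection of OSM tag filters that identify touristic POIs.
POI_FILTERS = ('"tourism"="attraction"', '"tourism"="museum"', '"historic"')

def overpass_query(south, west, north, east):
    """Build an Overpass QL query for nodes and ways matching POI_FILTERS in a bounding box."""
    bbox = f"({south},{west},{north},{east})"
    parts = [f"  {kind}[{flt}]{bbox};" for flt in POI_FILTERS for kind in ("node", "way")]
    # `out center;` makes ways report a single center coordinate.
    return "[out:json];\n(\n" + "\n".join(parts) + "\n);\nout center;"

def parse_pois(overpass_json):
    """Turn an Overpass JSON response into (name, lat, lon, tags) records, skipping unnamed elements."""
    pois = []
    for el in json.loads(overpass_json).get("elements", []):
        tags = el.get("tags", {})
        if "name" not in tags:
            continue
        # Nodes carry lat/lon directly; ways queried with `out center` carry a "center" object.
        pos = el if "lat" in el else el.get("center", {})
        pois.append((tags["name"], pos.get("lat"), pos.get("lon"), tags))
    return pois

# Made-up response in the Overpass JSON layout (ids and coordinates are illustrative).
sample = json.dumps({"elements": [
    {"type": "node", "id": 1, "lat": 41.89, "lon": 12.49,
     "tags": {"name": "Colosseo", "historic": "monument"}},
    {"type": "way", "id": 2, "center": {"lat": 41.89, "lon": 12.48},
     "tags": {"name": "Foro Romano", "tourism": "attraction"}},
    {"type": "node", "id": 3, "lat": 41.9, "lon": 12.5, "tags": {"tourism": "attraction"}},
]})

print(overpass_query(41.8, 12.4, 42.0, 12.6))
print([name for name, *_ in parse_pois(sample)])
```

Keeping the full tag dictionary for each POI leaves later pipeline steps free to use whichever tags they need.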
Experiments and Implementation
In this section we present experiments and implementations of the techniques of Section 4. We begin by reporting the results of an experiment on the crawling procedure that constructs the concept network. This experiment aims at estimating the number of hops that a crawler needs to perform in order to reach all TOIs (Section 5.1), a quantity that affects both the effectiveness and the efficiency of the step that constructs the concept network. Second, we perform an experimental evaluation to have
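The hop-count measurement described above can be sketched as a level-by-level breadth-first crawl that stops as soon as every TOI page has been visited. In this minimal sketch, `get_links` and the `LINKS` table are hypothetical stand-ins for fetching a Wikipedia page's outgoing links, and the seed pages are illustrative choices rather than the crawler's actual starting points.

```python
def hops_to_reach_all(get_links, seeds, tois, max_hops):
    """Crawl breadth-first from `seeds`, one hop (link level) at a time.

    Returns the smallest number of hops after which every TOI page has been
    visited, or None if some TOI is still unreached after `max_hops` hops.
    """
    missing = set(tois) - set(seeds)
    if not missing:
        return 0
    visited = set(seeds)
    frontier = list(seeds)
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for page in frontier:
            for link in get_links(page):
                if link not in visited:
                    visited.add(link)
                    next_frontier.append(link)
        missing -= set(next_frontier)
        if not missing:
            return hop
        if not next_frontier:  # nothing new to crawl: remaining TOIs are unreachable
            return None
        frontier = next_frontier
    return None

# Hypothetical link structure standing in for Wikipedia pages and their outgoing links.
LINKS = {
    "Colosseum": ["Ancient Rome", "Amphitheatre"],
    "Pantheon, Rome": ["Roman temple", "Ancient Rome"],
    "Ancient Rome": ["History"],
    "Roman temple": ["Religion"],
    "Amphitheatre": ["Architecture"],
    "Architecture": ["Art"],
}

def get_links(page):
    """Hypothetical stand-in for fetching the outgoing links of a Wikipedia page."""
    return LINKS.get(page, [])

print(hops_to_reach_all(get_links, ["Colosseum", "Pantheon, Rome"],
                        ["History", "Religion", "Art"], max_hops=5))
```

Because the crawl expands one link level at a time, the returned value is the number of hops after which all TOIs have been reached; a larger value means a larger crawl and hence a more expensive network construction.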
Conclusions and Future Work
We studied the problem of designing the Content Analyzer of a content-based travel recommender system. We described and implemented a technique to automatically compute a description of the POIs of one or more geographic regions using publicly available sources of information (OpenStreetMap and Wikipedia). We performed an experimental evaluation against a human-created ground truth. Even though these experiments are not the ultimate aim of the present work, they provide a good
References (37)
- Turist@: Agent-based personalised recommendation of tourist activities. Expert Systems with Applications (2012).
- Context-aware points of interest suggestion with dynamic weather data management.
- SAMAP: An user-oriented adaptive system for planning tourist visits. Expert Systems with Applications (2008).
- Planning for tourism routes using social networks. Expert Systems with Applications (2017).
- Fast layout computation of clustered networks: Algorithmic advances and experimental analysis. Information Sciences (2014).
- Concept-based information retrieval using explicit semantic analysis. ACM Transactions on Information Systems (2011).
- A web-based pervasive recommendation system for mobile tourist guides. Personal and Ubiquitous Computing (2011).
- Roget’s thesaurus and semantic similarity.
- A hybrid recommendation approach for a tourism system. Expert Systems with Applications (2013).
- Context-aware intelligent recommendation system for tourism. 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) (2013).
- Ontology based personalized route planning system using a multi-criteria decision making approach. Expert Systems with Applications.
- Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research (JAIR).
- Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics.
- Intelligent systems for tourism. IEEE Intelligent Systems.
- Bayesian intelligent semantic mashup for tourism. Concurrency and Computation: Practice and Experience.
- Verbs semantics and lexical selection. Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics.
- Clustering short texts using Wikipedia. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’07).
- TRART: A system to support territorial policies.