Elsevier

Information Systems

Volume 38, Issue 4, June 2013, Pages 524-544

A data-mining approach to preference-based data ranking founded on contextual information

https://doi.org/10.1016/j.is.2012.12.002

Abstract

The term information overload was already used back in the 1970s by Alvin Toffler in his book Future Shock, and refers to the difficulty of understanding and making decisions when too much information is available. In the era of Big Data, this problem becomes much more dramatic, since users may be literally overwhelmed by the cataract of data accessible in the most varied forms. With context-aware data tailoring, given a target application, in each specific context the system allows the user to access only the view which is relevant for that application in that context. Moreover, the relative importance of information to the same user in a different context or, reciprocally, to a different user in the same context, may vary enormously; for this reason, contextual preferences can be used to further refine the views associated with contexts, by imposing a ranking on the data of each context-aware view. In this paper, we propose a methodology and a system, PREMINE (PREference MINEr), where data mining is adopted to infer contextual preferences from the past interaction of the user with contextual views over a relational database, gathering knowledge in terms of association rules between each context and the relevant data.

Highlights

► The paper proposes a methodology to mine contextual preferences on tuples and attributes of a relational database.
► Preferences are used to personalize context-aware views over a database.
► Preferences are mined by extracting association rules from log data, requiring no user intervention.
► Test data is collected by making real users interact with a prototype of our system.
► Our approach shows better recall than other methodologies in the literature.

Introduction

The current ecosystem of available digital information represents an unprecedented opportunity for users, but at the same time risks overwhelming them during decision-making [1]. The effect of this problem is amplified for users who access data by means of mobile devices, which are equipped with limited resources and connectivity and thus require that only the most valuable information be kept on board. Imagine you want to keep on your smartphone some information for on-line trading but also to support your shopping activity and your travels: some of the personal data you need for these operations resides on your device, but keeping what is necessary for all three operations on the smartphone all the time is not really sensible. Instead, eliminating the redundant information at any given time will speed up your work, both in terms of device efficiency and of the effectiveness that you can achieve by working in the absence of information noise.

However, distinguishing useful data from all the information which is irrelevant to the specific application or user is not a trivial task, since the same piece of information can be considered differently, even by the same user, in different situations or places: in a word, in a different context.

This emergent problem has been tackled in the literature by introducing context models (see [2], [3], [4] for surveys) allowing the personalization of data repositories on the basis of a set of perspectives, or dimensions, such as the user's role and location, the time, his or her interests and the situations he or she is involved in [5]. However, data personalization based on context may be only a partial solution, since the tailoring of the available dataset may still be too coarse-grained. For example, if we consider a movie dataset and Bob – a young teenager who is interested in movies – a contextual system will suggest the movies played in cinemas close to Bob's location and appropriate for people of his age, but will not be able to propose any ranking or further filtering of this contextual data according to Bob's personal tastes: for example, Bob might like watching comedies when alone and thrillers when with his friends.

Therefore, to attain more effective personalization, this work couples the notion of context with the user's personal preferences: this allows the system to rank the information delivered to Bob differently in each different context (alone or with friends).

The approaches already proposed for personalizing relational data (tuples or attributes) on the basis of contextual preferences [6], [7], [8] rely on the collaboration of the users for preference indication. However, with a large variety of data and a considerable number of possible contexts, the manual specification of an extensive list of preferences may be a trying experience which discourages the user. A way around this problem is exploiting other information, implicitly provided by the past querying activity of the user. This activity can be of various kinds, e.g. Bob might formulate queries to visualize the titles of the available comedies, then select “The Muppets” to see further details and subsequently repeat the same operation for other Disney movies in the list, and finally decide to watch one of them. A system analyzing Bob's activity may discover that he is often attracted by Disney comedies.

Given this rationale, this paper's contribution is the PREMINE (PREference MINEr) methodology and the related system, which use data mining algorithms to learn the contextual preferences of the users on both tuples and attributes of relational databases. Our interest in relational technology is motivated by the fact that most commercial databases, and also a significant part of the deep web, rely on it; handling relational preferences has therefore long been recognized as an important issue [9].

Contextual preferences are thus used to further personalize the set of data associated with each context (called contextual view) and can be applied with two goals: (1) to minimize the information noise, presenting a list of the data ordered by their relevance for the user with the effect of “recommending” the highest-ranked data, (2) to fulfill the memory requirements imposed by small devices, by loading only the data which have been ranked high according to the user preferences. Our approach starts from the contextual preference model introduced in [7] and adds a sophisticated technique to mine contextual association rules (that is, co-occurrences between each context and the browsed data) from the past interaction of the user with the contextual views over a given relational dataset.
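To make the notion of a contextual association rule concrete, the following is a minimal sketch, not the PREMINE algorithm itself. It assumes a hypothetical access log in which each entry pairs a context label with the `attribute=value` properties of one browsed tuple, and it scores each co-occurrence with the standard support and confidence measures. The function name, the log format, the thresholds and the sample data are all illustrative assumptions.

```python
from collections import Counter


def mine_context_rules(log, min_support=0.1, min_confidence=0.5):
    """Return {(context, item): (support, confidence)} mined from an access log.

    log is a list of (context, items) pairs, where items is the set of
    "attribute=value" properties of one tuple browsed in that context
    (a hypothetical log format chosen for this sketch).
    """
    total = len(log)
    context_counts = Counter()
    pair_counts = Counter()
    for context, items in log:
        context_counts[context] += 1
        for item in set(items):
            pair_counts[(context, item)] += 1

    rules = {}
    for (context, item), count in pair_counts.items():
        support = count / total                      # share of all log entries
        confidence = count / context_counts[context]  # share within the context
        if support >= min_support and confidence >= min_confidence:
            rules[(context, item)] = (support, confidence)
    return rules


# Illustrative log inspired by the Bob example; the entries are made up.
log = [
    ("alone", {"genre=comedy", "studio=Disney"}),
    ("alone", {"genre=comedy"}),
    ("alone", {"genre=thriller"}),
    ("with_friends", {"genre=thriller"}),
    ("with_friends", {"genre=thriller", "studio=Disney"}),
]
rules = mine_context_rules(log)
for (context, item), (support, confidence) in sorted(rules.items()):
    print(f"{context} -> {item}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy log, the rule linking the context `alone` to comedies survives the thresholds, while the weaker co-occurrence of `alone` with thrillers is discarded. A real miner would also consider combinations of several properties rather than single items.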

Although there are several degrees of freedom for the personalization, leading to a large set of possible approaches, in this paper we focus on the preference mining part and give a quick account of how the mined preferences are used to produce the personalized contextual view. Also, we remark that our proposal, unlike the majority of recommendation systems, does not require any explicit input from the users about their preferences.

The procedure goes as follows: a client application runs on the user's device and accesses a contextual view of the global database. This portion of data is initially selected only on the basis of the user's current context; the user's querying activity and subsequent browsing in the list of the returned tuples allow the PREMINE server-side application to gain knowledge about the correlations between a context and the properties of the data preferred in that context. Afterwards, when the device connects to the application server, this knowledge is used to further filter and personalize the contextual view.
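The last step of this procedure, filtering and ranking a contextual view with the mined knowledge, can be illustrated with a short sketch. It is not the scoring scheme of the paper: it assumes that the mined knowledge is available as a hypothetical dictionary mapping `(context, "attribute=value")` pairs to a score, and it ranks each tuple by the highest score among the rules it matches (an assumed combination policy). The function name, the data and the parameter `k`, which stands in for a device memory budget, are illustrative.

```python
def personalize_view(view, context, rules, k):
    """Rank the tuples of a contextual view and keep the k best ones.

    view:  list of dicts, one per tuple of the contextual view.
    rules: {(context, "attribute=value"): score}, a hypothetical format.
    A tuple scores the maximum over the rules it matches in the given
    context, or 0.0 when no rule applies (an assumed policy).
    """
    def score(row):
        properties = {f"{attr}={value}" for attr, value in row.items()}
        matching = [
            s for (ctx, item), s in rules.items()
            if ctx == context and item in properties
        ]
        return max(matching, default=0.0)

    # sorted() is stable, so tuples with equal scores keep their view order.
    ranked = sorted(view, key=score, reverse=True)
    return ranked[:k]


# Made-up view and scores, for illustration only.
view = [
    {"title": "Thriller A", "genre": "thriller"},
    {"title": "The Muppets", "genre": "comedy"},
    {"title": "Thriller B", "genre": "thriller"},
]
rules = {
    ("alone", "genre=comedy"): 0.67,
    ("with_friends", "genre=thriller"): 1.0,
}
alone_view = personalize_view(view, "alone", rules, k=2)
friends_view = personalize_view(view, "with_friends", rules, k=2)
print([row["title"] for row in alone_view])
print([row["title"] for row in friends_view])
```

The same contextual view yields a different top-k list in each context, which mirrors the two goals stated earlier: ordering data by relevance and loading only the highest-ranked data on a small device.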

Note that the proposed approach does not completely exclude the manual specification of preferences; in fact, the two approaches can be used in conjunction: the user can manually add preferences or adjust the mined ones when they no longer reflect his or her actual needs. Some encouraging experiments, performed with real users interacting with the dataset of a European video-on-demand company, show the practical impact of our proposal.

Running example: Fig. 1 shows the relational schema of the running example we use throughout the paper (a simplification of the mentioned case study), namely the information system of a company offering services of video on demand and reservation of movie tickets. All the applications composing the information system rely on a central database storing all the managed information. This database is also used for the experimental session at the end of the paper.

Paper structure: The structure of the paper is as follows. Section 2 presents the state of the art, Section 3 introduces some preliminary notions and Section 4 presents the mining framework. Sections 5 and 6 describe our strategies for mining preferences on tuples and attributes, respectively. Section 7 shows the effectiveness of the approach by illustrating the experiments we have performed and, finally, Section 8 draws the conclusions.

Section snippets

State of the Art

The technique of mining contextual association rules has been proposed in [10] with the purpose of analyzing frequent user accesses to available services; however, in that case the authors focus on the mining process without using the discovered association rules for data personalization.

The problem of learning user preferences has been recognized as important in many applications. For example, problems related to ordering data on the basis of preference information explicitly provided by the

Preliminaries

PREMINE is an extension of the context-based personalization framework presented in [7]: thus, before introducing the innovative aspects of our proposal, in this section we quickly describe the background notions our work relies on.

Preference mining framework

Our approach for mining contextual preferences is integrated within the wider framework for contextual-view tailoring and personalization shown in Fig. 3.

Users interact with the information system by means of different kinds of portable devices, like PDAs or smartphones, but also by other systems, such as hot-spot terminals or desktop computers, depending on the application scenario. The users' devices run the client applications accessing the context-relevant data portions, i.e., the

Mining σ-preferences

In this section, we describe how the phases depicted in Fig. 4 are implemented for σ-preferences.

Mining π-preferences

In this section we describe how the steps depicted in Fig. 4 are implemented for π-preferences.

Experiments

PREMINE is implemented in Java and integrated within the personalization methodology described in our previous work [7].

To evaluate our approach, we studied the user experience of a set of candidates with the PREMINE client prototype by collecting their activities. Specifically, we built a client in the movie domain, allowing users to browse a commercial database actually adopted by a European company of

Conclusions

This paper has proposed PREMINE, a methodology – and associated tool – exploiting data mining for the automatic extraction of contextual preferences on relational databases, in order to determine the personalized portion of data that will be provided to the end user at run time, in the current context.

The overall approach has been tested with real users, proving it an effective means for context-aware view personalization for relational databases. As future work, we plan to study how new

Acknowledgments

This research has been partially funded by the European Commission, Programme IDEAS ERC, Project 227977-SMScom and by the Italian project Industria 2015, Program no. MI01 00091 SENSORI. Sincere thanks are due to Paolo Garza, for carefully reading the paper and for his precious suggestions, especially on the experimental section. We also thank Paolo Cremonesi and Roberto Turrin for the dataset they provided.

References (42)

  • J. Hong et al.

    Context-aware systems: a literature review and classification

    Expert Systems with Applications

    (2009)
  • C. Bolchini et al.

    CARVE: context-aware automatic view definition over relational databases

    Information Systems

    (2013)
  • D. Agrawal, P. Bernstein, E. Bertino, S. Davidson, U. Dayal, M. Franklin, J. Gehrke, L. Haas, A. Halevy, J. Han, H.V....
  • M. Baldauf et al.

    A survey on context-aware systems

    International Journal of Ad Hoc and Ubiquitous Computing

    (2007)
  • C. Bolchini et al.

    A data-oriented survey of context models

    SIGMOD Record

    (2007)
  • C. Bolchini et al.

    And what can context do for data?

    Communications of the ACM

    (2009)
  • K. Stefanidis, E. Pitoura, P. Vassiliadis, Adding context to preferences, in: Proceedings of the ICDE 2007, 23rd...
  • A. Miele, E. Quintarelli, L. Tanca, A methodology for preference-based personalization of contextual data, in:...
  • P. Ciaccia, R. Torlone, Modeling the propagation of user preferences, in: Proceedings of the ER 2011, 30th...
  • G. Koutrika et al.

    Personalizing queries based on networks of composite preferences

    ACM Transactions on Database Systems

    (2010)
  • E. Baralis, L. Cagliero, T. Cerquitelli, P. Garza, M. Marchetti, Context-aware user and service profiling by means of...
  • B. Jiang, J. Pei, X. Lin, D.W. Cheung, J. Han, Mining preferences from superior and inferior examples, in: Proceedings...
  • R.C.-W. Wong, J. Pei, A.W.-C. Fu, K. Wang, Mining favorable facets, in: Proceedings of the KDD 2007, 13th International...
  • B. Mobasher et al.

    Automatic personalization based on web usage mining

    Communications of the ACM

    (2000)
  • T. Joachims, Optimizing search engines using clickthrough data, in: Proceedings of the KDD 2002, Eighth International...
  • J. Chomicki

    Preference formulas in relational queries

    ACM Transactions on Database Systems

    (2003)
  • P. Ciaccia, Processing preference queries in standard database systems, in: Proceedings of the ADVIS 2006, Fourth...
  • K. Stefanidis et al.

    A survey on representation, composition and application of preferences in database systems

    ACM Transactions on Database Systems

    (2011)
  • R. Gheorghiu, A. Labrinidis, P.K. Chrysanthis, Database preferences—a unified model, in: PersDB 2012, Sixth...
  • S. Holland, M. Ester, W. Kießling, Preference mining: a novel approach on mining user preferences for personalized...
  • S. Jung et al.

    A statistical model for user preference

    IEEE Transactions on Knowledge and Data Engineering

    (2005)