NectaRSS, an intelligent RSS feed reader

https://doi.org/10.1016/j.jnca.2007.09.001

Abstract

In this paper a novel article ranking method called NectaRSS is introduced. The system recommends incoming articles, which we will designate as newsitems, to users based on their past choices. User preferences are acquired automatically, without explicit feedback, and ranking is based on those preferences distilled into a user profile. NectaRSS uses the well-known vector space model for both user profiles and new documents, and compares them using information retrieval techniques, but introduces a novel method for creating and adapting user profiles from users' past choices. The effectiveness of the proposed method has been tested by embedding it into an intelligent aggregator (RSS feed reader) used by different and heterogeneous users. Furthermore, this paper demonstrates that the quality of the newsitem ranking yielded by NectaRSS improves as the user makes more choices, and that it is superior to other algorithms that use a different information representation method.
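The vector space model mentioned above represents both the user profile and each newsitem as term-weight vectors, and scores a newsitem by how closely its vector points in the same direction as the profile. The sketch below is a minimal illustration assuming raw term-frequency weights and cosine similarity; the helper names (tokenize, tf_vector, cosine) are hypothetical, and the paper's own weighting scheme is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_vector(text: str) -> Counter:
    """Sparse term-frequency vector (bag of words) for a piece of text."""
    return Counter(tokenize(text))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(weight * v[term] for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    profile = tf_vector("python release python library")
    for headline in ["New Python release announced", "Local football results"]:
        print(f"{cosine(profile, tf_vector(headline)):.3f}  {headline}")
```

Cosine similarity ignores vector length, so a long article is not favoured merely for containing more words.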

Introduction

A blog or weblog is a website with entries (usually called posts) made in journal style and displayed in reverse chronological order. Weblogs often provide commentaries or opinions on a particular subject, such as gadgets, politics, or local news; some of them work as more personal online diaries. A typical weblog combines text, images, and links to other weblogs, web pages, and other media related to its topic.

One of the advantages of weblogs, and possibly a factor in their success, is that every new post is automatically published in several formats. HTML (HyperText Markup Language) is the default, but most, if not all, weblog publishing systems generate other formats too. These formats strip all non-essential information (such as navigation, ads or formatting marks) from the posts, leaving just the newsitem (title and content) and related metadata (such as author and date of publication). One of these formats, based on XML (eXtensible Markup Language), is RSS. RSS feeds are read through programs called feed readers or aggregators: the user subscribes to a feed by supplying its link to the reader, which then checks the subscribed feeds for content published since the last check and, if there is any, retrieves it and presents it to the user.
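The check-for-new-content cycle described above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration assuming an RSS 2.0 feed that has already been downloaded as a string (the HTTP fetch is omitted) and assuming that an item's link identifies it; the function name new_items and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml: str, seen_links: set[str]) -> list[dict[str, str]]:
    """Return items whose <link> was not seen before, and record them as seen.

    Handles un-namespaced RSS 2.0 only; RSS 1.0 (RDF) and Atom use XML
    namespaces and would need namespace-qualified tag names.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        link = (item.findtext("link") or "").strip()
        if link and link not in seen_links:
            title = (item.findtext("title") or "").strip()
            fresh.append({"title": title, "link": link})
            seen_links.add(link)
    return fresh


SAMPLE_FEED = """<rss version="2.0"><channel><title>Example weblog</title>
<item><title>First post</title><link>http://example.org/1</link></item>
<item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    seen: set[str] = set()
    print(new_items(SAMPLE_FEED, seen))  # both items are new on the first check
    print(new_items(SAMPLE_FEED, seen))  # nothing new on the second check
```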

The blogosphere offers millions of weblogs on different topics and in different languages; moreover, RSS and similar formats, such as Atom, are increasingly popular, and most web-based publications (such as mainstream media sites, and even update notifications from sites such as arXiv) offer them. Browsing even a small percentage of these weblogs daily can be very tedious and, in practice, unfeasible. RSS feed aggregators, which gather the feeds chosen by the user into a desktop program or a website, avoid site-to-site browsing; even so, selecting what to read from a few dozen feeds usually exceeds practical limits, and users often tire of scanning items before reaching whatever they are interested in.

In this paper, we propose the NectaRSS system (Samper, 2005) for filtering information gathered from the web by scoring it according to the user's implicit preferences, that is, preferences obtained solely from the user clicking on the newsitems he/she actually reads. The system incrementally builds user profiles from the content (headline or extended content) of these choices.
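The paper's own profile-construction formula is not given in this excerpt. Purely to illustrate building a profile incrementally from clicks alone, the sketch below assumes a simple exponential-forgetting update: each time the user opens a newsitem, existing term weights are multiplied by a hypothetical decay factor and the clicked text's term counts are added. This is a stand-in technique, not the NectaRSS method, and the names update_profile, decay and min_weight are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def update_profile(profile: dict[str, float], clicked_text: str,
                   decay: float = 0.9, min_weight: float = 1e-3) -> dict[str, float]:
    """Return a new profile after one implicit-feedback click.

    Assumption (not the paper's formula): old weights fade by `decay`,
    then the term counts of the clicked headline or content are added.
    Terms whose weight falls below `min_weight` are dropped.
    """
    updated = {term: weight * decay for term, weight in profile.items()}
    for term, count in Counter(tokenize(clicked_text)).items():
        updated[term] = updated.get(term, 0.0) + count
    return {term: w for term, w in updated.items() if w >= min_weight}


if __name__ == "__main__":
    profile: dict[str, float] = {}
    for clicked in ["Python release notes", "New Python library", "Football results"]:
        profile = update_profile(profile, clicked)
    print(sorted(profile.items(), key=lambda kv: -kv[1]))
```

A decay closer to 1 makes the profile remember older choices for longer, while a smaller value lets it follow shifts in interest more quickly.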

These techniques are applied in a novel way to a content aggregator to endow it with a certain degree of “intelligence”, by ordering the retrieved information according to the user profile. Experiments show that NectaRSS substantially outperforms both presenting the information in random order and a simple binary algorithm that selects as relevant those documents containing the query terms.
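For comparison, the binary baseline mentioned above assigns each document a relevance of either 1 or 0. A minimal sketch, assuming a document is relevant only if it contains every query term (Boolean AND) and that relevant documents keep their original order; the function name binary_rank is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def binary_rank(items: list[str], query_terms: list[str]) -> list[str]:
    """Items containing all query terms first (original order kept), then the rest."""
    wanted = {term.lower() for term in query_terms}
    relevant, others = [], []
    for item in items:
        # Binary relevance: 1 if every query term occurs in the item, else 0.
        if wanted <= set(tokenize(item)):
            relevant.append(item)
        else:
            others.append(item)
    return relevant + others


if __name__ == "__main__":
    headlines = ["Football results", "Python release", "New Python library release"]
    print(binary_rank(headlines, ["python", "release"]))
```

Because all relevant documents tie at a score of 1, this baseline cannot order them by degree of interest; a graded similarity score can.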

The rest of the paper is organized as follows: in Section 2, we review the state of the art in personalized information access systems. In Section 3, we propose novel approaches to providing relevant information that satisfies each user's information need by capturing changes in the user's preferences without any explicit effort on the user's part. In Section 4, we present the experimental results for evaluating our proposed approaches. Finally, we conclude the paper with a summary and directions for future work in Section 5.


State of the art

Recommendation systems have evolved quickly within interactive web environments. Along this line, Schafer et al. (2001) establish a taxonomy of recommendation systems according to three categories of features: input and output functionalities, recommendation methods, and design-dependent aspects. Middleton et al. (2001) present Quickstep, a recommendation system for finding scientific and research papers. The user's preferences are acquired by monitoring his/her behavior when navigating the web,

NectaRSS

The system that we propose, called NectaRSS, is designed to rank newly arrived information according to an automatically built user profile. We restrict our system to information that appears periodically and whose structure is similar to a news story. Thus, the pieces of information the system retrieves will be generically referred to as newsitems, each of which will be composed of a headline, a hyperlink to its content and optionally a
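A newsitem as described above can be modelled as a small record. In this sketch the class name NewsItem and the field extended_content are hypothetical: the optional field stands in for the extended content mentioned in the Introduction, and because the definition in the original text is truncated, it should not be read as the paper's exact schema. The helper profile_text selects the text a profile would be built from, either the headline alone or the headline plus extended content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NewsItem:
    """One newsitem: a headline, a link to its content, optional extra text."""
    headline: str
    link: str
    extended_content: Optional[str] = None  # hypothetical optional field

    def profile_text(self, use_extended: bool = False) -> str:
        """Text used for profiling: headline only, or headline plus extended content."""
        if use_extended and self.extended_content:
            return f"{self.headline} {self.extended_content}"
        return self.headline


if __name__ == "__main__":
    item = NewsItem("New Python release", "http://example.org/python",
                    "The release adds faster startup.")
    print(item.profile_text())
    print(item.profile_text(use_extended=True))
```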

Experiments and results

In order to obtain reliable results and determine the validity of our proposal, several sessions were carried out with different real users. Each user was offered a list of headlines ordered by RSV (retrieval status value), from which he/she selected whichever headlines he/she found interesting. The number of headlines offered, 14 in this case, allowed the user to see all of them at once, without vertical scrolling (which would introduce a bias in the selection). Fifteen users with heterogeneous
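The experimental display can be sketched as follows: compute an RSV for every candidate headline against the user profile, sort in decreasing order, and keep the first 14 so the whole list fits on screen. This is a minimal illustration assuming the RSV is the cosine similarity between a term-weight profile and the headline's term-frequency vector; the paper's exact RSV formula is not reproduced in this excerpt, and the names rsv, rank_headlines and PAGE_SIZE are hypothetical.

```python
import math
import re
from collections import Counter

PAGE_SIZE = 14  # headlines shown at once in the experiments


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rsv(profile: dict[str, float], headline: str) -> float:
    """Assumed RSV: cosine similarity of the profile and the headline's term counts."""
    tf = Counter(tokenize(headline))
    dot = sum(profile.get(term, 0.0) * count for term, count in tf.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_h = math.sqrt(sum(c * c for c in tf.values()))
    return dot / (norm_p * norm_h) if norm_p and norm_h else 0.0


def rank_headlines(profile: dict[str, float], headlines: list[str],
                   k: int = PAGE_SIZE) -> list[tuple[float, str]]:
    """Top-k (score, headline) pairs by decreasing RSV; ties keep input order."""
    scored = [(rsv(profile, h), h) for h in headlines]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    profile = {"python": 2.0, "release": 1.0}
    candidates = ["Football results", "Python release", "Weather today"]
    for score, headline in rank_headlines(profile, candidates):
        print(f"{score:.3f}  {headline}")
```

Unlike the binary baseline, a graded score gives every headline a distinct position in the list rather than just a relevant or irrelevant label.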

Conclusions and discussion

NectaRSS has proved useful in the personalization of intelligent retrieval systems, providing them with flexibility and a degree of intelligence. Considering the experimental results obtained in Section 4, we can assert that the newsitem scoring obtained by applying the user profile computed via the NectaRSS algorithm is significantly useful: the user is shown more interesting documents or, at least, more documents related to his/her preferences. These advantages of the proposed

Acknowledgments

This paper has been partially funded by projects TIC2003-09481-C04-04 and TIN2007-68083-C02-01 awarded by the Spanish Ministry of Science and Technology; P06-TIC-02025, awarded by the regional Science and Technology Council, and by Resolution 8-7-2004 of General Management of Educative Innovation and Professorship Formation of the Science Department of the Regional Government.

References (13)

  • Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the 20th international...
  • Baeza-Yates R, et al. Modern information retrieval. 1999.
  • Hanani U, et al. Information filtering: overview of issues, research and systems. User Modeling and User-Adapted Interaction; 2001.
  • Merelo JJ, Carpio J, Tricas F, Ferreres G, Prieto B, Castillo PA. Weblog recommendation using association rules. In:...
  • Middleton S, De Roure D, Shadbolt N. Capturing knowledge of user preferences: ontologies in recommender systems. In:...
  • Mizzaro S, Tasso C. Ephemeral and persistent personalization in adaptive information access to scholarly publications...
There are more references available in the full text version of this article.
