A web personalizing technique using adaptive data structures: The case of bursts in web visits

https://doi.org/10.1016/j.jss.2010.06.026

Abstract

The explosive growth in the size and use of the World Wide Web continuously creates new challenges and needs. Personalizing Websites addresses one such need: predicting users’ preferences in order to expedite and improve browsing through a site. Recommendation and personalization algorithms aim at suggesting WebPages to users based on their current visit and on past users’ navigational patterns. The problem that we address is the case where a few WebPages become very popular for short periods of time and are accessed very frequently within a limited time span. Our aim is to deal with these bursts of visits and to suggest these highly accessed pages to future users who have common interests. To this end, we propose a new web personalization technique based on advanced data structures.

The data structures used are the splay tree (1) and binary heaps (2). We describe the architecture of the technique, analyze its time and space complexity, and prove its performance. In addition, we compare the proposed technique, both theoretically and experimentally, with another approach to verify its efficiency. Our solution achieves O(P^2) space complexity and runs in O(k log P) time, where k is the number of pages and P the number of categories of WebPages.

Introduction

The World Wide Web presents the same appearance to every user regardless of browsing history. Over the past few years, however, several personalization techniques have been proposed. These techniques personalize a user's web experience by combining personal information with global information to tailor what the user sees.

Personalization can be defined as the design, management and delivery of content based on known, observed and predictive information. Personalization techniques match an individual, his/her preferences and WebPage click stream habits with tailored content based on a user profile. In today's world of information overload, many similar technologies are used to filter and organize the data most important to users.

Correctly executed, personalization of the visitor's experience makes the visitor's time on a site, or in an application, more productive and engaging. Personalization can also be valuable for an organization, a portal or an e-store, because it drives desired business results such as increasing visitor response or promoting customer retention.

In this work we address the case of bursts of visits in the personalization of Websites. Many aspects of everyday life are described by events (Zhang and Shasha, 2006). An unexpectedly large number of events occurring within a certain time period is called a burst, and it suggests unusual actions or processes. Bursts occur in many everyday situations, from economics to natural phenomena, such as stock trading and falling stars. Depending on the importance of the phenomenon or process observed, detecting bursts efficiently can be critical. By definition, a burst depends on the temporal region we focus on, that is, the window size.
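The dependence of a burst on the window size can be illustrated with a small sliding-window counter. This is only a minimal sketch under stated assumptions: the class name `BurstDetector`, the `window` length and the fixed `threshold` are hypothetical choices made for illustration, not the detection method of the cited work.

```python
from collections import deque


class BurstDetector:
    """Flags a burst when more than `threshold` events fall inside a
    sliding window of `window` time units (both values are assumptions)."""

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.timestamps = deque()

    def record(self, t):
        """Record an event at time t (times must be non-decreasing) and
        report whether the window ending at t currently holds a burst."""
        self.timestamps.append(t)
        # Drop events that fell out of the half-open window (t - window, t].
        while self.timestamps and self.timestamps[0] <= t - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


# Five visits in quick succession, then two visits much later.
detector = BurstDetector(window=10, threshold=3)
flags = [detector.record(t) for t in [0, 1, 2, 3, 4, 30, 31]]
print(flags)
```

With the same event stream, a smaller `window` or a larger `threshold` would flag fewer events, which is the sense in which a burst is defined relative to the window size.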

Bursts occur in a Website's traffic as well and affect its functionality in many respects, as the following example shows. As more and more commercial enterprises go online, it is vital to make their Websites attractive to customers. One way to attract Website traffic is online advertising on search engines. In this case, besides the search results, an ad is placed in the search engine's WebPage. If the visitor clicks the ad, the advertiser has to pay a fee to the search engine. A problem that has arisen with pay-per-click is click fraud. Someone can use an automated script or program to simulate multiple clicks by a browser on an ad. Of course, the number of clicks has to be large enough in order to gain a considerable amount of money. Therefore, a burst of clicks may be an indication of click fraud.

In this paper, we deal with the case of a burst of visits to a WebPage and how someone can gain knowledge from this fact and aid the personalization of the web. A pattern of visits or accesses is ‘bursty’ when the visits occur with high intensity over a limited period of time. In particular, in bursty cases, a few WebPages become very popular for short periods of time and are accessed very frequently within a limited time span. Such patterns have also been observed in various Internet applications in a number of studies (Zhou et al., 2004). In bursty web search patterns, the user attempts to find specific results that belong to limited categories of interest within a short time period. As a consequence, an efficient retrieval and storage mechanism is needed to keep the users’ personalized categories and frequent results.

Section snippets

Description of a burst scenario

We have a set of categories of WebPages, and users make a number of random visits to all WebPages. We define a set of WebPages to be preferred by the user when these WebPages are the ones most highly visited by him/her, based on the visits recorded for a certain time interval. In particular, we count, for each WebPage, how many accesses have taken place since the last time it was visited. If this number is sufficient to denote this WebPage as preferred and the time interval during which the

Details on personalization theory and related work

Before proceeding with the related work, let us briefly present some details of personalization theory for the reader's convenience. In general, WebPages are personalized based on the interests of an individual. Personalization implies that the changes are based on implicit data, such as items purchased or pages viewed. There are two categories of personalization:

  1. Rule-based
  2. Content-based

Web personalization models include rules-based filtering, based on “if this, then

Introduction to adaptive data structures for personalization

We propose a personalization technique that uses advanced adaptive data structures to record every user's visit to a certain WebPage or category of WebPages and, based on the information gathered from the visits to a Website, suggests to the user other possibly interesting or useful links. Our goal is, every time a user clicks on a page that belongs to a certain category A, to which we have observed a burst of visits by this user, to suggest a number of most popular categories,
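The goal of suggesting the most popular categories for a given category A can be sketched with Python's binary-heap module `heapq`. This is a hedged illustration only: the class `CategorySuggester`, its `co_visits` table, the `record` and `suggest` methods, and the category names are all assumptions introduced here, and the sketch does not reproduce the architecture described in the paper.

```python
import heapq
from collections import defaultdict


class CategorySuggester:
    """Hypothetical helper: counts which categories are visited by
    visitors of a source category and returns the k most popular."""

    def __init__(self):
        # co_visits[a][b] = recorded visits to category b by visitors of a.
        # With P categories this table holds at most P * P counters.
        self.co_visits = defaultdict(lambda: defaultdict(int))

    def record(self, source, visited):
        """Register that a visitor of `source` also visited `visited`."""
        self.co_visits[source][visited] += 1

    def suggest(self, source, k):
        """Return up to k categories with the highest counts for `source`."""
        counts = self.co_visits.get(source, {})
        # Ties on the count are broken by category name (an arbitrary choice).
        best = heapq.nlargest(
            k, counts.items(), key=lambda item: (item[1], item[0])
        )
        return [category for category, _ in best]


suggester = CategorySuggester()
for visited in ["news", "sports", "news", "travel", "news", "sports"]:
    suggester.record("finance", visited)

top = suggester.suggest("finance", 2)
print(top)
```

Selecting the top k entries with a heap avoids sorting all P counters for a category when only a few suggestions are needed.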

The proposed solution: personalization for burst visits scenarios

We describe the problem as follows: we have a set of P categories of WebPages and N users. Each WebPage belongs to a certain category and each user's profile is kept in a splay tree. We define a profile as the “logfile” of the WebPages that the user has visited so far. In the splay tree, we store the categories of the WebPages. According to the splay tree's properties, the item that was last visited is brought to the root of the tree. In our case, we modify the tree, so that the most frequently
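The splay tree property mentioned above, that the last visited item is brought to the root, can be sketched as follows. This is a minimal sketch of a standard recursive splay tree keyed by category name; the names `Node`, `UserProfile`, `visit` and the per-node `visits` counter are illustrative assumptions, and the modification of the tree described in the text is not reproduced here.

```python
class Node:
    def __init__(self, key):
        self.key = key        # category name
        self.visits = 0       # assumed per-category visit counter
        self.left = None
        self.right = None


def rotate_right(h):
    x = h.left
    h.left = x.right
    x.right = h
    return x


def rotate_left(h):
    x = h.right
    h.right = x.left
    x.left = h
    return x


def splay(h, key):
    """Bring the node holding `key`, or the last node on its search
    path, to the root of the subtree rooted at h."""
    if h is None:
        return None
    if key < h.key:
        if h.left is None:
            return h
        if key < h.left.key:                      # zig-zig
            h.left.left = splay(h.left.left, key)
            h = rotate_right(h)
        elif key > h.left.key:                    # zig-zag
            h.left.right = splay(h.left.right, key)
            if h.left.right is not None:
                h.left = rotate_left(h.left)
        return h if h.left is None else rotate_right(h)
    if key > h.key:
        if h.right is None:
            return h
        if key > h.right.key:                     # zig-zig
            h.right.right = splay(h.right.right, key)
            h = rotate_left(h)
        elif key < h.right.key:                   # zig-zag
            h.right.left = splay(h.right.left, key)
            if h.right.left is not None:
                h.right = rotate_right(h.right)
        return h if h.right is None else rotate_left(h)
    return h


class UserProfile:
    """Hypothetical per-user profile: one splay tree of categories."""

    def __init__(self):
        self.root = None

    def visit(self, category):
        """Record a visit; the visited category ends up at the root."""
        self.root = splay(self.root, category)
        if self.root is None or self.root.key != category:
            node = Node(category)
            if self.root is not None:
                # Split the splayed tree around the new key.
                if category < self.root.key:
                    node.left, node.right = self.root.left, self.root
                    self.root.left = None
                else:
                    node.right, node.left = self.root.right, self.root
                    self.root.right = None
            self.root = node
        self.root.visits += 1


def keys_in_order(node):
    """In-order traversal, used only to check the search-tree order."""
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)


profile = UserProfile()
for category in ["news", "sports", "travel", "news"]:
    profile.visit(category)
print(profile.root.key, profile.root.visits)
```

After the four visits, the most recently visited category sits at the root, so a repeated visit to it during a burst is found without descending the tree.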

Solution with arrays

In order to establish the efficiency of our algorithm, we first describe a naïve approach to our problem, which is based on the use of arrays. We choose this approach because it would be the obvious and simplest solution if we had to work on a plain RAM model or use the traffic data stored in a database. Our solution is shown to be better in terms of both time and space. We consider the case of W pages, P categories and N users. For each user's profile we need a

Experiments

To test our algorithm, we implemented both the array-based solution and our solution and compared them in terms of time and efficiency. Both were implemented in the object-oriented programming language Java. The experiments were performed on a Pentium(R) IV 3.00 GHz system with 1 GB of RAM running the Microsoft Windows XP SP3 operating system. The platform used for the implementations is NetBeans IDE 6.8. In addition we had to choose a

Conclusions and future work

Recommendation and personalization algorithms aim at suggesting WebPages to users based on their current visit and past users’ navigational patterns. In this paper, we propose a web personalization technique, based on advanced data structures. The main concept of this work is to deal with the case of a burst of visits to a WebPage by designing an algorithm that suggests to the visitors of a certain category of WebPages A, other categories of WebPages that previous visitors of A prefer to visit

References (25)

  • F. Ergün et al.

    A Dynamic Lookup Scheme for Bursty Access Patterns

    (2001)
  • J. Fong et al.

    Evangelos Sakkopoulos is an adjunct assistant professor at the Computer Engineering and Informatics Department, University of Patras. He received his PhD at the Computer Engineering and Informatics Department, University of Patras, Greece. He has also received the MSc degree with honors and the diploma of Computer Engineering and Informatics at the same institution. His research interests include Web Services, Mobile Web, Web Engineering, Web Usage Mining, Web Searching, Large Data set Handling, Data Mining, Web based Education and Intranets. He has more than 60 publications in international journals and conferences in these areas.

    Dimitris Antoniou is a computer engineer and researcher in the Department of Computer Engineering and Informatics at the University of Patras. He obtained his diploma from the Department in 2004 and his MSc in 2006. Since 2006, Dimitris Antoniou has been a PhD student at the Computer Engineering and Informatics Department of the University of Patras. His research interests focus on Data Structures, Information Retrieval, String algorithmics, Bioinformatics, Software Quality Assessment, Web Technologies and GIS. He has scientific work published in international journals and conferences.

    Adamopoulou Poulia has been a member of the Graphics Multimedia and GIS Laboratory of the University of Patras since 2003. She graduated from the Computer Engineering and Informatics Department and holds an MSc in “Computer Science and Engineering”. She is currently a PhD candidate in the same department. She has participated as a software engineer in a number of National and European projects. Her academic interests lie in the area of web engineering and web services.

    Nikos Tsirakis is a computer engineer and researcher in the Department of Computer Engineering and Informatics at the University of Patras. He obtained his BE from the Department in 2004 and his MSc in 2006. Since 2006, Nikos has been a PhD student at the University of Patras, Computer Engineering and Informatics Department. His research interests focus on String algorithmics and data structures, Hypertext modeling and searching, Software Quality Assessment, Web Technologies and GIS. He has scientific work published in international journals and conferences, and he has also co-authored books and encyclopedia chapters.

    Athanasios K. Tsakalidis is a computer scientist and professor at the University of Patras. He was born on 27.6.1950 in Katerini, Greece. He obtained a Diploma of Mathematics from the University of Thessaloniki in 1973, and a Diploma of Informatics in 1980 and a Ph.D. in Informatics in 1983 from the University of Saarland, Germany. Career: 1983–1989, researcher in the University of Saarland. He was a student and collaborator (12 years) of Prof. Kurt Mehlhorn (director of Max-Planck Institute of Informatics in Germany). 1989–1993 associate professor and since 1993 professor in the Department of Computer Engineering and Informatics of the University of Patras. 1993–1997 and 2001–today, chairman of the same Department. 1993–today, member of the Board of Directors of the Research Academic Computer Technology Institute (RACTI), 1997–today, coordinator of Research and Development of RACTI, 2004–today vice-director of RACTI. He is one of the contributors to the writing of the “Handbook of Theoretical Computer Science” (Elsevier and MIT-Press 1990). He has published many scientific articles, with a particular contribution to the solution of elementary problems in the area of data structures. Scientific interests: Data Structures, Computational Geometry, Information Retrieval, Computer Graphics, Data Bases, and Bio-Informatics.
