1 Introduction

Personalisation is becoming ever more important in the delivery of timely, contextually aware information, with websites capable of recomposing and adapting content on the fly for the user. However, there is growing concern among both users and legislators over privacy on the web. This concern is particularly evident in the personalisation domain, which inherently requires the collection of users’ personal information [3]. The EU recently adopted the General Data Protection Regulation, which aims to give individuals control over their personal data; a similar act, the California Consumer Privacy Act (CCPA), will go into effect in 2020 [4]; and numerous other regulations are under consideration in the US, China, Brazil and many other countries [9]. Regardless, companies continue to employ personalisation techniques, as these have consistently been shown to increase user engagement and economic returns [14]. For users, personalising webpages to their individual needs makes browsing more convenient, efficient and relevant. Thus there is a conflict between privacy and personalisation: the more information there is about a user, the better the system can adapt to the user’s needs, but the less the privacy of the user’s personal data is protected. This situation has been referred to as the personalisation-privacy paradox [10]. While privacy has always been an issue in personalised websites, only recently have we seen a notable change in consumer behaviour. The personal information harvested, stored and shared by content providers [14] has become more apparent through regulations, and users are seeing frequent consumer data breaches. As users realise how exposed they are to personal information leakage, they are increasingly adjusting privacy controls, thus negatively impacting the effectiveness of personalisation services. As personalised content is essential to the success of online providers, securing customer privacy, and therefore trust, is necessary for the future of personalisation.

Privacy-conscious frameworks such as Client-Side Personalisation (CSP) attempt to shift the data storage approach, storing user data with trusted third parties or on the client’s own device [5, 18]. This keeps the user data and user model under the control of the client, allowing users to enjoy personalised content without compromising the privacy of their personal data. Each approach reduces the leakage of the user’s personal information, gaining some privacy on their behalf. Pure client-side solutions not only store user data on the client but also perform the personalisation of webpages on the client. This creates intellectual property issues, as content providers are unwilling to deliver personalisation or user modelling algorithms to the client device. Distributed architectures propose tackling this by using trusted external services to perform these operations, while maintaining control and storage of the data on the client. However, both solutions have significant problems with scalability and performance. As client devices are resource-limited, they struggle to handle the client-side processing required for coordinating personalisation or service interaction. With an ever-increasing demand for rich multimedia, particularly on more lightweight mobile devices, improving performance is critical to providing a seamless user experience.

This study explores a responsive, lightweight, privacy-conscious personalisation approach, termed Intelligent Client-Centric Personalisation (ICCP). It aims to enhance the performance of current distributed approaches through predictive precaching. The framework is microservice-based, employing trusted third-party services for personalisation and user modelling while maintaining data storage on the client only. Maintaining user privacy in an ICCP framework requires a series of principles to be followed: user data must be stored on the client side, it may only be shared explicitly with trusted service providers, and no external service may store any user data. This enables privacy while also allowing for service scalability and protection of content providers’ intellectual property. To combat the increased network activity and overhead on the client in interacting with these services, precaching techniques are used to preload webpages on the client device before they are requested.

This paper presents the design of an ICCP framework, along with an initial evaluation examining its performance compared to a traditional server-side approach. The evaluation establishes that performance can be improved through predictive prefetching.

2 Related Work

2.1 Privacy Focused Frameworks

A range of techniques have been researched with the aim of providing personalisation without unduly compromising users’ privacy. These can be grouped into three broad categories: architectural, algorithmic and user-centric [12]. Architectural approaches look at software architectures, platforms, and standards designed to minimise personal data leakage [7, 13]. Architectural solutions to user privacy vary in three broad aspects: where the user data is stored, where the data analysis is performed and where the personalisation occurs. Client-Side Personalisation (CSP) originally proposed that all of these processes occur on the user’s own device, i.e. the client [13]. This results in very little, if any, personal data being stored on the content server. However, this approach puts the processing burden on the client which, particularly on mobile phones, may significantly impact performance [11]. It also raises proprietary concerns, as the personalisation logic must be delivered to the client device in order to perform the personalisation there. This code often includes confidential algorithms and is at risk of exposure through reverse engineering. As these concerns have grown, a second branch of CSP has emerged in which trusted third-party software is used to create a distributed approach [18]. The user’s data must still be stored on the client device; however, the data analysis and personalisation may occur remotely. This distributed approach requires user information to be transmitted to a remote service; as such, additional measures must be in place to maintain privacy. Employing an associated security model has been explored with the PersonisJ architecture [11]. This proposes an access mechanism which mediates interactions and transmits only the necessary data; for example, allowing a playlist application access to a person’s favourite genre but not the full catalogue of favourite songs. Another branch of research looks at the use of trusted software [12]. This is software that can make guarantees about data storage policies, linkability, and disclosure. It is proposed that these systems would undergo technical audits and obtain certification by a trusted third party in order to be incorporated into such an architecture [12]. Distributed Client-Side Personalisation techniques retain the privacy benefits of the traditional pure client-side approach while, by using trusted third-party microservices, tackling the issues relating to scalability and intellectual property. Given the increased network activity and overhead on the client, these distributed approaches, while inherently more efficient than pure CSP frameworks, still suffer performance issues when compared with the traditional server-side model. The research into distributed CSP to date has focused on the development and regulation of these trusted microservices [5, 11, 18]. Few frameworks have been proposed which tackle the orchestration of these services, and a full framework has not been evaluated.

2.2 Prefetching

Web caching and prefetching play an important role in improving web performance. Resources that are likely to be visited in the near future are kept closer to the client. This ranges from storage on the server, through storage in a proxy, to storage on the user’s own device, i.e. client-side caching [6]. Traditionally, caching strategies simply chose frequently used or recently used resources to cache. However, even with a cache of infinite size, it has been shown that the hit ratio, i.e. the proportion of requested resources that are found in the cache, lies between 40–50% regardless of the caching scheme employed [15, 16]. This is because users frequently request webpages they have not yet visited. To address this and improve the hit ratio, content providers attempt to predict in advance what a user might be interested in visiting, i.e. web prefetching. Many studies have shown that the combination of caching and prefetching doubles the performance compared to caching alone [2]. According to [1], a combination of web caching and prefetching can potentially improve latency by up to 60%, whereas web caching alone improves latency by up to 26%. However, if a prefetching scheme is deployed and the user ends up requesting very few of the prefetched resources, the scheme can actually slow down performance. Thus, a prefetching approach must be carefully designed to ensure a net benefit. In the literature, prefetching strategies are generally separated into two types: content-based and history-based. Content-based prefetching analyses the layout and content of a webpage to predict the links the user is likely to click [19], whereas history-based prefetching observes the user’s previous access behaviour. Content-based prefetching is not well suited to a server-side implementation, as the overhead of parsing every single page served is too great [8]. In recent years, data mining techniques have been shown to be the most effective for prefetching [17]. In this research, clickstream data and other fine-grained navigational patterns are analysed to predict the future behaviour of the user.
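The hit-ratio ceiling described above can be illustrated with a small sketch. The request trace and the choice of a least-recently-used (LRU) policy are assumptions made purely for illustration and are not taken from the cited studies; the point is only that first visits always miss, however large the cache.

```python
from collections import OrderedDict

def lru_hit_ratio(requests, capacity):
    """Return the fraction of requests served from an LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)         # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0

# Hypothetical browsing trace: four distinct pages, two of them revisited.
trace = ["/home", "/news", "/home", "/sport", "/news", "/weather"]

# The cache is larger than the trace, so every miss is a first visit.
print(lru_hit_ratio(trace, capacity=10))  # 2 hits out of 6 requests
```

Prefetching targets exactly those first-visit misses, which a cache alone cannot avoid.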

Fig. 1. ICCP architecture

Fig. 2. Comparative architecture

3 ICCP Framework

Figure 1 shows the architecture of an ICCP framework, involving a client coordinator along with external personalisation and prediction services. The client coordinator is embedded in the user’s browser and is responsible for gathering, storing and managing both user information and web caches, as well as orchestrating microservice interactions. The logic for prefetching is contained within the client coordinator, which decides which resources to fetch and when. Once prefetched, the contents of a page are stored in the client-side cache.

Prediction Service: For this research, user modelling is achieved through a prediction service. This predicts the behaviour of a user, more specifically their propensity to click on certain links on a webpage. The output from this Propensity Prediction Service (PPS) is used to inform both the cache prefetching strategy and the page’s personalisation. The client coordinator passes the user model to the PPS for prediction; the PPS then returns an updated model without storing any user information.
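As an illustration of how propensity scores might drive the prefetching decision in the client coordinator, the following is a minimal sketch. The paper does not specify the selection rule; the threshold, the cap on the number of prefetched pages, and all names here are assumptions.

```python
def select_prefetch_targets(propensities, cached, threshold=0.5, max_prefetch=3):
    """Pick links to prefetch: highest click propensity first, skipping cached pages.

    `threshold` and `max_prefetch` are illustrative parameters, not values from the paper.
    """
    candidates = [
        (url, score)
        for url, score in propensities.items()
        if score >= threshold and url not in cached
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in candidates[:max_prefetch]]

# Hypothetical PPS output for the links on the current page.
propensities = {"/checkout": 0.82, "/reviews": 0.61, "/help": 0.12, "/cart": 0.74}

# Hypothetical client-side cache: "/cart" has already been prefetched.
client_cache = {"/cart": "<html>...</html>"}

print(select_prefetch_targets(propensities, client_cache))
```

A threshold of this kind is one way to limit wasted prefetches when the predicted propensity is low, which matters because unused prefetches can degrade performance.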

Personalisation Service: The personalisation microservice uses propensity as an influencing factor in its personalisation decisions. Thus, the context of the real-time page interactions must influence the result of the personalisation service; for example, the user’s interaction behaviour, such as mouse movement, must be one of the factors determining how the next page will be personalised. Content layout is therefore personalised, which involves rearranging the layout of the page to meet the user’s preferences. For example, if a user always scrolls past the wall of text in an article to get to the video at the end, the video might be moved to the top on future pages.

In order to implement a privacy-conscious distributed service, no user data or profiles can be stored by the services. Therefore, the following requirements must be applied to the server-based prediction and personalisation services: user data must be processed as a stream; user profiles must be updated incrementally; and user data and profiles must be discarded immediately after use.
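The three requirements above can be sketched as a stateless request handler. This is a minimal illustration under stated assumptions, not the services used in the study: the user model is reduced to hypothetical per-link click counts, and propensity is estimated as a simple click frequency.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class UserModel:
    """Hypothetical user model: per-link click counts, held only by the client."""
    clicks: dict = field(default_factory=dict)
    events_seen: int = 0

def update_model(model, clicked_link):
    """Fold one event into the model incrementally and return a new model."""
    clicks = dict(model.clicks)
    clicks[clicked_link] = clicks.get(clicked_link, 0) + 1
    return UserModel(clicks=clicks, events_seen=model.events_seen + 1)

def propensities(model):
    """Estimate click propensity per link as its share of all observed clicks."""
    if model.events_seen == 0:
        return {}
    return {link: count / model.events_seen for link, count in model.clicks.items()}

def handle_request(model, events):
    """Service entry point: consume the event stream one event at a time.

    The model arrives with the request and leaves with the response; nothing is
    written to disk or kept in module-level state, so it is discarded after use.
    """
    for clicked_link in events:
        model = update_model(model, clicked_link)
    return model, propensities(model)

# The client sends its stored model plus new events and stores the returned model.
updated, scores = handle_request(UserModel(), ["/a", "/b", "/a"])
print(updated.events_seen, scores)
```

Because the handler is a pure function of its inputs, any instance of the service can answer any request, which is also what allows such a service to scale horizontally.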

Privacy is maintained by ensuring that no user data persists on the server side. Instead, the client coordinator passes the relevant parts of the user model to the services, which return the updated model for storage on the client device. As the control of user data lies with the client, access controls could also be used to restrict the information available to the distributed services.
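One way such access controls could work is for the client coordinator to send each service only a scoped view of the user model, in the spirit of the playlist example discussed in Sect. 2.1. The sketch below is an assumption for illustration; the field names, service names and scope table are hypothetical.

```python
# Hypothetical per-service scopes, configured on the client.
SERVICE_SCOPES = {
    "playlist_app": {"favourite_genre"},
    "prediction_service": {"click_counts", "scroll_depth"},
}

def scoped_view(user_model, service):
    """Return only the fields of the user model that the given service may see."""
    allowed = SERVICE_SCOPES.get(service, set())  # unknown services receive nothing
    return {key: value for key, value in user_model.items() if key in allowed}

# Hypothetical user model stored on the client device.
user_model = {
    "favourite_genre": "jazz",
    "favourite_songs": ["So What", "Take Five"],
    "click_counts": {"/home": 4},
    "scroll_depth": 0.8,
}

print(scoped_view(user_model, "playlist_app"))
```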

The content server provides both a template and context for each webpage; the template outlines the barebones structure of the page with placeholders for personalised content and the context then provides the array of options available to fill out those placeholders. The context objects used are then selected through the subsequent external personalisation step.
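The template and context mechanism can be sketched as follows. The placeholder syntax, slot names and option keys are assumptions chosen for illustration (here using Python's standard `string.Template`); the paper does not specify the format used by the content server.

```python
from string import Template

# Hypothetical template from the content server: placeholders mark personalisable slots.
page_template = Template("<article>$lead$rest</article>")

# Hypothetical context: the options available to fill each placeholder.
context = {
    "lead": {"text": "<p>Article text</p>", "video": "<video src='clip.mp4'></video>"},
    "rest": {"text": "<p>Article text</p>", "video": "<video src='clip.mp4'></video>"},
}

def render_page(template, context, choices):
    """Fill each placeholder with the option selected by the personalisation step."""
    filled = {slot: options[choices[slot]] for slot, options in context.items()}
    return template.substitute(filled)

# Choices as they might be returned for a user who habitually scrolls to the video.
html = render_page(page_template, context, {"lead": "video", "rest": "text"})
print(html)
```

Keeping the options in the context and the selection in a separate step means the content server never needs to see the user model, only to publish what can be chosen.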

4 Evaluation

The evaluation aims to investigate the performance of the ICCP, comparing its system latency to that of a traditional server-side approach.

The users for this experiment were gathered through Prolific, a crowd-sourcing platform for research participants. A reverse-proxying method was used to allow users to interact with live sites while their interaction data, such as mouse movements and scrolling, was tracked. Three website case studies were used to reflect diversity in interaction behaviour: an e-commerce website, an informational website and a commercial website. A simulation-based, comparative evaluation was performed against a typical server-side approach.

In the server-side approach shown in Fig. 2, the personalisation and prediction occur on the content server, along with the prefetching logic. While regular resource caching occurs on the client side, the caching of prefetched and pre-personalised pages remains on the server. For the evaluation, the Content Server, PPS and Personalisation Service were deployed on a server external to the client.

4.1 Simulation Evaluation

The simulation consisted of replaying the webpage interactions captured during user trials to mimic the same behaviour on the two architectures. This put each architecture under considerably more strain due to the large quantity of background processes which may trigger prefetching and personalisation refreshes.

Unseen Page: Initially, the response time of a page which the user had not previously seen and which the system had not prefetched was examined. This meant that page fetching and personalisation were performed at request time. The average response times for the ICCP and comparative frameworks were 93.5 ms and 90.52 ms, respectively. As expected, the ICCP framework performs slightly more slowly due to the increased number of network requests. However, the response time falls within a reasonable margin, providing no noticeable difference to the user. The variation in these response times over the 100 requests is illustrated in Fig. 3.

Fig. 3. Unseen page response time

Fig. 4. Prefetched page response time

Prefetched Page: The response time of a page which had been predicted and prefetched by the framework was then examined. The results from repeating this process 100 times are shown in Fig. 4. Here the average response times for the ICCP and comparative frameworks are 21.15 ms and 75.19 ms, respectively. When a page has been prefetched, we would expect the ICCP framework to be faster than the server-side approach, as prefetched pages are cached on the client in the former and on the server in the latter. The results align with this expectation: the ICCP performs considerably faster under these conditions.

5 Conclusion

This study proposed Intelligent Client-Centric Personalisation (ICCP), which minimises the leakage of user data while using server-based personalisation and prediction. It was proposed that, through the addition of prefetching to a Client-Side Personalisation framework, performance and user latency could be improved. The ICCP could then provide a more privacy-conscious framework than a traditional server-side approach while offering reasonable performance.

The evaluation aimed to investigate the system latency of the ICCP, comparing this to a traditional server-side approach. It was shown that for an unseen page the ICCP framework performs, on average, more slowly; though the response time falls within reasonable bounds. However, for a prefetched page the ICCP framework performs considerably faster than the traditional approach.

Thus, performance benefits can be achieved in Client-Side Personalisation through the incorporation of prefetching techniques. In use cases where the prefetching strategy has high predictive accuracy, the average user latency over a session should be lower than with the traditional approach. Further research is required to investigate and quantify when the ICCP offers a better solution than the traditional server-side approach.
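As a rough, back-of-the-envelope indication of where that trade-off might lie, the averages reported in Sect. 4.1 can be combined into an expected per-request latency. This sketch rests on strong assumptions that the evaluation did not test: both architectures are given the same prefetch hit rate, each request costs exactly the reported average, and the overhead of unused prefetches is ignored.

```python
def expected_latency(hit_rate, prefetched_ms, unseen_ms):
    """Expected latency when a fraction `hit_rate` of requests hit a prefetched page."""
    return hit_rate * prefetched_ms + (1 - hit_rate) * unseen_ms

# Average response times reported in Sect. 4.1 (milliseconds).
ICCP = {"prefetched_ms": 21.15, "unseen_ms": 93.5}
SERVER_SIDE = {"prefetched_ms": 75.19, "unseen_ms": 90.52}

def break_even_hit_rate(a, b):
    """Hit rate at which architectures `a` and `b` have equal expected latency."""
    unseen_gap = a["unseen_ms"] - b["unseen_ms"]               # extra cost of `a` on a miss
    prefetched_gain = b["prefetched_ms"] - a["prefetched_ms"]  # saving of `a` on a hit
    return unseen_gap / (unseen_gap + prefetched_gain)

p_star = break_even_hit_rate(ICCP, SERVER_SIDE)
print(f"break-even hit rate: {p_star:.3f}")

for p in (0.0, 0.25, 0.5, 0.75):
    print(p, expected_latency(p, **ICCP), expected_latency(p, **SERVER_SIDE))
```

Under these simplifying assumptions the break-even point is a hit rate of roughly 5%, above which the ICCP has the lower expected latency. This should be read as an illustration of the calculation rather than a result of the study.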