B-hist: Entity-centric search over personal web browsing history
Introduction
Searching one’s own browsing history is useful for locating information that was seen before and is needed again. Often, when trying to recall previously looked-up information, people prefer to search the web rather than their locally stored files or their web browsing history. This is mostly because search tools for the web are more effective than those available for desktop or browsing history search.
Web Search today exploits semantic data in order to improve search results and provide more information satisfying the user’s query intent: Search Engine Result Pages (SERPs) are enriched with structured content including pictures, maps, and factual data—in addition to the standard links pointing to web pages [1], [2]. This is possible thanks to structured knowledge bases and Linked Open Data (LOD) datasets such as Freebase [3] and thanks to semantic annotations of web pages using, for instance, schema.org. Search over personal web browsing logs is a related task, though it has in our opinion not yet received the full benefits of semantic techniques. While studies have shown that re-finding information online is more effective when supported by search tools [4], most browsers provide a very limited keyword-based search over previously visited pages, which has not changed much in the last 20 years.
In this paper, we present a new system that lets users search over their personal web browsing history in an entity-centric fashion. The goal of our system, called B-hist (standing for ‘Better history’), is to bring entity-centric access to personal browsing activities thanks to semantic technologies such as the ones we developed in our recent pieces of work [5], [6] for entity disambiguation and entity-type selection.
Semantic technologies leveraged by B-hist include structured metadata about entities extracted from the web pages that users have visited: our system takes entities from DBpedia and creates links from web pages to those entities. Moreover, B-hist leverages a broad entity-type hierarchy built on top of DBpedia, YAGO, and schema.org types.
By mining entities in web pages and leveraging their types to cluster pages into meaningful groups, we allow users to access their web history from multiple entry points. They can type queries, which are auto-completed with the entities mentioned in their history. They can also filter results along the time dimension, thanks to a heat map calendar showing browsing activity over time, and by clicking on entity types or on clusters of coherent web browsing sessions.
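The two entry points described above, entity auto-completion and the activity data behind a heat map calendar, can be illustrated with a minimal sketch. It is not the B-hist implementation: the `HISTORY` records, the URLs, and the function names `build_entity_index`, `autocomplete`, and `daily_activity` are all hypothetical, and entity linking is assumed to have already produced the labels attached to each page.

```python
from bisect import bisect_left
from collections import Counter
from datetime import datetime

# Hypothetical history: (visit time, URL, entity labels linked to the page).
HISTORY = [
    (datetime(2014, 3, 1, 9, 0), "http://example.org/a", ["Freebase", "DBpedia"]),
    (datetime(2014, 3, 1, 9, 5), "http://example.org/b", ["DBpedia"]),
    (datetime(2014, 3, 2, 18, 30), "http://example.org/c", ["Fribourg"]),
]


def build_entity_index(history):
    """Return sorted (lower-cased label, original label) pairs, one per entity."""
    labels = {label.lower(): label for _, _, entities in history for label in entities}
    return sorted(labels.items())


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` entity labels starting with `prefix`, ignoring case."""
    prefix = prefix.lower()
    keys = [key for key, _ in index]
    # Binary search finds the first label that could match the prefix.
    start = bisect_left(keys, prefix)
    matches = []
    for key, label in index[start:]:
        if not key.startswith(prefix) or len(matches) == limit:
            break
        matches.append(label)
    return matches


def daily_activity(history):
    """Count visits per calendar day, the raw data behind a heat map calendar."""
    return Counter(visited.date() for visited, _, _ in history)
```

Keeping the labels sorted lets a prefix lookup start from a binary search rather than a full scan, which matters once a history mentions many entities.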
The rest of the paper is structured as follows. Section 2 briefly summarizes work from related areas and existing software that aims to enhance the web history search experience. Section 3 presents the different components of B-hist. Section 4 describes the results of an online survey on web browsing and history search with more than 200 participants, together with our evaluation of different approaches for clustering web pages. Finally, Section 5 concludes the paper and highlights the main novelties of our system.
Web search
Web Search has been studied extensively over the last 15 years. Early work in the area includes the study of different types of web search information needs: informational, navigational, and transactional [7]. Notable efforts in this domain have focused both on improving the efficiency of search engines by mining user activities [8] and on proposing effective information retrieval models [9].
More recently, Semantic Web technologies have been used to enhance the experience of web search
System description
Our system provides multi-dimensional access to one’s personal web history by letting users select the desired pieces of information through several filters: temporal, entity-centric, and session-based. In the following, we describe the main components of B-hist and its data processing back-end architecture.
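As a rough illustration of how temporal, entity-centric, and session-based filters can be combined, the sketch below applies every filter that is set and ignores the rest. The `Visit` record, its fields, and the `filter_visits` function are assumptions made for this example and do not reflect the actual B-hist data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Visit:
    url: str
    visited: datetime
    session_id: int  # hypothetical identifier of the browsing session
    # Maps each linked entity label to its type, e.g. {"Freebase": "Database"}.
    entities: Dict[str, str] = field(default_factory=dict)


def filter_visits(
    visits: List[Visit],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    entity: Optional[str] = None,
    entity_type: Optional[str] = None,
    session_id: Optional[int] = None,
) -> List[Visit]:
    """Keep visits that satisfy every filter that is set (None means no filter)."""
    kept = []
    for visit in visits:
        if start is not None and visit.visited < start:
            continue  # temporal filter, lower bound
        if end is not None and visit.visited > end:
            continue  # temporal filter, upper bound
        if entity is not None and entity not in visit.entities:
            continue  # entity-centric filter
        if entity_type is not None and entity_type not in visit.entities.values():
            continue  # entity-type filter
        if session_id is not None and visit.session_id != session_id:
            continue  # session-based filter
        kept.append(visit)
    return kept
```

Treating the filters as a conjunction means each additional selection narrows the result set, which matches the usual behaviour of faceted search interfaces.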
Experimental validation
In this section, we present the results of an online survey conducted to support our design choices. Specifically, we asked more than 200 web users which functionalities they would appreciate in a tool like B-hist. We also present an experimental comparison of different clustering techniques for web browsing sessions.
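This excerpt does not name the clustering techniques that were compared. For orientation only, the sketch below shows a common baseline for grouping visits into browsing sessions: a new session starts whenever the gap between consecutive visits exceeds a threshold. The function name `split_sessions` and the 30-minute default are assumptions, not values taken from the paper.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, max_gap=timedelta(minutes=30)):
    """Group visit times into sessions separated by gaps longer than `max_gap`."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session if this visit closely follows the last one.
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

A time-gap baseline ignores page content entirely, so it offers a simple point of comparison for content-aware or entity-aware clustering approaches.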
Conclusions
In this paper, we described B-hist, the first system offering semantic functionalities for web history search. The main contribution of this work is the enrichment of classic web history search by means of entity-centric search, entity-type and time range filters, and clustering based on web browsing sessions. We have shown how modern technologies that have been successfully applied to web search can be ported to enrich the web browsing history search experience. Our results show that users
Acknowledgments
This work was supported by the Swiss National Science Foundation under Grant number PP00P2_128459, and by the Haslerstiftung in the context of the Smart World 11005 (Mem0r1es) project. We also thank Martin Grund, Eugenia Martin, Ruslan Mavlyutov, and Vincent Pasquier for their help and feedback.
References (21)
- P. Mika, Microsearch: An interface for semantic search, in: SemSearch, 2008, pp. 79–88. URL...
- et al., Semantic search
- et al., Freebase: a collaboratively created graph database for structuring human knowledge
- et al., Keeping and re-finding information on the web: What do people do and what do they need?, Proc. Amer. Soc. Inf. Sci. Technol. (2004)
- et al., Combining inverted indices and structured search for ad-hoc object retrieval
- A. Tonon, M. Catasta, G. Demartini, P. Cudré-Mauroux, K. Aberer, TRank: ranking entity types using the web of data, in:...
- A taxonomy of web search
- Mining query logs: turning search usage data into knowledge, Found. Trends Inf. Retr. (2010)
- et al., A language modeling approach to information retrieval
- et al., A characterization of online browsing behavior