Published by Oldenbourg Wissenschaftsverlag, April 3, 2017

The Impact of SearchTrails on the Quality of Collaborative Search

A Novel Trail-based Visualization Facilitating the Search Process

  • Sebastian Franken

  • Ulrich Norbisrath

  • Wolfgang Prinz

From the journal i-com

Abstract

Several collaborative search systems build upon real-time collaboration during search processes. With the software SearchTrails, we present a novel way of capturing and exchanging the search process between collaborators. We achieve this by asynchronously exchanging the newly developed search trails between collaborators, thus overcoming the necessity of real-time interaction for search support. In a study with 29 participants, we evaluate the value of search trails as collaboration artifacts to answer the research question of whether search trails improve the quality of collaborative search results. We confirm this and show that users can build upon the work of co-searchers very efficiently by analyzing and extending the given search trails.

1 Introduction

A number of approaches exist when it comes to supporting collaborative search processes. These approaches can be based on discussions [24], preprocessed databases [6, 7], user-curated website collections (www.searchteam.com), or efforts towards supporting synchronous browsing processes [17]. They build upon recommendations, tags, bookmarks, or user-defined sets of information. With the software SearchTrails, we realize an approach that goes beyond existing means of supporting collaborative search processes by capturing the user’s search process in its entirety. Furthermore, our approach allows the user to enrich the search process information (see Fig. 1) with valuable website excerpts (highlights) marking important results, or by explicitly marking negative search results.

Figure 1: An example search trail, as captured by SearchTrails.

In this contribution, we address the research question of whether the exchange of search trails improves the quality of collaborative search results. We conducted a study with 29 participants to investigate the value of search trails for individual and collaborative search scenarios. We compare the exchange of two types of artifacts: on the one hand, written reports containing relevant information on a topic in prose form as a representative search result; on the other hand, search trails containing the network of links and user-defined highlights that led to the written reports. In our study, we compare building upon written reports and building upon search trails when starting a search on a topic that is new to the participants. We analyze how the type of given artifact influences the number of visited resources and the quality of the artifacts resulting from the search process.

The work presented in this contribution builds upon the SearchTrails tool for supporting asynchronous, discontinuous, collaborative, and complex search processes. An early version of SearchTrails was evaluated in a qualitative study with users to show the effectiveness of the approach [13], and we were able to show that search trails can easily be assessed by evaluators [12]. For this contribution, we use a more refined version of SearchTrails, which allows the exchange and recreation of search trails and therefore the exchange of search processes between collaborators. With the refined version of SearchTrails, we have already shown, by comparing usability metrics obtained with the User Experience Questionnaire (UEQ), that search trails are superior to written reports when it comes to the exchange of search results [14].

In the following sections, we first present the historical origins, the underlying theoretical concepts, and related approaches for supporting collaborative search processes. We then describe SearchTrails’ architecture and implementation in more detail. An overview of the performed study and its results forms the main part of this contribution. The last section summarizes the findings and implications and provides an outlook on future work with SearchTrails.

2 Historical Origins

Users frequently perform searches that have more than simple fact retrieval in mind. They aim for more comprehensive results that cover a certain topic, assemble information from different sources, or teach the user something new. These search tasks may stretch over long periods of time – days or even weeks – or require several sessions to be completed. Such search tasks are classified as exploratory [1, 23] or complex [27].

Examples of such search tasks could be: booking a holiday trip, building some technical device on your own, or staying informed about the market situation for a specific appliance or good over a period of time. Searchers stumble upon promising web sites during the course of complex searches like these, but quickly move on to other web sites that seem more relevant or provide additional information. Some days later, they might again be interested in the information on some of the visited sites, but are unable to get back to them. Even if traces of the visited pages can still be retrieved from the browser history, most of the time the desired pages are lost, as the context by which these pages were found is gone.

This is where a visionary idea of one of the pioneers of analog computing, Vannevar Bush, comes into play. He suggests that the navigation through an information repository be stored in so-called ‘trails’ that could later be recalled and extended: ‘Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item’ [5]. Bush even proposes handing over a trail to a friend who is interested in the same topic: ‘And his trails do not fade [...] [He] photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.’ [5]. This friend would at first examine the trail to get an overview of the results. To this day, this is not possible. Our system SearchTrails is built with exactly this idea in mind.

Figure 2: Model of Berrypicking search [3].

Bush’s ideas influenced – amongst many others – Bates, who suggests that the search for information is comparable to the process of ‘Berrypicking’, in which high-quality information bits get picked during the course of a search (see Fig. 2, [3]). She contrasts this with classic information retrieval, in which a query should be answered by one perfectly matching document. Today’s search engines are getting better and better at helping users find fact-based information and are also starting to provide context-based information, but they lack support for compiling many pieces of information on one specific topic [28], e.g. booking a holiday trip, which includes comparing different ways of traveling, different hotels, and different leisure activities. Our approach does exactly this: it visually logs the user’s paths when traversing the web, offers possibilities to capture specific information, and provides a way back to information already seen in earlier stages of the search process.

Arguably, bookmarking the valuable pages might solve the mentioned problems. However, this does not work in the aforementioned cases, where users only start appreciating the value of a visited website after they have already left it. Bookmarks also often end up being stored unsorted in the bookmark folder or a browser toolbar. Managing bookmarks as part of the follow-up process of search is time consuming, especially with the built-in browser bookmark managing tools. Social bookmarking services like Delicious, Tumblr, or Pinterest also do not help in our cases, as they do not store the context in which the bookmark was created and require similar follow-up work as browser bookmarks. If no bookmarks exist, re-finding a site via the back button or the browser history is tedious, especially when a user has visited many pages on a certain topic. Sometimes the visit was so far in the past that the searcher cannot even remember the exact point in time, which makes re-finding the page even harder.

Bookmarking therefore also fails at handing over a trail in Bush’s sense: a consistent path of navigation through the web. Obviously, bookmarks could be sent to other searchers and web-based link collections can be shared, but they can hardly cover the negative information, i.e. all the places visited without finding relevant information [16]: Which pages were visited and turned out not to be helpful? This can be valuable for evaluating the thoroughness of a search – consider, for example, teachers who receive search trails from students and try to evaluate the thoroughness of the students’ research efforts. Searches can be very thorough although not many results have been found. Our tool helps to confirm such an effort.

3 Theoretical Background and Related Work

Singer et al. [29] show that some discontent with the search support of search engines is rooted in the character of complex search tasks, which are not supported well by today’s search engines. Current search engines perform well on lookup or fact-finding tasks [19] and therefore yield positive user satisfaction in this discipline [29]. They are trained on fact retrieval and on delivering exactly matching documents to the searcher [23]. This type of search could be described as a kind of one-shot search, i.e. a ‘search and forget’ mode, in which the search results for a given query are returned, but then forgotten by the search engine. Complex or exploratory search deals with search tasks that are not well supported by today’s search engines [23].

First approaches to identifying the different natures of search tasks start with Broder [4], who analyzes user surveys and search engine logs to identify navigational, transactional, and informational search tasks by the query entered into the search engine. When performing navigational queries, users aim to reach a particular site, expecting to find some specific information on that site. With transactional queries, users try to find a page where some action should be performed, e.g. accessing a database or downloading data. With informational queries, users try to find information that should be processed by reading and either satisfies the information need or triggers a new query to refine the found information. In some of these cases, searchers are looking for information that they assume not to find on one specific site, but on a collection of pages, where every page adds to the overall result: ‘in almost 15% of all searches the desired target is a good collection of links on the subject, rather than a good document’ [4]. Broder’s work was later independently refined in a number of studies by Rose and Levinson [26]; their results match Broder’s in their core. They developed a trichotomy of user goals, consisting of navigational, informational, and resource-oriented (instead of transactional) goals, and added a sub-classification of goals. While Broder states that 39–50% of all queries are informational, Rose and Levinson show that up to 60% of all queries are informational. Lewandowski [22] shows for multiple search engines that almost 50% of all queries sent to search engines are informational, while navigational queries account for a much larger number of queries than transactional ones. These results are also reflected in the social search model by Evans and Chi [9]. See Fig. 3 for an overview and aggregation of the achieved results. Considering that informational search tasks tend to be time consuming and to span multiple sessions [27], these search tasks may account for a large share of the time consumed when searching. This shows the importance of exploratory or complex search tasks.

Figure 3: Overview and aggregation of search task classification results from several studies.

Such research on structuring user goals in web search laid the foundations for more formal definitions of informational search tasks. Search activities were grouped by Marchionini [23] into the three overlapping categories of ‘Lookup’, ‘Learn’, and ‘Investigate’. ‘Lookup’ comes close to navigational and transactional search tasks, as it consists of e.g. fact retrieval, navigation, transaction, or verification. ‘Learn’ consists of more complex activities, such as knowledge acquisition, comparison, or aggregation, while ‘Investigate’ consists of analysis, synthesis, or evaluation. Exploratory search, as framed by Marchionini, can be seen as a combination of the search activities of Learn and Investigate. He states that ‘investigative searching is more concerned with recall […] than precision […] and thus not well supported by today’s Web search engines.’ [23].

A definition of exploratory search is given in [32], stating that ‘Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision-making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal’.

The overview of definitions in Fig. 4 (based on [23] and [28]) shows that the core components of exploratory search are ‘Learn’ and ‘Investigate’, each of them again containing different subtasks. These subtasks imply relatively high cognitive effort, especially when it comes to tasks like analysis, evaluation, or planning. The necessary cognitive effort is hard to measure by purely implemented solutions like the one we are presenting here. Therefore, we stick to the definition of complex search presented in [27], which defines complex search as ‘tasks where users are required to follow a multi-step and time consuming process that is not answerable with one query, requiring synthesized information from more than one retrieved web page or document to be solved. The process to work to complex search tasks usually comprises at least one of the process steps aggregation, discovery, and synthesis.’ This avoids the cognitively highly loaded aspects of Marchionini’s definition [27]. It has the added benefit of including complex search activities with limited cognitive load, which would not be covered by Marchionini’s definition, e.g. checking the availability and prices of certain products in a number of web portals. However, complex search tasks require synthesis of the information found, and they yield discoveries which extend or alter the initial search goal.

Figure 4: Connection between the definitions of exploratory and complex search (based on [23] and [28]).

To find out how users cope with the challenges of complex search tasks, systems have been developed which try to register user interactions during the search process. In most cases, these search logging systems were constructed as browser plug-ins, storing information about the visited pages and sometimes posing questions to the users while they performed certain search tasks. An early approach was an Internet Explorer plug-in with built-in questionnaires [10] that was used to compare explicit and implicit measures of user satisfaction. Another approach was the Wrapper system [20], which was installed on the user’s system and logged the interaction with the browser and the browser’s interaction with the system. It could be shown that users ‘may seek information over an extended period [of time] and on multiple information systems’. Another system was ‘Search-Logger’, built by Singer et al. [30]. It consists of a Firefox plug-in that allows configuring a list of search tasks and selecting one of them, recording all user interactions with the browser, and storing this information on a central logging server for evaluation. However, the logged information is not visible to the searcher; therefore, there is no direct feedback to the user that could facilitate or support the search task. Search-Logger was mainly used to prove the existence of complex search and to allow its classification. It was also used to find shortcomings of current search engines. Based on the observations derived from the user studies carried out with Search-Logger, a user-centric model for increasing user satisfaction was developed. The presented search logging systems mainly focused on gathering data from the users via browser logging, system logging, and questionnaires. Unfortunately, none of these approaches tried to derive added value from these logs for the individual user or for enhancing collaboration during search processes. Our tool SearchTrails goes beyond that, as it uses the gathered information to actively support users in their search tasks.

Early approaches to collaborative search, such as the co-browsing idea in [18] or collaborative bookmarking in the Social Web Cockpit [25], supported remotely controlled browsers for synchronous browsing or the creation of community bookmarks, but no cooperative, context-preserving, or asynchronous exploration of information spaces. Faceted search can alleviate information complexity in very specific cases, e.g. when databases are preprocessed with great effort, or by relying on object-unspecific metadata [15]. But these approaches can offer only very little support when it comes to complex search cases, as these span over time and a multitude of resources.

Even not finding a specific piece of information has a value. This value of not finding a desired search result and interpreting it as positive was first appreciated by Garfield [16]. He coined the term ‘negative search’ for it, meaning that finding no information on a given topic can be positive if it was the user’s wish not to find information, or to be confirmed that certain information does not exist on a particular subject or in a particular resource. This can be mapped to exploratory search: ‘Exploratory searches may also seek the discovery of gaps in existing knowledge so that new research ground can be forged or unpromising directions can be avoided.’ [33]. As our tool SearchTrails stores the complete search trail and does not judge the value or correctness of the visited pages, it also captures this ‘negative information’. This covers all the pages of a search where the desired information has not been found.

More recent approaches for supporting collaborative search rely more on direct interaction of searchers, curated information collections, preprocessed databases, or ratings and recommendations. A collaborative search support system is presented by Golovchinsky et al. [17], in which distinct roles of the searchers during the search process allow splitting work between collaborators. ResultsSpace [6, 7] relies on preprocessed databases and derives recommendations from the relevance ratings of its users. Recent approaches like content curation [34] consider portals like Pinterest and Last.fm as user-curated content collections. These systems allow users to generate and curate information collections on certain topics and to share them with friends or colleagues. However, these approaches do not capture the search process as a whole and therefore do not store the sidetracks of search processes or the places where a user did not find relevant information.

One approach exploiting the concept of search trails, based on a Microsoft Internet Explorer plug-in, was pursued by Singla et al. [31], where the authors distributed a plug-in collecting anonymous trails and sending them back for further analysis. The authors use a number of different algorithms for analyzing the trails with regard to length, breadth, depth, or diversity. While the authors claim that there is a ‘value in trails’ and hope that the best paths ‘outperform the average over all trails followed by users’, they do not perform a user evaluation in which they return paths to searchers and evaluate their actual value. On a higher level, a more recent study by Awadallah et al. [2] presents an approach where the query logs of the search engine ‘Bing’ are evaluated with regard to the IP addresses of users. The authors recreate the users’ trails on the search engine and identify frequently visited clusters of pages. These trails were used to generate recommendations for further user investigation.

Our system SearchTrails overcomes the limitations of existing related work by combining key features from search logging and collaborative search support systems to provide support for asynchronous, complex search processes. SearchTrails is therefore the first system to investigate the individual value of search trails for collaboration support.

4 SearchTrails

SearchTrails aims to support asynchronous, discontinuous, collaborative, and complex search tasks by supporting the triad of aggregation, discovery, and synthesis as well as the exchange and recreation of search trails.

  1. SearchTrails supports aggregation by recording the search trail with all its visited web pages as nodes and side tracks, which helps keep the context of search results.

  2. It supports synthesis by collecting valuable information pieces (highlights) in text form from websites. These text blocks are collected in the highlights overview and are stored with the node.

  3. Discovery is supported by the visual representation of the search process as a force-directed graph as well as search term suggestions derived from the keywords of visited web pages. These keyword suggestions guide searchers into related, but new search directions.

The force-directed graph visualization is a generally accepted concept for visualizing large information collections [8] and is used for the dynamic layout of the search trails. SearchTrails regularly stores the user’s search trail on a remote server to avoid losing data in case the system is closed. This mechanism allows the recovery, recreation, and exchange of one’s own and others’ search trails. The user-defined highlights are integrated into the search trail and are displayed in a highlights overview, combining important information and its sources. Negative search results are indicated by purple nodes, and clusters (colored hulls) indicate nodes from the same host.

Figure 5: Architecture of SearchTrails.

Our literature and technology review suggested realizing SearchTrails as an extension for the Google Chrome browser, as it poses only minor installation effort and allows open-ended search processes, while unobtrusively logging the user’s interactions with the browser. SearchTrails consists of three engines and the user interface on the client side, and the server-side infrastructure for saving and recovering the search trails (see Fig. 5). On the client side, the logging engine captures all events and transforms them into a JSON data structure representing the search process. This data structure is stored in the browser’s background storage. The background storage reports changes to the data object, which triggers updates of the visualization by the visualization engine. The storage engine retrieves the search trail object from the background storage and stores it with the help of server-side services. SearchTrails generates a unique ID for each search trail, which is stored on the user’s hard drive and can be used for retrieving the search trail. This way, other users can retrieve the search trail data objects, given they know the ID.
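To make this data flow concrete, the sketch below shows one plausible TypeScript shape for such a trail object. The field names are our own illustration; the actual SearchTrails schema is not published in this contribution.

```typescript
// Hypothetical shape of the search trail JSON object (our illustration,
// not the published SearchTrails schema).
interface TrailNode {
  id: number;
  url: string;
  title: string;
  keywords: string[];    // keywords derived from the page content
  highlights: string[];  // user-selected text excerpts from this page
  negative: boolean;     // marked as "not helpful" (rendered purple)
  host: string;          // used to cluster nodes from the same host
}

interface TrailEdge {
  source: number;        // id of the node navigated from
  target: number;        // id of the node navigated to
}

interface SearchTrail {
  trailId: string;       // unique ID used for storage, recovery, and exchange
  user: string;
  title: string;
  nodes: TrailNode[];
  edges: TrailEdge[];
  steps: number;         // count of user-induced navigation actions
}
```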

SearchTrails monitors the opening, closing, and switching between tabs and the change of URLs with the help of a logging engine which logs, filters, and interprets the user interactions. It also catches metadata from the web pages and generates keywords from the pages’ content. The user’s navigation is transformed into nodes in the graph visualization by the rendering engine and shown in a separate tab, which itself is excluded from being logged. The metadata is attached to the nodes and is visible to the user. The visualization is based on a directed graph in a force-directed layout. For each visited URL, important keywords are derived, stored, and displayed in a table on the right if a keyword appears on two or more pages the user has visited. This metadata is also stored in the search trails. If more than three nodes belong to the same host, they are clustered by a colored hull, which can be collapsed with a mouse click to reduce the complexity of the visualization while remaining responsive to user interactions. Irrelevant nodes can be deleted by the searcher. The storage engine regularly stores the search trail JSON object on a remote server, which allows the authors to access all data while restricting access for the participants.
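A minimal sketch of how such tab and URL monitoring can be wired up is shown below. It uses today’s Chrome extension APIs (which differ from those available when SearchTrails was built), and the helper logic and stored object shape are assumptions for illustration rather than the original SearchTrails source.

```typescript
// Logging-engine sketch (assumed, not the original source): append each
// navigation event to a trail object kept in the extension's storage.
let lastUrl: string | null = null;

chrome.tabs.onUpdated.addListener((tabId, changeInfo) => {
  if (changeInfo.url) {
    void recordStep(changeInfo.url);
  }
});

async function recordStep(url: string): Promise<void> {
  const { trail = { nodes: [], edges: [], steps: 0 } } =
    await chrome.storage.local.get("trail");
  // Create a node on the first visit of a URL; revisits reuse the node.
  if (!trail.nodes.some((n: { url: string }) => n.url === url)) {
    trail.nodes.push({ url, negative: false, highlights: [] });
  }
  // Connect the previously active node to the current one.
  if (lastUrl && lastUrl !== url) {
    trail.edges.push({ source: lastUrl, target: url });
  }
  trail.steps += 1; // every navigation counts as a step, even on old paths
  lastUrl = url;
  await chrome.storage.local.set({ trail }); // change triggers re-rendering
}
```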

Figure 6: Screenshot of the SearchTrails user interface.

Besides the visualization, users can interact with the following features:

  1. To ease the synthesis of information (as one of the key points of complex search), users can select text and store it as a highlight by pressing ALT. This stores the selected text in an overview table together with its source URL, and the corresponding node in the visualization is marked blue (see Fig. 6), which allows users to immediately recognize the highlight pages in the graph (see the sketch after this list).

  2. A tap on ‘–’ marks a page’s node in the search trail purple, to signal that it was not helpful.

  3. Highlights can also be added as text from other sources, to allow manual enrichment of the information, and erroneously set highlights can be removed.

  4. The JSON object corresponding to the search can be viewed, imported, and exported, although this was mainly used for evaluation purposes. The views of the highlights and JSON can be switched on or off.

  5. The keywords derived from the pages’ contents are shown as a table and are selectable; this results in the nodes with the matching keywords being marked in the visualization. A click on ‘Search’ starts a search for the selected keywords in a new tab.
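As a rough illustration of the highlight mechanism from item 1 above, the content-script sketch below captures the current text selection when ALT is pressed; the message type and payload shape are hypothetical.

```typescript
// Content-script sketch (assumed): on ALT, send the selected text to the
// extension background, which stores it as a highlight with its source URL.
document.addEventListener("keydown", (event: KeyboardEvent) => {
  if (event.key !== "Alt") return;
  const selection = window.getSelection()?.toString().trim();
  if (selection) {
    chrome.runtime.sendMessage({
      type: "ADD_HIGHLIGHT",           // hypothetical message type
      text: selection,
      sourceUrl: window.location.href, // lets the source node be marked blue
    });
  }
});
```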

Technically, the visualization makes use of the ‘Data-Driven Documents’ JavaScript framework D3.js (http://d3js.org/), which allows highly dynamic and efficient visualization of large data structures. The browser plug-in inserts JavaScript code into all pages the user visits to extract keywords from them. Events as well as keywords are shared within the application by message passing. On every loading of a site, a new node is created in the visualization. When this is finished, the node is updated with the keyword information and some page metadata, such as the description of the page. When a searcher moves from one URL to another by using the back button or by switching browser tabs, the nodes corresponding to these URLs are connected.
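The following is a minimal force-layout sketch in the spirit of this visualization, written against the modern D3 API (which differs from the D3 version available at the time); the node and link shapes are illustrative.

```typescript
import * as d3 from "d3";

// Illustrative node/link data; in SearchTrails these would come from the
// trail JSON object, with URLs identifying the visited pages.
interface PageNode extends d3.SimulationNodeDatum { id: string }
interface PageLink extends d3.SimulationLinkDatum<PageNode> {}

const nodes: PageNode[] = [{ id: "https://a.example" }, { id: "https://b.example" }];
const links: PageLink[] = [{ source: "https://a.example", target: "https://b.example" }];

const svg = d3.select("body").append("svg").attr("width", 800).attr("height", 600);

const link = svg.selectAll("line").data(links).join("line").attr("stroke", "#999");
const node = svg.selectAll("circle").data(nodes).join("circle")
  .attr("r", 8)
  .attr("fill", "steelblue"); // highlight pages would be blue, negative ones purple

// The force simulation produces the dynamic layout: links pull connected
// pages together, charge pushes nodes apart, centering keeps the graph visible.
d3.forceSimulation(nodes)
  .force("link", d3.forceLink<PageNode, PageLink>(links).id(d => d.id).distance(60))
  .force("charge", d3.forceManyBody().strength(-200))
  .force("center", d3.forceCenter(400, 300))
  .on("tick", () => {
    link
      .attr("x1", d => (d.source as PageNode).x!)
      .attr("y1", d => (d.source as PageNode).y!)
      .attr("x2", d => (d.target as PageNode).x!)
      .attr("y2", d => (d.target as PageNode).y!);
    node.attr("cx", d => d.x!).attr("cy", d => d.y!);
  });
```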

A search session starts with opening SearchTrails and entering a username and a title for the search. After that, a unique SearchTrails ID is generated and stored locally; the trail is stored on the server under this ID. While searching, the search trail is stored automatically every five minutes, as well as on closing SearchTrails. For continuing a search, SearchTrails uses the stored trail ID to fetch the trail from the logging server. This way, a trail ID received via e-mail can also be used to fetch a trail from another user. This trail can then be used as a starting point for the user’s own search and be extended by highlights and annotations.
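The autosave-and-recover cycle could look roughly like the following sketch; the server URL, endpoint paths, and function names are invented for illustration.

```typescript
// Storage-engine sketch with an assumed REST endpoint and payload shape.
const SAVE_INTERVAL_MS = 5 * 60 * 1000;            // autosave every five minutes
const SERVER = "https://example.org/searchtrails"; // hypothetical server URL

async function saveTrail(trailId: string): Promise<void> {
  const { trail } = await chrome.storage.local.get("trail");
  await fetch(`${SERVER}/trails/${trailId}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(trail),
  });
}

async function loadTrail(trailId: string): Promise<void> {
  // Works for one's own stored ID as well as an ID received via e-mail.
  const response = await fetch(`${SERVER}/trails/${trailId}`);
  const trail = await response.json();
  await chrome.storage.local.set({ trail }); // visualization recreates the trail
}

// Periodically persist the current trail under its locally stored ID.
setInterval(() => {
  const trailId = localStorage.getItem("trailId");
  if (trailId) void saveTrail(trailId);
}, SAVE_INTERVAL_MS);
```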

When users receive search trail IDs from colleagues, they enter the ID into the retrieval mechanism. The server-side components retrieve the search trail data from the storage server and trigger its visualization. The visualization recreates the search trail, the keyword list, and the highlights, without the nodes that were deleted by the previous user. For exploring an unknown search trail, users evaluate the search trail, the keyword list, and the highlight overview. All search trail nodes can be dragged and rearranged in the force-directed layout. Hovering over a node opens a window with the most important details on the visited URL. The search trail is clustered by the visited hosts, and a cluster can be reduced to one larger cluster node, which replaces all nodes in the cluster. This way, users can evaluate search trails in a structured way, detect more and less valuable clusters, and add nodes to the search trail by visiting new web pages while SearchTrails is active.

5 SearchTrails User Study

For our study, we invited 29 students of a master-level university lecture on Computer Supported Cooperative Work (CSCW) as a representative sample of tech-savvy users with experience in web search. The students used a multitude of platforms, and we provided detailed instructions on how to install SearchTrails and made sure the installation went well. For the study, we developed two search topics to avoid biasing the study results by the selection of just one topic. The search tasks required the evaluation of given artifacts and were checked to fulfill the seven requirements for complex search tasks in [21]. The first search task covered 3D printing, while the second covered home automation. By analyzing all artifacts produced, we found that the choice of search task did not influence the search process results.

The study consisted of two phases of one week each, in which we divided the study participants into two groups, based on their technical support by SearchTrails and the type of artifact given to start from. During the first phase, group A used SearchTrails, while group B was not supported by SearchTrails during searching. During the second phase, group A received a written report to start their search from, while group B received a search trail with similar information content as the written report. In each of the two groups, the participants worked on both topics to make sure that the results were not biased by any specific topic. The topics were handed out in alternating fashion through the rows and seats of the lecture hall, so neighboring students were assigned to different groups. This prevented plagiarism, as we assume that students who might exchange information would sit next to each other. After the first phase, all participants were asked to produce a written report containing all relevant information, such that anyone who did not perform the search gets an overview of the results. As the participants from group A used SearchTrails, each of them also produced a search trail with highlights on their topic. From these artifacts, we selected, for each topic, one average search trail and its corresponding report which contained a good basic set of information, leaving out more specific information about cars or home security. For the exchanged search trails, we made sure that they contained no personal information.

For the second phase of the study, the topics were exchanged between the participants, such that every participant started on an unknown topic. We then supplied the participants of group A with a report and those of group B with a search trail from the first phase and asked them to build upon this material for answering more specific questions on cars or home security. The participants who were given the search trail recreated it and started by evaluating it, while the participants with the reports started by reading them. This procedure resembles asynchronous collaboration on complex search tasks based on a new and a traditional search process artifact. For 3D printing, we asked: ‘Based on the given material, find applications of 3D-Printing in the car manufacturing domain. Which applications exist, which ones will come? Will 3D-Printing change the way of manufacturing cars in the future?’. For home automation, we asked: ‘Based on the given material, find applications of home automation dealing with home security. Which applications exist, which ones will be available? Which applications would you prefer?’. We asked the participants to write a report on the new topic with all necessary information. Including URLs was not mandatory, and we did not request a minimum amount of text, to avoid the production of filler text. After one week, we collected the reports and archived the generated search trails.

From our participants, we received 26 search trails and 21 reports. All three authors independently graded the participants’ reports with academic grades from 1 to 5 (1 being the best grade and 5 meaning ‘failed’), without knowing which artifact had been given, in order to judge the quality of the reports objectively. As the reports were meant to inform someone who did not perform the search about its key results, we graded the reports by the quantitative breadth and the qualitative depth of their information. We similarly clustered the resulting search trails to resemble academic grades and performed statistical analyses on the generated search trails.

6 SearchTrails Study Results

We focus on the quality of the search results, as we assume that the quality of the collaborative search result is more important than the quality of the search process: the common goal of the collaborators is producing a collaborative result rather than experiencing a high-quality search process. In our case, the search results are the search trail and the report. Of these, the search trail can be considered a direct result of the search process, while the report is an indirect result. First, we compare the average grades of the reports, depending on the given collaboration artifacts. We did not split the results by the given search topics, as analyses show that the topics themselves have no impact on the quality of the artifacts produced by each group. We performed statistical significance analyses on all discovered differences. The average grade for group A, who received the report as collaboration artifact, is 3.20, while the average grade for group B is 2.21. The difference is highly significant at a 5% error level (see Tab. 1). This shows that the average grade of the reports heavily depends on the type of the given artifact. The reports of group B, whose participants were equipped with a search trail as collaboration artifact, are graded approximately one full grade better than the reports of group A, whose participants were equipped with a report as collaboration artifact.

Table 1
Comparison of the grades of the generated reports for both groups of the study (2nd phase).

  Average grade, group A (given artifact: report):        3.20
  Average grade, group B (given artifact: search trail):  2.21
  T-test value:                                           4.732
  Critical T-test value:                                  2.000
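For reference, the t-value in Tab. 1 follows from a standard two-sample t-test comparing the group means; the pooled-variance form below is one common variant, and our assumption, since the paper does not state which variant was used:

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}
```

Here the group means are the reported average grades of 3.20 and 2.21; the resulting statistic of 4.732 exceeds the critical value of 2.000, so the difference is significant at the 5% error level.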
Table 2
Statistical data of search trails and their differences between the two groups.

                                        Nodes   Edges   Steps   Clusters   Highlights   Duration (s)   Non-search-engine time (s)   Avg. loop length
  Group A: report                        34.4    64.4   108.3        2.8          2.9           2752                         1532                2.8
  Group B: search trail (added value)    23.0    28.4    42.8        1.0          1.0           1497                         1006                5.0
  Significant at 5% error level            no     sig     sig        sig           no             no                           no                sig
  Significant at 10% error level           no     sig     sig        sig           no            sig                           no                sig

We also clustered the search trails from the second phase in an expert workshop. This clustering made use of the full spectrum of academic grades. The final clustering of the search trails shows that the search trails from group A spread across the full range of academic grades and achieve an average value of 3.17, while the search trails from group B achieve a statistically significantly better average value of 1.78 (5% error level). This is not too astonishing when keeping in mind that the participants from group B were equipped with a proper search trail, which they were to evaluate and enlarge, and that the participants did not degrade the given search trail, e.g. by deleting nodes or highlights.

However, a statistical comparison of search trail characteristics reveals a number of significant differences between the two groups (see Tab. 2). The values for group B show exclusively the value added by the participants during the search process of the second phase of the study; the values of the given search trails have already been subtracted. The last two rows show whether the difference between the average values of group A and group B (added value) is significant at a 5% or a 10% error level. In many cases, the net value added to the given search trail by the participants from group B is significantly smaller than for group A.

The key properties of a search trail are the numbers of nodes, edges, steps, clusters, and highlights. Other key characteristics are the duration and the number of seconds not spent on search engine pages. The last characteristic is the average loop length. The numbers of nodes, edges, and highlights of a search graph are self-explanatory. The number of steps is the number of user-induced navigation actions from one node to another during the search process. When a user walks the same path within a graph several times, the graph is not altered anymore, but the number of steps through the graph increases. The number of clusters is the number of hosts from which more than three different web sites were visited. The duration is the number of seconds in which the participants searched actively, meaning that no interruptions of more than 15 minutes occurred; when two user-induced events are more than 15 minutes apart, the time between them is counted as idle time. We furthermore calculated where the participants spent their time: the second-to-last column shows the number of seconds spent on non-search-engine pages. The last column shows the average loop length. We count the lengths of all paths that start at a search engine until the path reaches a search engine page again and divide this by the total number of paths. This number serves as an indicator for the average depth with which the participants dived into the topic.
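To illustrate two of the less obvious characteristics, the sketch below computes the active duration with the 15-minute idle cutoff and the average loop length, under our own assumption of a simple timestamped event format.

```typescript
// Assumed event format: timestamped visits, flagged as search engine or not.
interface VisitEvent { timestamp: number; isSearchEngine: boolean }

const IDLE_CUTOFF_MS = 15 * 60 * 1000;

// Active duration: sum the gaps between consecutive events, skipping gaps
// longer than 15 minutes, which count as idle time.
function activeDurationSeconds(events: VisitEvent[]): number {
  let totalMs = 0;
  for (let i = 1; i < events.length; i++) {
    const gap = events[i].timestamp - events[i - 1].timestamp;
    if (gap <= IDLE_CUTOFF_MS) totalMs += gap;
  }
  return totalMs / 1000;
}

// Average loop length: mean number of non-search-engine pages visited
// between leaving a search engine and returning to one.
function averageLoopLength(events: VisitEvent[]): number {
  const loops: number[] = [];
  let current = 0;
  let started = false; // loops start at a search engine page
  for (const e of events) {
    if (e.isSearchEngine) {
      if (started) { loops.push(current); current = 0; }
      started = true;
    } else if (started) {
      current += 1;
    }
  }
  return loops.length ? loops.reduce((a, b) => a + b, 0) / loops.length : 0;
}
```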

Tab. 2 shows that all key characteristics of the search trails are on average larger for group A than for group B. These values show that the value added to the given search trail is significantly smaller for group B than for group A. This lower added value for group B did not occur randomly, but is significant in many cases: the lower numbers of edges, steps, and clusters are significant at a 5% error level, while the shorter duration is significant at a 10% error level. The participants from group B also produced significantly longer loops than the participants from group A. This means that these participants took longer tours through the Internet before going back to search engine pages and therefore dived deeper into the domain than the participants from group A. A search trail therefore seems to help avoid redundant searching and puts users in a position where they are able to produce better results with less effort, meaning that they are significantly more efficient.

These results show that search trails as collaboration artifacts have an advantage over written reports as collaboration artifacts. They show that the participants who were given a search trail invested fewer resources in extending the given material than the participants who were given a report. These differences are statistically significant in most cases. Altogether, the collaborative efforts when extending the given search trail lead to significantly better search results: for group B, both the report and the search trail are significantly better. In addition, the results show that the participants of group B were significantly more efficient: group B participants invested only 54% of the time that group A participants needed to invest in the search process, but ended up with reports that were on average one full grade better than those of group A participants (2.21 vs. 3.20, see above). These results strongly confirm our research question: exchanging search trails in collaborative search improves the quality of collaborative search results. SearchTrails eases collaboration and increases the efficiency of searchers in collaborative search scenarios.

7 Conclusion and Outlook

Our results from a field study with 29 participants over two weeks show that SearchTrails as a collaborative search support tool can induce significant improvements in the efficiency of collaborative work. This leads to a gain in the quality of collaborative search processes and positively answers our research question. Search trails as collaboration artifacts are shown to be a valuable means of exchanging search results, and they ease building upon the previous work of other searchers.

In further analyses beyond the ones described here, we investigated the correlation between statistical search trail characteristics and the grades of the search trails and reports for all participants as one group. Even though we graded the reports and the search trails independently, a number of search trail statistics show a positive correlation with the report grade. These characteristics are – among others – the number of visited non-search-engine (NSE) pages, the number of steps through NSE pages, and the time spent on NSE pages.

This suggests that searchers with extensive search trails tend to produce good reports, and that high-quality information from the search trail makes its way into the participants’ reports, which indicates that a valuable search trail can be considered a head start into a topic. This may be based on the unfiltered insight into the collaborator’s search process, its resources, and its more and less valuable search results.

The results derived in this and the previous publications indicate that the concept of building, visualizing, exchanging, and evaluating search trails can have an impact on collaboration in other systems as well. For the collaborating searchers, it seems to be easier to grasp the unfiltered contents of a collaborator’s search trail than to evaluate a linear written text. This may be due to the playful nature of visual representations of complex search processes and their results, which can be explored individually. Furthermore, this contribution shows that it is more efficient to extend search trails and to write reports from this starting point than to directly extend written reports. The value lies in the exploration and the possibility to individually learn about the search trail’s contents. Our results also show a significant gain in efficiency when relying on the visual representation of a search trail compared to a written report. These results strengthen the value of visualizations of search processes and their results and will influence forthcoming systems. We are currently carrying out research in two directions: first, transferring SearchTrails to hybrid mobile-desktop scenarios, allowing the transfer of search trails from cellphones to desktops and back; second, as it is currently only possible to asynchronously hand over search trails to other users, we investigate possibilities to enable synchronous collaborative search with SearchTrails.

Although the results of our study are positive, a potential limitation is that the students, while representative users of SearchTrails, tend to do only the necessary work and are less intrinsically motivated. In particular, our decision not to set fixed guidelines for the reports or search trails to be delivered may have worked against the quality of the search results. However, as all students were affected by this motivational problem, the overall results in more realistic test cases will most likely be stronger than in our recorded samples.

About the authors

Sebastian Franken

Dr. Sebastian Franken was previously a researcher at Fraunhofer FIT, Sankt Augustin, Germany. He studied Computer Science and Business Administration at RWTH Aachen University and graduated in 2010 and 2011. He earned his doctoral degree from RWTH Aachen University with this work on search trails in June 2016.

Ulrich Norbisrath

Dr. Ulrich Norbisrath is an independent technical consultant, researcher, and teacher. He has worked for universities in the US, Germany, Austria, Estonia, and Kazakhstan and has consulted on various IT projects for companies in the San Francisco Bay Area, Germany, and Estonia. His doctoral degree is from RWTH Aachen University.

Wolfgang Prinz

Professor Wolfgang Prinz, PhD, studied informatics at the University of Bonn and received his PhD in computer science from the University of Nottingham. He is vice chair of Fraunhofer FIT in Sankt Augustin, division manager of the Cooperation Systems research department in FIT, and Professor for Cooperation Systems at RWTH Aachen University. His main research interest is in CSCW, web-based collaboration, and the application of AR/VR technologies in the workplace.

References

[1] Aula, A. & Russell, D.M. (2008). Complex and Exploratory Web Search. Information Seeking Support Systems: An invitational workshop sponsored by the National Science Foundation. pp. 23–24.

[2] Awadallah, A.H., White, R.W., Pantel, P., Dumais, S.T. & Wang, Y.-M. (2014). Supporting Complex Search Tasks. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. [Online]. 2014, ACM Press. pp. 829–838. Available from: http://dl.acm.org/citation.cfm?doid=2661829.2661912. [Accessed: 18 September 2015].

[3] Bates, M.J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review. 13 (5). pp. 407–424.

[4] Broder, A. (2002). A taxonomy of web search. In: ACM SIGIR Forum. 2002, ACM. pp. 3–10.

[5] Bush, V. (1945). As we may think. Atlantic Monthly. 176. pp. 101–108.

[6] Capra, R., Chen, A.T., Hawthorne, K. & Arguello, J. (2012). ResultsSpace: An experimental collaborative search environment. Proceedings of the American Society for Information Science and Technology. 49 (1). pp. 1–4.

[7] Capra, R., Chen, A.T., McArthur, E. & Davis, N. (2013). Searcher actions and strategies in asynchronous collaborative search. Proceedings of the American Society for Information Science and Technology. 50 (1). pp. 1–10.

[8] Eades, P. & Huang, M.L. (2000). Navigating clustered graphs using force-directed methods. J. Graph Algorithms Appl. 4 (3). pp. 157–181.

[9] Evans, B.M. & Chi, E.H. (2008). Towards a model of understanding social search. In: Proceedings of the 2008 ACM conference on Computer supported cooperative work. [Online]. 2008, ACM Press. p. 485. Available from: http://portal.acm.org/citation.cfm?doid=1460563.1460641. [Accessed: 18 September 2015].

[10] Fox, S., Karnawat, K., Mydland, M., Dumais, S. & White, T. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS). 23 (2). pp. 147–168.

[11] Franken, S. (2016). Supporting asynchronous, discontinuous, collaborative, complex search tasks by the visualization of search trails. Dissertation. [Online]. Aachen: RWTH University. Available from: http://publications.rwth-aachen.de/record/659882/files/659882.pdf. [Accessed: 16 November 2016].

[12] Franken, S. & Norbisrath, U. (2014). Supporting the evaluation of complex search tasks with the SearchTrails tool. In: Proceedings of 24th Annual International Conference on Computer Science and Software Engineering. 2014, Toronto, Canada: IBM Corp. pp. 262–274.

[13] Franken, S. & Norbisrath, U. (2014). Trail Building During Complex Search Tasks. In: A. Butz, M. Koch, & J. Schlichter (eds.). Mensch & Computer 2014 – Tagungsband. 2014, München, Germany: De Gruyter Oldenbourg. pp. 135–144.

[14] Franken, S., Norbisrath, U. & Prinz, W. (2015). Search Trails as Collaboration Artifacts – Evaluating the UX. In: S. Diefenbach, N. Henze, & M. Pielot (eds.). Mensch und Computer 2015 – Proceedings. 2015, Berlin, Germany: De Gruyter Oldenbourg. pp. 23–32.

[15] Franken, S. & Prinz, W. (2009). FacetBrowse: Facettenbasiertes Browsen im Groupware Kontext. In: H. Wandke, S. Kain, & D. Struve (eds.). Mensch & Computer. 2009. pp. 123–132.

[16] Garfield, E. (1970). When is a negative search result positive. Essays of an Information Scientist. 1. pp. 117–118.

[17] Golovchinsky, G., Adcock, J., Pickens, J., Qvarfordt, P. & Back, M. (2008). Cerchiamo: a collaborative exploratory search tool. Proceedings of Computer Supported Cooperative Work (CSCW). pp. 8–12.

[18] Gross, T. (1998). CSCW3: Transparenz- und Kooperationsunterstützung für das WWW. Groupware und organisatorische Innovation. Tagungsband der D-CSCW’98. pp. 37–50.

[19] Jansen, B.J. (2006). Using temporal patterns of interactions to design effective automated searching assistance. Communications of the ACM. 49 (4). p. 72.

[20] Jansen, B.J., Ramadoss, R., Zhang, M. & Zang, N. (2006). Wrapper: An application for evaluating exploratory searching outside of the lab. In: EESS 2006. [Online]. p. 14. Available from: http://www.researchgate.net/profile/Jim_Jansen/publication/228830541_Wrapper_An_application_for_evaluating_exploratory_searching_outside_of_the_lab/links/02e7e51f8ad289d769000000.pdf. [Accessed: 21 September 2015].

[21] Kules, B. & Capra, R. (2009). Designing exploratory search tasks for user studies of information seeking support systems. In: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries. [Online]. 2009, ACM Press. p. 419. Available from: http://portal.acm.org/citation.cfm?doid=1555400.1555492. [Accessed: 21 September 2015].

[22] Lewandowski, D. (2006). Query types and search topics of German Web search engine users. Information Services & Use. 26 (4). pp. 261–269.

[23] Marchionini, G. (2006). Exploratory search: from finding to understanding. Communications of the ACM. 49 (4). p. 41.

[24] Morris, M.R. & Horvitz, E. (2007). SearchTogether: an interface for collaborative web search. In: Proceedings of the 20th annual ACM symposium on User interface software and technology. [Online]. 2007, ACM Press. p. 3. Available from: http://portal.acm.org/citation.cfm?doid=1294211.1294215. [Accessed: 21 September 2015].

[25] Prinz, W. & Gräther, W. (2000). Das Social Web Cockpit: Ein Assistent für virtuelle Gemeinschaften. Verteiltes Arbeiten – Arbeit der Zukunft. Tagungsband der D-CSCW 2000.

[26] Rose, D.E. & Levinson, D. (2004). Understanding user goals in web search. In: Proceedings of the 13th international conference on World Wide Web. [Online]. 2004. pp. 13–19. Available from: http://dl.acm.org/citation.cfm?id=988675. [Accessed: 3 May 2013].

[27] Singer, G. (2012). Web search engines and complex information needs. [Online]. Tartu, Estonia: University of Tartu. Available from: http://dspace.ut.ee/bitstream/handle/10062/26463/singer_georg.pdf.

[28] Singer, G., Danilov, D. & Norbisrath, U. (2012). Complex search: aggregation, discovery, and synthesis. Proceedings of the Estonian Academy of Sciences. 61 (2). p. 89.

[29] Singer, G., Norbisrath, U. & Lewandowski, D. (2012). Ordinary search engine users assessing difficulty, effort, and outcome for simple and complex search tasks. In: Proceedings of the 4th Information Interaction in Context Symposium. [Online]. 2012, ACM. pp. 110–119. Available from: http://dl.acm.org/citation.cfm?id=2362746. [Accessed: 21 September 2015].

[30] Singer, G., Norbisrath, U., Vainikko, E., Kikkas, H. & Lewandowski, D. (2011). Search-Logger: Analyzing Exploratory Search Tasks. In: Proceedings of the 2011 ACM Symposium on Applied Computing. [Online]. 2011, ACM. pp. 751–756. Available from: http://dl.acm.org/citation.cfm?id=1982350. [Accessed: 21 September 2015].

[31] Singla, A., White, R. & Huang, J. (2010). Studying trailfinding algorithms for enhanced web search. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. [Online]. 2010, ACM Press. p. 443. Available from: http://portal.acm.org/citation.cfm?doid=1835449.1835524. [Accessed: 21 September 2015].

[32] White, R.W., Marchionini, G. & Muresan, G. (2008). Evaluating exploratory search systems: Introduction to special topic issue of information processing and management. Information Processing & Management. 44 (2). pp. 433–436.

[33] White, R.W. & Roth, R.A. (2009). Exploratory Search: Beyond the Query-Response Paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services. 1 (1). pp. 1–98.

[34] Zhong, C., Shah, S., Sundaravadivelan, K. & Sastry, N. (2013). Sharing the Loves: Understanding the How and Why of Online Content Curation. In: ICWSM. [Online]. 2013. Available from: http://www.inf.kcl.ac.uk/pg/czhong/papers/icwsm13.pdf. [Accessed: 21 September 2015].

Published Online: 2017-04-03
Published in Print: 2017-04-01

© 2017 Walter de Gruyter GmbH, Berlin/Boston
