1 Introduction

Usability is a crucial quality of interfaces and websites. Many conferences in the field of human-computer interaction (HCI) discuss new approaches, case studies, and methods in usability engineering and user experience design. Given the domain expertise of their audience, the websites of these conferences should be extraordinarily usable and offer a positive user experience (UX). In this paper, we probe this hypothesis and present the results of a comparative usability study with expert and novice users. Our findings show that previous experience with different conference websites has no significant influence on task performance or perceived usability. We evaluated three conference websites, namely the platforms of HCII 2014, CHI 2014, and MobileHCI 2013. Although the results show no notable differences between the sites, the CHI 2014 website performed best with respect to usability. Only a few severe problems were identified during the tests, but all sites show room for improvement regarding their usability and UX.

The rest of this paper is organized as follows: In Sect. 2, the selection of conference websites as objects of study is introduced. Section 3 explains the design of the study, the metrics used as well as the actual execution of the study. Major results are presented and discussed in Sect. 4, and finally Sect. 5 gives a short summary of our findings.

2 Conference Websites

Three international human-computer interaction conference websites served as objects of investigation: CHI 2014, HCII 2014, and MobileHCI 2013. These sites were chosen according to the following requirements:

  • Relevance in the field of HCI (measured by the number of publications and citations).

  • International conference with an English-language website.

According to a list of top conferences in the field of human-computer interaction, ranked by publications and citations and published by the Microsoft Corporation (2013), CHI can be identified as the biggest conference in this field. The website of the HCII conference (HCI International Conference, 2015) was chosen as the second object because of its high popularity and its many participants. Using a conference website is typically relevant at three points in time:

  • Interest: Looking for submission deadline and general information

  • Submission: Submitting the paper

  • Attending: Looking up accommodation and program information once the paper is accepted

In order to map this procedure over time, from submitting an abstract to actually attending the conference, we used the Wayback Machine during the usability test as a powerful tool to retrieve the states of a website at different points in time.
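For illustration, archived snapshots can be retrieved programmatically via the Wayback Machine availability API. The following minimal Python sketch shows this approach; the conference URL and timestamp are merely illustrative, and this is not necessarily the tooling we used during test preparation:

    import requests

    def closest_snapshot(url, timestamp):
        """Ask the Wayback Machine availability API for the archived
        snapshot closest to the given timestamp (format: YYYYMMDD)."""
        resp = requests.get(
            "https://archive.org/wayback/available",
            params={"url": url, "timestamp": timestamp},
            timeout=10,
        )
        resp.raise_for_status()
        closest = resp.json().get("archived_snapshots", {}).get("closest")
        return closest["url"] if closest and closest.get("available") else None

    # e.g. the state of the CHI 2014 site around the submission phase
    print(closest_snapshot("chi2014.acm.org", "20130901"))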

To complete the list of websites to be tested, we had to choose among the platforms of IUI, UXPA, Interact, and MobileHCI.

When we examined the feasibility of the planned tasks, the coverage of the archived website states in the Wayback Machine turned out to be a further selection criterion. The website of MobileHCI 2013 therefore completed the list.

3 Study Design

3.1 Identifying Tasks

In order to identify relevant tasks, we interviewed expert users and asked them about their typical procedure for submitting a paper to a conference. As a result, we found that the process of actively attending a conference can be divided into three steps. In the first step, potential authors are interested in submitting a paper. In the second step, they have to submit the paper or abstract. In the third step, they look for information about accommodation and the conference program.

Based on these steps, we identified the tasks for our usability study (see Table 1). Submitting a paper usually works through a conference management system (CMS). As these systems are independent of the actual websites, we excluded them from our study; it was not our intention to evaluate conference management systems. Finding the link to the CMS is therefore the only task in step two. Following a realistic timeline, all participants had to execute the tasks shown in Table 1 in the same fixed order. We are aware that this may have led to learning effects and may have influenced the rating of the systems' usability.

Table 1. Task description

The usability of the conference websites was measured using a between-subject study design with novice and expert users. Novice users had no experience in submitting a paper but, as students, could possibly do so in the future (6 female, 14 male, avg. age 24.2). Expert users were defined as having submitted at least three papers in the last three years (5 female, 15 male, avg. age 31.5).

Because we used the thinking-aloud method (Nielsen Norman Group, 2015) to gather more user insights, and because of the extended loading times caused by the Wayback Machine, measuring efficiency via raw task time was not suitable. Instead, we used a key logger developed by Fimbel (2013) to track interactions and calculated the time on task without interruptions, based on the keystroke-level model (KLM) by Card, Moran and Newell (1980). The KLM provides estimated times for different types of interaction; a keystroke by an average typist, for example, is estimated at 0.2 s, a click interaction at 0.1 s.
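As a minimal sketch, this time-on-task calculation can be reproduced as follows; the event-log format is hypothetical (not the actual output of Fimbel's key logger), and only the two operator times mentioned above are modeled:

    # KLM operator times in seconds (Card, Moran and Newell, 1980):
    # a keystroke by an average typist and a single click interaction.
    KLM_TIMES = {"keystroke": 0.2, "click": 0.1}

    def time_on_task(events):
        """Estimate interruption-free time on task from a list of
        logged interaction events, e.g. ["click", "keystroke", ...]."""
        return sum(KLM_TIMES[event] for event in events)

    # 12 keystrokes and 3 clicks -> 12 * 0.2 + 3 * 0.1 = 2.7 s
    print(time_on_task(["keystroke"] * 12 + ["click"] * 3))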

In order to handle the key logger, we implemented a local website that showed the current task and started the key logger in the background as soon as the user decided to start a task (see Table 1). All data was stored in a local database.
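A backend for such a task-control website could look like the following sketch, here using Flask and SQLite; all names, routes, and the event format are hypothetical, as our actual implementation is not part of this paper:

    import sqlite3
    from flask import Flask, request

    app = Flask(__name__)
    db = sqlite3.connect("study.db", check_same_thread=False)
    db.execute("CREATE TABLE IF NOT EXISTS events "
               "(participant TEXT, task INTEGER, event TEXT, ts REAL)")

    @app.route("/log", methods=["POST"])
    def log_event():
        # Called by the key logger for every tracked interaction.
        e = request.get_json()
        db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                   (e["participant"], e["task"], e["event"], e["ts"]))
        db.commit()
        return "", 204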

3.2 Study Execution

The websites were tested on a 15.6″ laptop with an additional external mouse. The test was recorded using the usability software Morae (TechSmith, 2015). In addition to the test conductor, a transcript writer took notes during the test.

The order of the websites to be tested was randomized. Each participant had to solve all six tasks on each website. For each task, a new tab showing the required Wayback Machine version of the website was opened. When the task was solved, the user closed the tab; the key logger then stopped tracking and restarted when the next task tab was opened. The selection screen (1.) and the screens for task 1 on the CHI 2014 website (2.–4.) are shown in Fig. 1. After completing the tasks on one site, the website showed a follow-up questionnaire. Thereafter, the next conference website was loaded and the participants started over.

Fig. 1. Important steps of the test website for the execution of our study

As follow-up questionnaires, we used the System Usability Scale (SUS) and the AttrakDiff. The SUS is a fast method to measure the usability of a product (Sauro and Lewis, 2012, p. 198). It consists of ten questions answered on a 5-point Likert scale. The answers are combined into a non-linear scale from zero to 100 points, where 100 represents the best possible usability rating. In addition to usability, user experience is an essential aspect of website evaluation. To quantify the UX of the conference websites, the AttrakDiff was chosen. It is an established questionnaire developed to measure the user experience of a product (Hassenzahl, Burmester and Koller, 2008). It is based on 28 bipolar, seven-point items, which are mapped onto different qualities: pragmatic quality, hedonic quality (stimulation), hedonic quality (identity), and attractiveness. The results of the AttrakDiff questionnaire are shown in a coordinate system with a pragmatic and a hedonic axis; a product positioned in the top right corner is rated as both highly pragmatic and highly hedonic and therefore offers good user satisfaction (User Interface Design GmbH, 2015).
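The SUS scoring rule is standardized: odd-numbered items contribute their score minus one, even-numbered items five minus their score, and the raw sum is multiplied by 2.5. A minimal sketch (the answer vector is purely illustrative):

    def sus_score(answers):
        """Compute the SUS score (0-100) from ten Likert answers (1-5),
        given in questionnaire order (item 1 first)."""
        if len(answers) != 10:
            raise ValueError("SUS requires exactly ten answers")
        raw = sum(a - 1 if i % 2 == 0 else 5 - a  # odd items: a-1, even: 5-a
                  for i, a in enumerate(answers))
        return raw * 2.5  # scale the 0-40 raw sum to 0-100

    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3]))  # -> 77.5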

3.3 Data Evaluation

Besides the study website, which guided the participants through the test and managed the key logger, we implemented two further websites: one for analysis, which conducted the statistical tests, and one for result presentation, which rendered the results as graphs and tables. To compare the time on task between the different websites and between the two user groups, we used Student's t-test for unpaired samples (Sauro and Lewis, 2012, p. 63ff). All tests are based on a significance level of α = 0.05.
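Such per-task comparisons can be reproduced with standard tooling; the following sketch uses SciPy, and the sample values are purely illustrative, not our measured data:

    from scipy import stats

    # Hypothetical time-on-task samples (in seconds) for one task
    experts = [12.4, 9.8, 14.1, 11.0, 10.5]
    novices = [15.2, 17.9, 13.6, 16.4, 18.1]

    # Unpaired two-sample t-test; Welch's variant does not assume
    # equal variances in the two groups.
    t, p = stats.ttest_ind(experts, novices, equal_var=False)
    print(f"t = {t:.4f}, p = {p:.4f}, significant: {p < 0.05}")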

The results of the SUS were plotted by the results website in a diagram; the results of the AttrakDiff were analyzed and presented as a PDF report by the AttrakDiff website.

4 Results

4.1 Differences Between Expert and Novice Users

In order to verify whether domain knowledge and previous experience increase task efficiency, we compared the time needed for each task on each website between the two groups.

The first task was completed faster by expert users on every conference website, but significantly faster only on the MobileHCI page (see the value highlighted in Table 2), where the experts needed 28.05 % less time than the novice users (df = 39, t = 1.9724). On the CHI website, the experts were 22.11 % faster, and on the HCII website 14.64 %.

Table 2. Differences in task times (in seconds) between experts and novice users. The table shows the results from the experts’ point of view.

Similar results were measured for the second and third task; the expert group was always faster than the novice group. For task two, experts were 17.89 % faster on the CHI website, 29.55 % faster on the HCII website, and 11.64 % faster on the MobileHCI website; for task three, 17.13 %, 20.78 %, and 37.11 % faster, respectively. Because of the large variance, none of these differences is significant.

Task four shows a slightly different picture: the experts were again faster than the novices by 15.33 % on the CHI website and 21.63 % on the MobileHCI website, but only marginally faster on the HCII website (3.25 % less time needed).

For task five, results differ considerably from the first four tasks: here, novice users were faster than the experts on all pages. The experts needed 28.80 % more time on the CHI website, 14.75 % more on the HCII website, and 18.48 % more on the MobileHCI website. In a one-sided t-test, the null hypothesis could not be rejected. Because of the large magnitude of the t-value (t = −1.9938), we additionally conducted a two-sided t-test, which did not yield significant results either.

For the last task, the expert group was faster on the CHI website by 7.28 % and on the HCII website by 11.10 %, but on the MobileHCI website the novice group was faster by 25.91 %.

Overall, the differences between novices' and experts' times clearly depend on the website and the task. A closer look at the individual pages is therefore worthwhile to find out which page supports which task best.

4.2 Differences Between the Websites

To find out which website best supports each task, we compared the times needed for each task between the websites, separately for each group.

The first task was completed fastest on the MobileHCI website by both groups, with non-significant differences to the other websites.

Task two was again completed fastest on the MobileHCI website, significantly faster in comparison to the HCII website (experts: t = 2.4181, novices: t = 2.1940).

This tendency does not hold for task three. Here, the MobileHCI website was the slowest for both groups, and the HCII website the fastest. The differences were significant for the novice group, both compared to the MobileHCI website (t = 2.5297) and compared to the CHI website (t = 2.7709).

The same holds for task four: the HCII website was the fastest and the MobileHCI website the slowest. This time, the differences were significant for both groups: the HCII website was significantly faster than the CHI website (t(novices) = 4.7419, t(experts) = 3.3512) and than the MobileHCI website (t(novices) = 4.0098, t(experts) = 3.0331).

Only for task five was the CHI website the fastest, although only the differences to the MobileHCI website were significant (t(novices) = 3.1625, t(experts) = 3.0136). For this task, the MobileHCI website was the slowest overall, with significantly higher times than the HCII website for the expert group (t = 2.6695).

In contrast to task five, the MobileHCI website was the fastest for task six. The differences were significant for the novice group in comparison to the HCII website (t = 4.7284) and the CHI website (t = 3.7367). The HCII website was the slowest for this task; the expert group needed significantly more time on it than on the MobileHCI website (t = 2.4017).

Overall, no page was always faster than the others, but the MobileHCI website was the fastest in three of six tasks (see Table 3). To find out which problems slowed the users down the most, the next section lists the most important usability problems found.

Table 3. Most efficient website by user group (significant differences are highlighted)

4.3 Most Important Usability Problems

In step one, the biggest problem was finding the right template for the paper. Not only the position and the style of the download link but also its labeling caused problems for the users. A second problem in this phase was finding the correct submission date: many users overlooked the information because of its positioning (HCII website), and novice users in particular had problems recognizing which of the given submission dates was the relevant one.

The next test phase consists of the task of finding the link to the conference management system. The HCII website provided a link to the CMS represented by an icon, but most participants could neither identify the icon as a representation of the CMS nor recognize it as a link. The other websites did not show a link on the landing page, and the users had to search for it for quite a while.

During the last phase, in which the users want to attend the conference, the main problem was the layout of the conference program. All websites presented the program differently (only session titles or all papers, all at once or with pop-ups), but the users had problems with every variant. On the HCII website, the display of the time slots was more or less hidden, and the long list overwhelmed the users. The CHI website presented only sessions in its program, with the papers appearing after a click on a session; most users did not recognize that the entries shown represent sessions, not papers. Finally, the MobileHCI site showed only a list of session titles, again without any indication of their status as sessions. To get an overview of the papers in a session, one had to click on the session, which was not clear to the users. Altogether, the CHI website had a clear advantage in this step because it provides a search function; this would be desirable for the other two programs, too.

The presentation of the recommended hotels on the MobileHCI website was another problem. The website presented the hotels in several lists with a lot of text in between and no headers. It was not clear to the users that the different lists represent different price categories, and without scrolling they did not even see the additional (cheaper) hotels. Therefore, they often did not find the cheapest hotel.

Across all phases of the study, the differences between the menu entries “Venue”, “Attending”, and “Participate” were not clear enough, so the test users opened the wrong page when executing several tasks concerning content related to these menu items. This problem occurred mostly on the MobileHCI website, but there were several problems with the wording in the menus of the other websites, too. Figure 2 summarizes all problems found, grouped by category over all websites. Table 4 gives an overview of the observed usability problems along with their severity ranking and their relative occurrence for the two user groups.

Fig. 2. Usability findings by category

Table 4. Severity rating of usability problems.

4.4 System Usability Scale and AttrakDiff

According to the System Usability Scale questionnaire, none of the websites shows good usability (see Fig. 3). For both user groups, the CHI website was the most usable one, but still with merely acceptable results. Large differences exist between the novice and expert results for the MobileHCI website: experts rated it only four points worse than the CHI website, whereas novices rated it the worst of the three websites, eleven points behind the CHI website on the SUS scale. Overall, the expert group always rated the websites higher than the novices. However, in a two-sided t-test, only the difference between the two groups was significant (t = 2.1680); the differences between the websites themselves were not.

Fig. 3. SUS results

According to the AttrakDiff, the CHI website again scored best. Its confidence interval partially overlaps the “task-oriented” area, the best of the three pragmatic categories a product can reach. All websites are classified as “neutral”, which means that none shows an especially good user experience and each website needs improvement.

5 Conclusion

The findings of the study presented in this paper show that expert users were on average more efficient in completing the tasks. Yet the comparison of the results of expert and novice users predominantly does not show significant differences: previous experience does not make a big difference concerning task success and task time. Depending on the task, each website showed strengths and weaknesses; no website was outstanding across all tasks, steps, and measures.

Some limitations of this study are obvious. The usage of the Wayback Machine influenced the selection of the test websites: as not all conference websites are archived to the same extent, the number of possible sites was constrained. Another critical point is that the study should ideally be conducted over a longer period, accompanying the participants' actual conference participation process. This approach would allow a free choice of sites, as every conference website would then be suitable for testing.

Practice what we preach: the study results show that there is ample room for improvement concerning the usability and UX of HCI conference websites. It might be worthwhile to focus even more on the different steps of the attending process (see Table 1) and adapt the website to the users' needs over time.