
1 Introduction

Evidence shows that emergency management training and the exercising of plans and standard procedures can significantly improve the efficiency of collaboration between responders in the field and in the emergency operation center [1], and disaster response in general. Training is generally designed as middle- to large-scale field drills, or as tabletop exercises with graded-difficulty challenges. The use of serious game techniques as the core of simulated or virtual training tools opens up new ways of training and learning in emergency and crisis scenarios [2]. However, available virtual training tools that address the needs of diverse user groups, such as disaster and emergency managers, first responders and civilians, are limited and potentially costly. Emergency and rescue services are generally well trained for the most frequent or predictable types of disasters, such as fires [3, 4]. The problem arises in complex, unpredictable disaster scenarios where it is challenging to cover the dynamic training needs of the user groups involved. In addition, the locality factor, i.e., the tie that the most frequent or likely disasters have to particular areas, regions or countries, may interfere with how stakeholders learn about the risks faced in other geographical areas, making training tools that do not take this factor into account rather ineffective in particular contexts. Thus, when designing a training tool for emergencies and disasters, one of the key steps for successful training is to include all stakeholders’ needs and the factors that may positively influence their learning.

This paper describes the co-design process [5] carried out for the development of a virtual training tool linked to the Norwegian context. Co-design refers to a cooperative design process where researchers and end-users together generate potential solutions for a defined problem. This method has already been used in the emergency management domain [6,7,8,9], as an iterative approach with participants representing various stakeholders whose degree of participation varied depending on user needs and the stage of the design process. Our co-design process aimed at the creation of an emergency and disaster virtual training tool prototype called “Operation Tyrsdal” [10].

Two workshops were run at the University of Agder (UiA), Norway, with representatives of stakeholders involved in an emergency/disaster scenario in Norway. The workshops were based on an extreme weather scenario, considered to have both high probability and high societal impact. Stakeholder representatives comprised local emergency services, i.e., the police, firefighters, an emergency manager from a city of 90,000 inhabitants, a volunteer organization, a hospital and the county administration. In addition, researchers with expertise in co-design processes, emergency management, disaster e-health, and multimedia were part of the research team. The first workshop gathered end-user requirements for the training tool. We explored current gaps in training practices, information needs and elements to improve the training of decision making. In the second workshop, we focused on scrutinizing the detailed design, user interface, training use-case and learning points. The testing part focused on the design evaluation of the virtual training prototype, with valuable feedback from participants who were not real emergency responders. We incorporated the findings from the second workshop into the prototype development, and will scale up, in a later phase, to complete virtual training testing with responders.

This paper is divided into seven sections. Section 2 presents our literature review on co-design methods. Section 3 briefly describes the co-design method used in this article and the testing method for the prototype. Section 4 elaborates on the prototype’s evolution and its features. Section 5 reports the testing results, the feedback received and the discussion. Section 6 summarizes the lessons learned, and Section 7 concludes the paper with implications for future work.

2 Literature

Co-design is generally defined as a process of cooperation between users and designers. It has been practiced in different contexts, such as art and design, architecture, or information and communication technology. The underlying purpose is to cooperatively design an artefact for a specific problem statement, context of use or desire [11]. The term artefact comprises a wide range of possibilities, from a building to a sophisticated technology solution. The common principle is that a representation of that artefact is crafted and presented by the co-designers with paper-based props, arts and crafts, developed in a technology-free environment.

Historically, co-design was initially developed in the Nordic countries [12] with the aim of enabling industry employees to take part in decisions about, and control of, their means of production. Ever since, co-design has been used in information and communication technology to transform end-users into active contributors to a technology solution designed for and with them [5]. It is precisely the end-users of a technology solution who express and describe their needs in their own voice. In this way, end-user representatives meet in the same space and time to create a shared understanding, selection and prioritization of needs that are later translated into functional and non-functional requirements and into the look and feel of the desired system solution.

Co-design has been used in different ways and areas, such as building community participation in community strategy [13], governance [14] and developing regions [15]. Common to all of them was the recruitment, consent and interaction of user group representatives. The end-user is the person intended to use the outcome of the co-design process, such as a technology solution. In technology contexts, it is common to present a problem scenario that users need to get acquainted with and, in some cases, solve. In this case, the aim of the co-design process is to capture how and what users choose, express and decide in order to solve or improve the situation described in the scenario [5]. It is the same participants who, at the end of such a process, explain in their own words what they mean and want.

There are examples in the literature of co-design and similar methods, such as participatory design [7], co-creation [16] and user-centred design [17], all of them used to design technology solutions involving end-users. There are also examples of studies with end-users in disaster and crisis management scenarios: for instance, community participation in the form of a collaborative action-research method by the Tasmania Fire Service to transform community education [18]; governance to solve policy issues concerning tension between government agencies and local companies [14]; and a study of technological solutions in the domains of communication, microfinance and education in developing regions, such as India and Uganda [15]. More specifically, usability engineering principles have been described for crisis and disaster management [19]. However, the involvement of end-users in the co-design of tools for disaster management is still scarce. Several factors may explain this. The recruitment of end-users, who are usually employees, requires extra resources. More importantly, the unpredictability and complexity of disasters and crises may hinder the appropriate selection of which groups to involve. Nevertheless, the probability of errors, for example in communication and coordination processes, increases the need for training and for the familiarity that the involved user groups can gain through the co-design of a training tool.

3 Research Design

3.1 Co-design Methodology

The co-design process yielded results in each of the workshops. In this work, we carried out two workshops, interviews and two development cycles using the Scrum method. In the first workshop, we identified several key game elements related to possible training scenarios, such as extreme weather, passengers trapped in a ship fire, flood, landslide/avalanche, loss of power and telecommunications infrastructure, and life-threatening violence. The participants presented several suggestions for content and feedback in the game, such as: information that concerns responders; the responsibility, closeness, equality and cooperation principles; actionable information; the consequences of a choice; and best-practice feedback in the game on an unknown situation. The following factors were suggested as training purposes: cooperation, understanding of information flow, understanding of the role of decision makers, establishing appropriate situation descriptions, coordination and taking care of personnel.

For the game setting, the extreme weather and weather data were ranked as “relevant”. Scores as learning rewards and the number of people saved were proposed as game incentives. Based on this discussion, we narrowed down the topic and selected the extreme weather scenario for the training game, a theme mentioned by all stakeholders.

Some ideas from the first workshop were used in the prototype. The prototype included activities that were iteratively implemented using the Scrum method, consisting of seven bi-weekly iterative sprints (work cycles). The high-level requirements for the game were: focus on a single, realistic scenario; concentrate on the individual experience; focus on managing resources and on dilemma situations; use the local language (to enable user testing with local responders); and, finally, the game had to be playable from a laptop. We limited our target to Windows machines for availability purposes. The main game engine used for the development work was Unity, which allows the game to be ported to multiple platforms when required.
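To make the platform constraint concrete, the sketch below shows one way a Windows-only build could be produced from the Unity editor; the scene path, output folder and menu name are illustrative placeholders rather than the project's actual settings, and porting later would essentially amount to switching the build target.

```csharp
// Editor-only build script: a minimal sketch, assuming a single scene.
using UnityEditor;

public static class BuildScript
{
    [MenuItem("Build/Windows Standalone")]
    public static void BuildWindows()
    {
        var options = new BuildPlayerOptions
        {
            scenes = new[] { "Assets/Scenes/OperationTyrsdal.unity" }, // placeholder scene path
            locationPathName = "Builds/Windows/OperationTyrsdal.exe",  // placeholder output path
            target = BuildTarget.StandaloneWindows64,                  // Windows, per the requirement
            options = BuildOptions.None
        };
        BuildPipeline.BuildPlayer(options);
        // Porting later would mainly mean changing the target,
        // e.g. BuildTarget.StandaloneOSX or BuildTarget.StandaloneLinux64.
    }
}
```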

The co-design process also continued into the Scrum process, as we tested the developed tool, especially after the 5th sprint. Figure 1 shows screenshots of the development of the same feature from the 3rd to the 7th sprint. The human icon in the upper left represents the involvement of users or stakeholders during the development stage, which only started at the 5th sprint. In the 1st and 2nd sprints, the user interface had not reached a sufficient level of development to be shown to the testers.

Fig. 1. Scrum and sprint examples of the prototyping process

In the 2nd workshop, we explored themes such as information flows and the information needs of different stakeholders, and elicited the detailed user interface. The stakeholders also suggested some alternative designs that could be useful for the training purpose. The stakeholders were exposed to more information about the developed prototype. This gave them a more concrete experience of the expected output of the project and allowed them to adapt their requirements and suggestions more systematically.

One of the suggestions in the 2nd workshop was that the tool should span the entire crisis organization, to enable learning about the working methods of other parties, since today's emergency responders work in a “silo” mode, or sit in their own “glass bowl”. In other words, a role-play type Virtual Training Tool (VTT) was proposed in which someone from each response unit can play, change role (log in as another organization), and do a joint evaluation afterwards. In this way, each actor can learn about “outside” perspectives.

The stakeholders also engaged in eliciting the elements required for the user interface. Figure 2 depicts the diagram, shown in a presentation by a workshop participant, of elements required in the training tool's user interface. The full requirements resulting from the 2nd workshop have been documented in [20]. The results from the discussion were used as a basis for the 2nd Scrum process. In this stage, we set new requirements, such as the wish to be able to split the visual information across different screens.

Fig. 2. Suggestions of user interface elements presented by end-users

The game could support more than one player (preferably multiplayer), playing on networked computers, and should include a map showing events and deployed resources. One of the split screens displayed a control room view. We also listed some desirable requirements, such as the possibility to combine the game with real data (e.g., weather data), the ability to simulate agents (flows of people), and smaller events such as a tree falling over or a power blackout. The last three can be considered “nice to have” features, as in the second workshop the stakeholders indicated that the information available to players is adequate even if detailed visualization cannot be achieved. This formed the basis for the 2nd Scrum process, which consisted of five sprints. The results of the 1st and 2nd Scrum processes are presented in Sect. 4. However, the testing stage only covers the results from the 1st Scrum process.

3.2 Testing Methodology

The testing was held in a controlled environment where we conducted small-scale testing of the initial prototype. Moreno-Ger et al. [21] argue that prototype usability testing is especially important when the system is to be used by people who are not familiar with interacting with new technologies, and that usability issues are then crucial. Possible methods include heuristic evaluation with experts [22], video-assisted usability testing [22], task allocation [23], diagnostic evaluations [24], performance evaluations [24], subjective evaluations [25] and many other techniques, including the procedure suggested by ISO 13407 [26] on human-centred design processes for interactive systems; the SUS scale [27], SUMI [28], CSUQ [29] and USE [30] are among the possible instruments to use in a user satisfaction survey.

Which method(s) are appropriate depends on different conditions, such as the time and resources available, the possibility of direct access to the end-users or any limitations in their skills and expertise, and the stage of product development. Moreover, no single method alone can satisfy the testing purpose. For instance, Moreno-Ger et al. [21] point out that heuristic instruments do not always identify all the pitfalls in a design, while prototype evaluation can also fail to identify comprehensively all of the stumbling points in a design.

Currently, we do not consider the “Operation Tyrsdal” product to be in the “Test and Measure” stage, or even in the “Release” stage. It is not in the “Planning/feasibility” stage either, but rather lies between the “Design” and “Implementation” stages, where the “prototype evaluation” technique [31] or interface design pattern [32] methods could be relevant. The latter approach puts emphasis on making the end-user participants aware of design pattern examples, encouraging them to express interaction concepts in terms of design patterns, and reviewing the final results of the requirements. The former requires the testers to provide their impressions of a page design, which has also been implemented in our co-design process.

As we target an evaluation that can provide impressions of the look and feel of the developed user interface and identify potential barriers when interacting with the developed product, we adapted a simplified version of the SeGUE procedure [21] for testing our VTT prototype described in Sect. 4. This is a structured method for analyzing a prototype play session in which the evaluators annotate events as they try to identify issues and stumbling points. A predefined set of event types is used to facilitate the annotation process as well as to provide structure for the subsequent data analysis. Ideally, the procedure comprises the design of the play session, the selection of the testers, the performance and recording of the play session, the application of the instrument and annotation of the results, and the reconciliation of the results.

SeGUE’s framework suggests two dimensions as the basis for evaluation: the system dimension and the user dimension. The system dimension covers functionality, layout, gameflow, content, technical errors, and other events that have no category or are not applicable. The user dimension entails learning, reflecting, satisfied/excited, mildly frustrated, frustrated, confused, annoyed, unable to continue, non-applicable, commenting and other perceptions not covered by the abovementioned categories.
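As an illustration of how these two dimensions can be operationalized, the sketch below encodes the SeGUE categories as C# types that an evaluator's annotation tool could use; only the category labels come from the framework, while the type and field names are our own assumptions.

```csharp
// SeGUE annotation dimensions as C# types: a minimal sketch, not the original instrument.
public enum SystemEvent
{
    Functionality, Layout, GameFlow, Content, TechnicalError, Other, NotApplicable
}

public enum UserEvent
{
    Learning, Reflecting, SatisfiedExcited, MildlyFrustrated, Frustrated,
    Confused, Annoyed, UnableToContinue, Commenting, Other, NotApplicable
}

// One annotated observation from a play session.
public sealed class SessionAnnotation
{
    public System.TimeSpan Timestamp;   // time into the play session
    public SystemEvent SystemCategory;  // what happened in the game
    public UserEvent UserCategory;      // how the tester reacted
    public string Note = "";            // free-text comment by the evaluator
}
```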

The testers of the developed SG were a group of six male and two female university students, aged between 24 and 26 years (Fig. 3), as the evaluation was designed as a small-scale test. As mentioned earlier, we treated the game as being in the “Design” and “Implementation” stages, and thus considered the use of participants who are not real emergency responders to be sufficient. Regarding the number of testers, the literature suggests that five users should be enough to detect 80% of the usability problems, as an increased number of testers does not necessarily yield new information [21]. Thus, we concluded that we had a more than adequate number of testers. Intentionally, the testers had never seen the game before, as they were expected to be able to play based on the tutorial included at the beginning of the gameplay.

Fig. 3. Testing activities

As background information on the testers, five participants play games regularly, while one rarely plays and another does not play games. The following games within different genres were listed as the most frequently played: Diablo 2 (action role-playing game), Heroes 3 (strategy game), Stardew Valley (farming simulation role-playing game), World of Warcraft (massively multiplayer online role-playing game), Overwatch (multiplayer shooter game), The Binding of Isaac (action adventure) and The Sims (life simulation game). In addition to having experience with well-known games, all the testers had a multimedia education background, which added to their credibility and judgement during the testing. We also had a facilitator and observers who helped during the testing and led the discussion and debriefing process after the game testing. In this test, seven participants returned the questionnaires.

The SeGUE instrument was used for the usability evaluation and was expressed as a questionnaire combining multiple-choice and open questions. Our questionnaire was designed to cover all aspects of the “Event categories for the system dimension” and “Event categories for the user dimension” as applied to the Operation Tyrsdal game [21]. Although it is suggested to use video recording and annotation by multiple reviewers, we considered this unnecessary in our case, as the game is relatively simple, with approximately 30 minutes of gameplay. The testers filled out the questionnaire and had a debriefing discussion with feedback to us. In the next section, we present the prototype results, while the testing results are reported, analyzed and discussed in Sects. 5 and 6, respectively.

4 Prototype Results

We developed two prototypes, as seen in Figs. 4 and 5. The game was inspired by the big storm “Synne” that hit Eigersund on the west coast of Norway in December 2015, where continuous heavy rain led to significant evacuation of inhabitants. However, Synne is not the only extreme weather event that has occurred in Norway. Between 1994 and 2017, at least 76 extreme weather events were registered by the Norwegian Meteorological Institute (MET). Since October 1995, MET has given each extreme weather event a name to facilitate communication between authorities, the public, media and other relevant actors. This practice is also recommended by the World Meteorological Organization [33].

Fig. 4. The graphical user interface in Prototype 1

Fig. 5. The graphical user interface in Prototype 2

Typical characteristics of extreme weather that lead to warnings and preparedness, due to the risk of extensive damage or danger to life, are strong wind, heavy rainfall, storm tide and waves. The Synne extreme weather fulfilled these criteria: the flood reached the “red level warning” – the highest of four flood alert levels – and reportedly exceeded the level of a 200-year flood. An escalating crisis over several days, such as Synne, offers a unique case for learning about managing a sequence of closely spaced events over a longer period with limited resources. In “Operation Tyrsdal”, the time and place are fictional, but some elements, such as the plots, stories and sequence of events, were adapted from the series of events during the storm Synne.

The technology used for developing the tool is Unity as the game engine and C# as the programming language. We developed two sets of requirements, for the first and the second prototype. The requirements in each stage were extracted from the feedback obtained during the co-design workshops and the Scrum process. In the next subsections, we briefly describe the game mechanics, the requirements, and the similarities and differences between the two prototypes.

4.1 Prototype 1

Prototype 1 was produced from the 1st Scrum process (Fig. 4). The game was meant for a single scenario, with the player acting as On-Scene Commander, and concentrated on the individual experience. The game was built for machines with the Windows operating system, but in the future it would be possible to port the game to other systems, as Unity supports multiple platforms.

The main background of the Graphical User Interface (GUI) is the game world, showing the location of the area hit by the extreme weather. In the GUI in Fig. 4, we can see the dashboard that consists of a set of resources (the police, the health services, the firefighters and the volunteers) (1). In this screenshot, most of the resources are deployed and appear as symbols of each unit, with tracking bars on the right side of the screen (3). The main dashboard (2) is the area where the player gets notifications of events. In the bottom-right area, there are tracking bars that can be used to track all the resources deployed at different event locations in the game world (3). We also had an action report button (4) that logs all the player's decisions and interactions with events and resources. Initially, we had included a quiz in the game, and the action report would also show the right and wrong multiple-choice answers selected by the player. However, this feature was considered a disturbance to the immersion of the game and was therefore removed from the final version of the first prototype. The time counter, seen in the upper middle of the screen (5), shows the game world time, while (6) is a button for the game settings. The game has a tutorial that pops up as a user starts playing. The tutorial tells the user what to do with all the different events and buttons in the GUI of the game.
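To illustrate how the numbered GUI elements could interact, the hedged sketch below shows one way the resource dashboard (1) and the action report (4) might be wired together in Unity C#; all class, method and resource names are illustrative and do not reflect the prototype's actual code.

```csharp
// A minimal sketch of deploying a resource and logging the decision to the action report.
using System.Collections.Generic;
using UnityEngine;

public enum ResourceType { Police, Health, FireBrigade, Volunteer }

public class ActionReport
{
    private readonly List<string> entries = new List<string>();

    // Record a player decision together with the in-game time shown by the time counter (5).
    public void Log(float gameTime, string entry) =>
        entries.Add($"[{gameTime:F0}s] {entry}");

    public IReadOnlyList<string> Entries => entries;
}

public class ResourceDispatcher : MonoBehaviour
{
    public ActionReport Report = new ActionReport();

    // Called when the player assigns a unit from the dashboard (1) to an event in the game world.
    public void Deploy(ResourceType unit, string eventId)
    {
        Report.Log(Time.time, $"Deployed {unit} to event {eventId}");
        // ...spawn the unit icon and start its tracking bar (3) here...
    }
}
```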

4.2 Prototype 2

In the second prototype, the player(s) had the option to run the virtual training tool as a single player or as two players, where the second player can take the role of a control room operator in addition to the On-Scene Commander role. Figure 5 illustrates the GUI of the second prototype for a single player. Most of the game world's elements are the same, although some detailed visual objects have been added, such as electricity grids and more detailed buildings. The main tangible UI differences are that the game has a bigger notification dashboard (1), a resources dashboard (2), and a mini-map of the game world (3). At the current stage, the design focused on two players on networked machines. The notification dashboard (1) and the map (3) will be available on another screen so that the view is less crowded compared to the single-player GUI. This version still needs improvement, for example regarding how to avoid blocking the view of the game world. Hence, the testing in this paper focused on the first prototype version.

5 Testing Results and Discussions

The summaries of the testing results can be seen in Tables 1 and 2. Table 1 shows the participants’ judgement of different features or elements in the game user interface, rating the features from “very good” to “very poor”. As can be seen, most features were judged as “neutral”, which is visible from the pie diagram icon or the color intensity of the neutral column. We focus here on the elements that were judged as “very good”, “good”, “poor” or “very poor”. We noticed that approximately half of the participants evaluated the tutorial (57.1%), text size (42.9%), game color (42.9%) and information in the dashboard as “good”, while the poorest rated feature was “Deploying volunteers”, rated as “very poor” by 42.9% of the participants. As background information, the volunteer resources were not immediately visible to the players; to use them, players had to click on a small volunteer button, deploy them, and return to the main resource dashboard. This procedure was the subject of extensive comments and discussion (see below).

Table 1. The tester’s rating of various graphical user interface elements
Table 2. Results of the game experience

However, not all results are consistent with the evaluation of the players’ game experience presented in Table 2. Here, the answers are more distributed among “satisfied/excited”, “mildly frustrated”, “frustrated”, “confused” and “annoyed”. The game features with consistently positive answers, as in Table 1, are “Text size in the dashboard” and “Tutorial”, i.e., 85.7%, and additionally the “Settings button”.

Some players were particularly “frustrated” (37.5%) when deploying the volunteers, and “confused” by the usage of the volunteer button (37.5%). Information in the dashboard (i.e., event notifications appearing in the dashboard) caused them to be “frustrated” (42.9%) and “mildly frustrated” (28.6%). Meanwhile, the impressions of the notification dashboard were divided equally between the opposite opinions “satisfied/excited” (42.9%) and “confused” (42.9%). The reasons for these results are discussed further after presenting some more results.

At the end of the session, we asked the participants to describe their impression of the overall game, which gave encouraging results (Fig. 6). The users saw the value of learning (71.4%), which is basically a process where the user figures out how to perform an action that was unclear before (learning to play), or where the user actively engages in consuming content (learning content). While the detailed questions (Tables 1 and 2) revealed some frustration, annoyance and technical problems, from a macro perspective none of the participants (0%) were frustrated, annoyed or unable to continue the game; rather, they were learning (71.4%), reflecting (28.6%) and satisfied/excited (57.1%).

Fig. 6. The participants’ impression on the overall game prototype

Even though more than half of the participants were satisfied and learning, some of them were also confused (42.9%). The following discussions and feedback help explain the results shown in Tables 1 and 2, as well as in Fig. 6.

The quite good scores on satisfaction, excitement and learning have several sources. The testers found the overall gameplay to be good because the game met the players’ expectations or criteria, such as which function should do what. The necessary buttons were easy to find and worked as expected, although some functions are not yet implemented. Some of the users liked the game colors and responded that they were not annoying to the eyes. However, the experience with the game colors was not uniform, as one tester considered the graphics a bit dark, making it sometimes difficult to track the vehicles in the game world.

Regarding the clarity, understandability and helpfulness of the tutorial, the users considered it good because it was informative. The testers appreciated the tutorial since it covered all the necessary information at the beginning of the game. Text and images in the tutorial help, but they should be less visible and hidden behind a button, such as a question-mark icon representing a request for help. If a player understands the gameplay, he/she can then skip the tutorial and only press this button when more detailed information is needed. In other words, the players considered the tutorial to be functional, with clear responses and good explanations of what action to take in the game. By and large, in Table 1 the tutorial is rated between good (57.1%) and neutral (42.9%), but 85.7% of the testers are mostly satisfied (Table 2).

Positive comments from the participants were that the test was fun and interesting. The game had a good information flow and was easy to use, and the camera tracking of the game world was satisfying. One participant considered the game promising, provided that the events and the graphical user interface are handled seriously.

The confusion, frustration and annoyance had several causes. Some buttons were not intuitive, such as the volunteer button. In the current version, when a player needs to activate the volunteers, he/she needs to click a very small volunteer button and send them to the main resources dashboard. The testers did not see why the volunteers should be hidden and not easily accessible like the other resources, such as the police, fire brigade and health service personnel. This issue was raised and confirmed by several testers, as represented by this statement: “Switching between the worker/volunteer tabs was very annoying; please keep the buttons static”. One user also experienced that some buttons did not function as expected. Figure 7 shows the testers’ responses when asked whether the buttons functioned correctly.

Fig. 7. Evaluation of all types of buttons in the game

The information in the notification dashboard of the game was frustrating because it had a lot of lag (please consult Fig. 4 for the features discussed from here onward). One player had difficulties scrolling down, so that the list of events could not be found. On the other hand, events appeared too fast, causing players to miss some events that required action and response. Notifications were not easy to read instantly, especially when several events happened almost simultaneously. One player claimed to have lost a couple of events simply because several “all workers back” and “work finished” notifications arrived almost simultaneously as a new event that needed action popped up on screen.

One player had trouble with the resource button (see Fig. 7): when picking the right personnel to handle an event, the player had to click the resource button several times. From the developer perspective, this is a design issue, as the resource button is supposed to be clicked only once. If the player is used to double-clicking before something happens, as in other games or applications, the picked resource is returned to the dashboard, and only a third click makes the player succeed in selecting rescue personnel. Similarly, the right-hand event status/tracking bars were deemed confusing, as they stayed on screen until all personnel had returned, regardless of the mission status.
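One hedged mitigation for this double-click problem, assuming the selection is driven by a standard Unity UI button, would be to ignore clicks arriving within a short cooldown instead of toggling the selection; the class and field names below are illustrative only.

```csharp
// Debounce sketch: a second click within the cooldown no longer undoes the selection.
using UnityEngine;

public class ResourceButton : MonoBehaviour
{
    private const float ClickCooldown = 0.3f;  // seconds during which repeated clicks are ignored
    private float lastClickTime = -1f;
    public bool Selected { get; private set; }

    // Hook this method to the UI Button's onClick event.
    public void OnClick()
    {
        if (Time.unscaledTime - lastClickTime < ClickCooldown)
            return;                            // swallow the accidental second click
        lastClickTime = Time.unscaledTime;
        Selected = true;                       // keep the resource selected; deselection happens elsewhere
    }
}
```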

Some players were confused about where to find the location of an event in the game world. The players were expected to follow the flow of new incoming events, but at the same time they were busy reading the information card of an earlier event in order to take the correct action. A tester suggested using the game world more actively, for example with arrows pointing to off-screen events.

Regarding feedback on game improvements, many suggestions concern the game's GUI and functionality. For example, one user suggested adding different lighting, such as night and day, and an indication of “what time it is” in the game. Variations in the weather, such as sun, rain and storm, would make the game more realistic and alive than a plain, static condition with a set of events that occur.

Instead of dividing the notifications between the right-hand bar and the center panel, a player suggested unifying the two. Readability would also be improved if the continuous flow of messages was dropped in favour of mission cards showing a quick overview of each event (status, resources sent, time left, etc.) without the need to click through to another window. As the game already has a good system of readable icons, it should not have to rely on as much text as it currently does.

On the topic of returning personnel, the players had no way of finding out how long the personnel would take to return. The testers suggested a simple timer next to the personnel's name, which would help users. One player suggested adding the Civil Defence, as it is also considered part of the Norwegian crisis management system and should therefore be included.
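A simple version of the suggested return timer could look like the sketch below, which counts down next to a personnel entry; the use of a UnityEngine.UI.Text label and all names are assumptions for illustration, not the project's actual UI code.

```csharp
// Countdown sketch for "time until personnel return": a minimal, hypothetical component.
using UnityEngine;
using UnityEngine.UI;

public class ReturnTimer : MonoBehaviour
{
    public Text label;                  // UI text shown next to the personnel's name
    private float secondsRemaining;

    // Called when the unit starts travelling back from an event site.
    public void StartReturn(float travelTimeSeconds) => secondsRemaining = travelTimeSeconds;

    private void Update()
    {
        if (label == null || secondsRemaining <= 0f) return;
        secondsRemaining -= Time.deltaTime;
        label.text = secondsRemaining > 0f
            ? $"Returns in {Mathf.CeilToInt(secondsRemaining)} s"
            : "Available";
    }
}
```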

To sum up, we implemented the user testing to detect usability barriers and problems, perceptions of the user interface, and user experience when playing the game. These elements were evaluated both in terms of negative experiences, such as frustration, annoyance and technical difficulties, and positive experiences, such as excitement and the opportunity for learning and reflecting. The overall results indicate that the subjects tended to rate the game prototype positively, as seen in Fig. 6. However, the main focus of this user testing is not the good or bad judgement itself, but how many usability problems the testers could reveal, and the alternative solutions and feedback they provided for future improvement. At this point, we consider the testing result to be an adequate departure point for the further enhancement of the VTT Operation Tyrsdal.

6 Lessons Learned

The selected methodologies for game development and testing have provided new understanding of the current status of the work, and the results of these methodologies can be a basis for producing a final “serious crisis management game”. The co-design approach helped us define what elements are important when developing a serious game for crisis management, making it more realistic than if we had merely relied on the literature. The application of the SeGUE instrument to our prototype testing not only helped us identify the improvement points addressed in Sect. 5, but also gave insight into the ways in which users felt frustrated, annoyed or experienced barriers, and into the elements that they found exciting. Some comments also provide hints about possible technical issues that need to be solved. The lessons learned from our research, for both the co-creation and the game testing methods, are as follows:

On the co-creation methodology:

  • Co-creation with stakeholders can help sharpen the project goals; in our co-design process, for example, the goal was sharpened into a tool that enables learning about the different emergency response organizations’ working methods and that benefits the training of local actors.

  • Flexibility in the elicitation process is important, as sometimes borderline, interrelated themes were planned to be discussed in two sessions, but these “borderline” topics had already been fully covered in one session. In other words, we used less time than planned on a particular topic. On the other hand, some areas can be challenging and require more time to discuss than initially planned for.

  • The audio-video recording was very useful for extracting the most relevant information from the workshops, especially for the requirements specification.

On the game testing methodology:

  • The players brought their own devices for the game testing, and we experienced a mismatch between the testers’ machines and the game. Half of them had Mac machines, while the prototype was designed for the Windows platform, on the assumption that porting to multiple platforms would be trivial for later expansion. This problem was quickly solved by running the testing in two sessions, which was not a problem in itself, as the game highlights the individual experience. However, it required some extra time. For this type of testing, having extra machines as back-up would be recommended. An alternative would be to port the game to more than one platform.

  • The total time available for the discussion and debriefing was limited, as the testers were also asked to fill out questionnaires. The time allocated for the discussion should be longer and somewhat flexible. This would allow a facilitator to ask more detailed questions, and to stop when no more new information can be obtained, i.e., when a saturation point is reached.

  • The suggested SeGUE technique uses annotated video during the testing, while we only observed the participants’ activities and took notes during the discussion. The game experience on different aspects of the game would not have been easy to identify even if video recording had been used, as the testers were busy solving the tasks in the game. The questionnaire really helped to elicit the users’ game experience. However, video recording is useful when extensive discussion is planned for the testing.

7 Conclusions

This paper presents our work on the development of a serious game for the crisis management of extreme weather events. The game prototype was developed through a co-design process. The paper also describes the co-design methodology, i.e., the workshops, and the usability tests that are suitable for a product that is still in the prototype or implementation stage. The workshops resulted in a set of requirements that was then used for the prototyping activities. For the usability testing, we adapted the SeGUE or “Serious Game Usability Evaluator” instrument, focusing on “Event categories for the system dimension” and “Event categories for the user dimension”. The players contributed substantially to detecting usability problems in the Operation Tyrsdal game. Such information is highly relevant for future game development, to make the game playable by the end-users and usable for reflection and learning.

A limitation of this study is that we conducted the usability testing at the game implementation stage, and not at the final or release stage. In addition, the testers were not end-users (crisis responders), but they were qualified to evaluate the graphical user interface and to express their experience and the technical barriers encountered during the testing. We did not implement the video annotation technique to supplement the usability testing process, which can be a future improvement on the methodological side. When the game is considered a final product, we will involve first responders in a more comprehensive usability test, combining the SeGUE method with standard user satisfaction questionnaires and the eye-tracking usability technique.