
1 Introduction

Due to the increasing power and flexibility of large projected and screen-based displays, and the many types of interaction now easily accessible to even moderately skilled programmers and designers through frameworks such as openFrameworks, Cinder, and Processing [17], there has been a surge of public installation work in recent years in museums, concerts, classrooms, and public outdoor areas. Such installations frequently allow large groups of users to interact with the system.

Though touched on in some research, crowd-computer interaction, in which a crowd of people interacts with an installation through their combined actions, remains relatively unexplored. Examples of prior work include Barkhuus et al.’s “Cheering Meter”, where the cumulative volume of a crowd’s cheers determined the outcome of rap battles [1], Maynes-Aminzade et al.’s “Pong”, where the combined movements of a large classroom move a virtual paddle [5], and O’Hara et al.’s “Red Nose Game”, where crowds’ movements moved a virtual ball [6].

In this paper, we propose a new way of visualizing and identifying a crowd, and furthermore an interaction approach that uses “crowd shape” as a variable that can provide specific inputs and also be affected by system outputs. We investigate three methods of displaying “crowd shape” (blobby/approximated, precise/people-tracked, and a combination of the two) and study the usability of these shapes as a means of interaction between large user groups and computers, and conversely for individuals interacting with large groups through natural computer interfaces. Through these experiments, we investigate cases where the user is part of the crowd or is interacting with a crowd. We believe such a “crowd shape” can be a helpful method for groups of users to coordinate their actions, or for individuals and automated systems to monitor and react to crowds, especially if we can successfully associate crowd features such as mood or energy with the shape.

In our experiments we focus specifically on how the visual feedback and display of a crowd can affect participation effectiveness, pleasantness, ease-of-use, and suitability, through three objective-based exercises grouped into two experiment phases: Ball-Catch and Pattern-Match for exploring multiple users collaborating using crowd shapes, and Swarm-Chase for exploring how individuals view and react to a crowd shape. These exercises were built to be “continuously variable and socially familiar” [7] with a group-dependent nature to reduce the social embarrassment factor [2], while also utilizing Reality-Based Interaction themes such as “Naïve Physics” and “Body Awareness Skills” to greatly reduce the gulf of execution for participants [4]. Our results, though inherently noisy, suggest that there is a significant difference between the three crowd shapes, and furthermore that the difference centres on the shapes that give individuals feedback on their own positioning, namely the Precise and Combined shapes.

In Sect. 2, some related work will be reviewed. Section 3 introduces our crowd visualization methods, and Sects. 4 and 5 will discuss the experiment design and results. Some concluding remarks will be presented in Sect. 6.

2 Related Work

In the literature of social psychology and collective behavior, a group or crowd can act as a unique entity. A good example of this comes from Barkhuus et al.’s “Cheering Meter”, where it was noted that “as soon as approximately 25 % of the audience is applauding, the applause quickly cascades to 100 %” [1]. In this section we overview the research done thus far through a selection of projects that focus on crowd interaction. To remain identifiably different from the many types of interactive installations, we focus on projects where many individuals act as a “crowd” in which the group displays an “illusion of unanimity” [9] as they complete collaborative objectives.

2.1 Cheering Meter

Researchers Barkhuus and Jorgensen created a manually controlled sound-monitoring system that received and measured the amplitude (volume) of the sound generated by a crowd of spectators at rap battles. This was used to determine which of the rap-battle performers received the loudest cheers, and thus, arguably, the victor of the rap battle itself [1]. The audience as a crowd is significant, as “many crowds are formed as audiences” [3].

When cheering with a large number of others, there was no real opportunity for individuals to see their own or others’ individual output, though individuals did express “joy over being part of the concert” [1]. It should be noted that this could also be a weakness of crowd-computer interactions – the difficulty of seeing your own contribution and receiving individual feedback. The cheering meter was also lauded by its researchers for its ability to enhance the performance rather than detract from it [1].

2.2 Crowd Collaborative Collective Games

Dan Maynes-Aminzade, Randy Pausch, and Steve Seitz, inspired by a crowd-controlled game at SIGGRAPH in 1991, created “Audience Movement Tracking”, “Beach Ball Shadows”, and “Laser Pointer Tracking”. Audience Movement Tracking is a game that allows a crowd to control a paddle’s left and right movements in a Pong-type game by leaning left or right in concert. Beach Ball Shadows uses the shadow cast by a beach ball, hit into the air by the crowd, to deflect missiles from the virtual cities on the ground in a Missile Command-type game. Laser Pointer Tracking consists of several games that track many individual laser pointers in the crowd to interact with a projected image. Specific uses of this technology are a “scratching game” that tracks laser pointers to scratch and reveal a hidden image (like a scratch-and-win lotto card), a graffiti wall that allows multiple coloured lines to be drawn simultaneously, and a whack-a-mole type game that requires laser pointers to “catch the moles” [5].

Over eight months, they tested these games on crowds ranging from 150 to 600 students. Through observations and short surveys completed afterwards by the college-level students, they developed several principles of system design and social factors, such as “focus on the activity not the game, not required to sense every participant, make the control obvious, play to emotional sensibilities of the crowd, and facilitate collaboration between participants” [5].

Taking in the principles recorded by researchers Maynes-Aminzade, Pausch, and Seitz above, we start to see a crowd-computer interaction framework form, in which it is most important that everyone feels involved, even if the technology does not always allow them to be.

2.3 Urban Screen Game

Within three UK cities, researchers Kenton O’Hara, Maxine Glancy, and Simon Robertshaw created a camera- and projection-based collaborative game called “The Red Nose Game” [6]. Each of the three “Big BBC Screens”, high above a public space, features a camera image of the area directly below it. Superimposed on the camera image are several red blobs. As people walk into the camera’s view, their bodies are tracked and they are able to push the red blobs into each other so that they combine, ultimately merging into one large blob. When all blobs are combined, a point is scored and the game restarts [6].

Crowd-computer interaction is difficult. In the Red Nose and Lecture Clicker studies, researchers made sure that all participants could interact; whereas when the crowd became too large, as in the 600-person classrooms of the Collaborative Games study, the Cheering Meter, and the “large modes” of the Light beyond the Edges study, the interaction became much more subtle and was based upon the collective body rather than the accumulation of each individual’s interactions. Do all participants feel connected to the interaction when acting as a collective body? And if they do not, is that really important when the interaction, as in the case of the Cheering Meter, merely enhances their experience of another event (in this case the rap battle)? We must also consider the use of technology in crowd-computer interactions. Here we find that most researchers use cameras and image-processing software such as computer vision tools to take the technology out of the hands of the participants, relying instead on more “natural interaction modes” such as voice and “body awareness skills” [4]. This makes sense, as natural interfaces lower the gulf of execution, “the gap between a user’s goals for action and the means to execute those actions” [4].

3 Crowd Visualization

In both of our experiment phases, we explore three types of crowd visualization: the Blobby shape, the Precise shape, and the Combined shape, as shown in Fig. 1. While the Precise shape represents a typical view of the crowd with emphasis on individuals, the Blobby shape aims at visualizing the crowd as a single entity. Such a visualization may potentially help viewers understand and interact with the crowd as a single element instead of focusing on individual movements. Our primary research hypothesis is that using, or adding, such a visualization will improve interaction when individual movements are less important than collective actions, and/or when a collective, overall shape provides a better way of understanding and tracking movement.

Fig. 1.

The three shape types used within this experiment, as visualized during the Ball-Catch exercise. From left to right: Precise, Blobby, and Combined (Precise overlaid on Blobby).

Two different methods were used to create these three types of visualization: using the Kinect2 3D sensor, and using a regular 2D webcam. The first method was more precise but could only be used for groups of up to six users, i.e. a small crowd. This was the case in the two experiments that involved participants acting as members of a group. The second phase of experiments involved participants interacting with a large crowd “from outside”, and the second method of creating the visualization was used in that case.

The Precise shape represents the unique silhouette of each participant, each coloured differently to better represent each individual. This is accomplished by tracking each individual participant using the Kinect2 for Windows 3D depth camera [17], which we found to be much less noisy than using a web camera and computer vision libraries such as OpenCV [14]. For our Blobby shape we took all participant silhouettes, combined them into a single texture, and applied various per-pixel filters such as blur, erode, dilate, and finally a threshold, until we obtained a very rough approximation of the participants. For the Combined shape, we layered the precise silhouettes onto the Blobby shape. This method was used in the Ball-Catch and Pattern-Match experimental games, where participants were part of a relatively small crowd.
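As an illustration, the following is a minimal sketch of the Blobby pass written with OpenCV image operations (our implementation applied the filters as per-pixel texture operations; the kernel sizes, filter order, and threshold value below are illustrative assumptions rather than the actual parameters used):

```cpp
// Sketch of the Blobby shape pass: combine all participant silhouettes into one
// mask, then blur/erode/dilate/threshold until only a rough outline remains.
// Kernel sizes and the threshold value are illustrative assumptions.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Mat makeBlobbyShape(const std::vector<cv::Mat>& silhouettes) // 8-bit single-channel masks
{
    CV_Assert(!silhouettes.empty());
    cv::Mat combined = cv::Mat::zeros(silhouettes[0].size(), CV_8UC1);
    for (const auto& s : silhouettes)
        cv::bitwise_or(combined, s, combined);                    // union of all participants

    cv::Mat blobby;
    cv::GaussianBlur(combined, blobby, cv::Size(41, 41), 0);      // smear individual outlines
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));
    cv::erode(blobby, blobby, kernel);                            // remove thin protrusions
    cv::dilate(blobby, blobby, kernel);                           // grow back the main mass
    cv::threshold(blobby, blobby, 40, 255, cv::THRESH_BINARY);    // hard edge, single blob
    return blobby;                                                // ready to upload as a texture
}
```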

For the second experiment phase, involving the Swarm-Chase game, we used a slightly different method. As we wanted to simulate a much larger crowd of 30 participants, we wanted an application that would more closely resemble the sort of application that would be used for actual crowd-computer interactions. Because of the six-user tracking limit, and relatively short range, of the Kinect2, we decided to use a colour camera in this case – a PS3 Eye camera, noted for its low cost and high-quality image.

To track the participant we used an OpenCV face-tracking algorithm, smoothed with a Kalman filter for predictive tracking [14]. We decided that we only needed to track the face of the participant, as we could assume they would be facing the screen (and thus also a well-positioned camera). Experiments training our own head-tracking algorithm were far too slow to be used in real time, and deemed unnecessary for one participant. For the Precise shape we simply displayed a green person graphic at the participant’s detected position, and the crowd was a 2D point cloud that moved together using variations of Craig Reynolds’ steering behaviours [19]. Each point was displayed as a salmon-coloured person graphic. For the Blobby shape we took the crowd point cloud and connected the points into triangles using Delaunay triangulation [16] to form a polygon, which we could then draw into an OpenGL framebuffer object and blur using pixel/fragment shaders. Again, for the Combined visualization, we layered the Precise shape onto the Blobby shape (see Fig. 1, farthest right image).
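A minimal sketch of this single-participant tracker is shown below, assuming OpenCV’s standard Haar cascade face detector and a constant-velocity Kalman filter; the cascade file, state model, and noise covariances here are illustrative assumptions, not necessarily those used in our installation:

```cpp
// Sketch: Haar cascade finds the face; a constant-velocity Kalman filter
// smooths and predicts the position between (possibly missed) detections.
#include <opencv2/core.hpp>
#include <opencv2/objdetect.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

class FaceTracker {
public:
    FaceTracker() : kf(4, 2) {                       // state: x, y, vx, vy; measurement: x, y
        cascade.load("haarcascade_frontalface_default.xml");   // assumed cascade file
        kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
            1, 0, 1, 0,
            0, 1, 0, 1,
            0, 0, 1, 0,
            0, 0, 0, 1);
        cv::setIdentity(kf.measurementMatrix);
        cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));
        cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    }

    // Returns the smoothed (or, if no face was found, predicted) face centre.
    cv::Point2f update(const cv::Mat& frameGray) {
        cv::Mat prediction = kf.predict();
        std::vector<cv::Rect> faces;
        cascade.detectMultiScale(frameGray, faces, 1.1, 3, 0, cv::Size(60, 60));
        if (!faces.empty()) {
            cv::Mat measurement = (cv::Mat_<float>(2, 1) <<
                faces[0].x + faces[0].width * 0.5f,
                faces[0].y + faces[0].height * 0.5f);
            cv::Mat corrected = kf.correct(measurement);
            return { corrected.at<float>(0), corrected.at<float>(1) };
        }
        return { prediction.at<float>(0), prediction.at<float>(1) }; // coast on prediction
    }

private:
    cv::CascadeClassifier cascade;
    cv::KalmanFilter kf;
};
```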

4 Experiment Design

The fundamental idea behind this research is that visualization through an appropriate crowd shape can improve users’ performance in systems with a large number of concurrent users. In order to verify this hypothesis, we started by developing three possible crowd shapes as described in Sect. 3. These shapes represent two possible approaches to visualizing crowds: individual and collective. Our initial pilot tests showed that users can potentially be interested in both of these approaches, so we introduced a third, “combined” option that we hypothesized would be associated with the best performance.

Once the possible visualizations had been determined, we designed two sets of experiments that demonstrate possible forms of crowd interaction: participating as part of the crowd, and participating against a crowd. First, we explored how several participants (in this case 5–6) work together as a group, creating the “illusion of unanimity” [9], to interact with a large screen in two objective-based forms; secondly, an additional experiment explored how an individual reacts to a crowd shape without the added noise of an actual crowd present, focusing instead on the shape itself, generated from a simulated crowd and crowd movement. Overall we ran three experiments: two for the first “group” phase (Ball-Catch and Pattern-Match) and one for the second “individual” phase (Swarm-Chase).

At the end of each experiment we gave each participant a questionnaire asking them to rate, on a 7-point Likert scale (chosen for greater variance), how strongly they felt about the experience across four dimensions: Effectiveness, Pleasantness, Ease-of-Use, and Suitability. Our pilot studies aligned very closely with our final experiments; they helped to point out some bugs in the code, and showed that, in order to reduce learning bias toward trials run after previous trials, we had to run the experiments in different orders for each group/participant. We were also able to better focus our questionnaires with clearer descriptions, and even images in Swarm-Chase, to aid understanding.

Each program was developed using the Cinder framework [12], a C++ and OpenGL coding framework, running on a PC and displayed on a large 54ʺ TV, using a Kinect2 for Windows depth camera and SDK [17] for the first two experiments, and a PS3 Eye camera with the OpenCV C-based library for live image processing and facial detection [14] for the last experiment, Swarm-Chase (Fig. 2).

Fig. 2.

The three “games” developed for testing the roles of shape in crowd-computer interactions. From left to right: Ball-Catch, where balls falling from the top had to be bounced into a virtual basket within 30 s; Pattern-Match, where a score was displayed signifying how closely the participants’ crowd shape matched the stegosaurus silhouette; and Swarm-Chase, where participants were asked to avoid a wandering simulated crowd.

After each trial was completed, the score was noted (if appropriate), and each participant was asked to fill out the same 7-point Likert scale questionnaire rating the crowd shape’s effectiveness, pleasantness, ease-of-use, and suitability during the experiment, along with any additional comments. The results of these tests can be found in Sect. 5.

Ball-Catch was developed to test a group’s ability to work together to form an optimal shape for collecting falling circles into a basket on the screen. The crowd shape itself provided the platform on which the circles collided and collected, and each group was given 30 s to get as many balls into the basket as possible.

For Ball-Catch, each group of 5–6 members stood in front of a large 54ʺ screen that displayed the crowd shape and game, with a Kinect for Windows version 2 depth camera that allowed us to capture and track each user’s silhouette so that we could colour each uniquely for the Precise shape trial.
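For illustration only, one plausible way to implement the ball/crowd-shape interaction is to have each falling ball sample the binary crowd-shape mask beneath it and bounce off occupied pixels; the gravity and restitution values in this sketch are assumptions rather than details of our actual implementation:

```cpp
// Minimal sketch of ball/crowd-shape collision in a Ball-Catch style game.
#include <opencv2/core.hpp>
#include <vector>

struct Ball { cv::Point2f pos, vel; float radius = 12.0f; };

void stepBalls(std::vector<Ball>& balls, const cv::Mat& crowdMask, float dt)
{
    const float gravity = 600.0f;      // px / s^2, assumed
    const float restitution = 0.6f;    // energy kept on bounce, assumed
    for (auto& b : balls) {
        b.vel.y += gravity * dt;
        b.pos += b.vel * dt;
        int x = cv::saturate_cast<int>(b.pos.x);
        int y = cv::saturate_cast<int>(b.pos.y + b.radius);   // sample just below the ball
        if (x >= 0 && x < crowdMask.cols && y >= 0 && y < crowdMask.rows &&
            crowdMask.at<uchar>(y, x) > 0 && b.vel.y > 0) {
            b.vel.y = -b.vel.y * restitution;                  // bounce off the crowd shape
        }
    }
}
```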

Pattern-Match was the second experiment of the multi-user phase. It asked participants to use their crowd shape, displayed on the screen, to match another shape displayed in front of them (the silhouette of a stegosaurus). Each group of 5–6 was given as much time as required to try to get the highest score possible for each of the three crowd-shape types (the score is a number from 1–100 reflecting how closely the group’s crowd shape matches the stegosaurus shape, 1 being no intersection at all and 100 a perfect fit).
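For illustration, a score of this kind can be computed as an intersection-over-union style overlap between the crowd mask and the target silhouette, as in the following sketch (the scaling and clamping here are assumptions, not necessarily the exact formula we used):

```cpp
// Sketch of a 1-100 fit score between two binary masks (crowd vs. target shape).
#include <opencv2/core.hpp>
#include <algorithm>

int patternMatchScore(const cv::Mat& crowdMask, const cv::Mat& targetMask) // CV_8UC1 masks
{
    CV_Assert(crowdMask.size() == targetMask.size());
    cv::Mat intersection, unionMask;
    cv::bitwise_and(crowdMask, targetMask, intersection);
    cv::bitwise_or(crowdMask, targetMask, unionMask);
    double inter = cv::countNonZero(intersection);
    double uni   = cv::countNonZero(unionMask);
    double iou   = uni > 0 ? inter / uni : 0.0;        // 0 = no overlap, 1 = perfect fit
    return std::max(1, static_cast<int>(iou * 100.0)); // clamp into the reported 1-100 range
}
```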

Swarm-Chase was developed as a mirror of the interactions we studied in the first phase of experiments with groups of participants for Ball-Catch and Pattern-Match. Instead of studying how a group of people interact together, we created an application that allowed us to explore how an individual can react and relate to a crowd displayed on the screen as one of the three crowd shapes. In this case, using flocking algorithms [15], we simulated the movement of a crowd of thirty salmon-coloured people moving around a large projected display, and used facial detection to track the participant as a green-coloured person who could move around the display by moving side to side, and up and down, while facing the screen. In this way, we hoped to capture the type of full-body interaction we would expect in an actual crowd interaction.

The participant’s objective was simply to avoid the crowd as it moved across the screen, which the researcher could subtly control by setting points of interest so that the crowd was always moving close to and/or towards the participant’s avatar on the screen. For each trial, the participant actively avoided the crowd for about 30 s for each crowd-shape type presented.
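The following is a minimal sketch of this kind of steering, combining a Reynolds-style seek toward the researcher-set point of interest with a separation force between agents; the weights, radii, and speed limit are illustrative assumptions rather than the tuned values of our implementation:

```cpp
// Sketch of the simulated crowd's movement: seek the point of interest,
// separate from close neighbours, clamp speed, and integrate position.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct Agent { cv::Point2f pos, vel; };

static float length(cv::Point2f v) { return std::sqrt(v.x * v.x + v.y * v.y); }

void stepCrowd(std::vector<Agent>& crowd, cv::Point2f pointOfInterest, float dt)
{
    const float maxSpeed = 120.0f, seekWeight = 60.0f, sepWeight = 120.0f, sepRadius = 40.0f;
    for (auto& a : crowd) {
        cv::Point2f toTarget = pointOfInterest - a.pos;
        float dTarget = std::max(1.0f, length(toTarget));
        cv::Point2f steer = toTarget * (seekWeight / dTarget);     // seek the point of interest

        for (const auto& other : crowd) {                          // separation from neighbours
            cv::Point2f away = a.pos - other.pos;
            float d = length(away);
            if (&other != &a && d > 0.0f && d < sepRadius)
                steer += away * (sepWeight / (d * d));              // push harder when very close
        }

        a.vel += steer * dt;
        float speed = length(a.vel);
        if (speed > maxSpeed) a.vel = a.vel * (maxSpeed / speed);   // clamp to a walking pace
        a.pos += a.vel * dt;
    }
}
```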

5 Experimental Results

The first phase of experiments included 3 groups of 6, 5, and 6 members, giving us 17 participants in total for the “group” part. In the “individual” part, Swarm-Chase, we had 20 participants. For both phases we had approximately 67 % males and 33 % females, and most participants were in the process of completing, or had completed, a post-secondary degree. Ages for the first phase were almost completely within the range of 18–24 (predominantly university students), while the second phase expanded to include most of its participants within the ages of 18–34 (predominantly university students and young secondary school teachers), with 22 % aged 35–54 and another 22 % aged 55+. Computer expertise was slightly higher in the lower age groups. It should be noted that in Swarm-Chase two participants’ data were removed, as their questionnaires were filled out incorrectly and their comments afterwards about “hoping they answered how the researchers would like” suggested a possible Hawthorne effect, where they were not answering for themselves [18]. This brought our sample total for Swarm-Chase down from 20 to 18.

5.1 Quantitative Data

Table 1 shows the mean and standard deviation for responses on our evaluation criteria in three experiments.

Table 1. Mean and standard deviation (in brackets) for all evaluation criteria

As these results were obtained from Likert scale data (range 1–7), we can only assume that the data are both ordinal and non-normal, leading us to analyze them using non-parametric statistical methods [11]. Since we used repeated testing procedures with the same participants to collect the data for each of the three crowd shapes (Blobby, Precise, and Combined) across four dimensions (Effectiveness, Pleasantness, Ease-of-Use, and Suitability), we used an asymptotic (2-tailed) Friedman test to determine whether there is significant variance between the crowd-shape response data [11]. The results are shown in Table 2.

Table 2. All recorded Friedman p-values (asymptotic, 2-tailed) at alpha 0.05 with 2 degrees of freedom. Highlighted cells represent where we see significant differences between the response data, i.e. where we reject the null hypothesis that the samples are the same.

After determining where there are differences between the three dependent groups of data, we conducted further post hoc testing using the Wilcoxon signed-rank test to determine where the significant differences lie between each pair (Blobby-Precise, Blobby-Combined, and/or Precise-Combined) [11]. Please see Table 3 for the corresponding point estimates (pseudo-medians) and Table 4 for each p-value, calculated at alpha 0.05.

Table 3. Since the Friedman and Wilcoxon tests compare medians, this table presents all the point estimates (pseudo-medians) consistent with the Wilcoxon test. Ball-Catch is omitted, as the Friedman test detected no significant differences there.

In Fig. 3 we can see two examples of the point estimates/pseudo-medians for Pattern-Match and Swarm-Chase, graphed with their appropriate confidence intervals calculated in R.

Fig. 3.

Visualized here are the point estimates (pseudo-medians) and confidence intervals consistent with the Wilcoxon signed-rank test, showing possible differences between the Blobby, Precise, and Combined shapes for the pleasantness response in the Pattern-Match (left) and Swarm-Chase (right) experiments.

Looking at the point estimates in Fig. 3, we can see that there should be some differences between responses, and so Wilcoxon signed-rank tests were performed on all sets that passed the Friedman test, using a Bonferroni adjustment of the p-value threshold from 0.05 to 0.017 (0.05/3) to determine significance in Table 4.

Table 4. All Wilcoxon signed-rank tests on data pairs for each game type (except Ball-Catch, where no Friedman significance was detected). Cells are highlighted where the result is significant, i.e. p < 0.017 (via Bonferroni adjustment).

Looking at our results, we see some differences and patterns within the data. The Friedman test exposes significant differences in Pattern-Match and Swarm-Chase across all four dimensions, with the exception of Suitability in Swarm-Chase.

Within Pattern-Match, after the Friedman tests expose differences within the samples, the Wilcoxon post hoc tests show that the differences tend to lie between the Blobby and Combined shapes for Pleasantness, Ease-of-Use, and Suitability. Interestingly, the Combined shape is considered more pleasant, easier to use, and more suitable than the Blobby shape in Pattern-Match. We can also see that the Precise shape is rated higher than Blobby in both Pleasantness and Suitability. This result is echoed in Swarm-Chase, where the significant differences lie between the Blobby and Combined shapes in both Effectiveness and Pleasantness. In Swarm-Chase the Combined shape is deemed both more effective and more pleasant.

5.2 Comment Data

Overall we observed that participants seemed to enjoy themselves during the crowd-based games, Ball-Catch and Pattern-Match. Ball-Catch did seem to incite more critiques about the technology in the comments, where we found 7 unique mentions of lagginess or low framerate; during play, the “slowness” was often cited as hindering both enjoyment and the objective, particularly when using the Combined shape.

Also, participants seemed to generally prefer a shape that had the Precise or “people” shapes present, as there were positive reflections (comments using wording such as “prefer”, “liked”, or “best”) within the comment sections about both the Precise and Combined shapes, though interestingly none about the Blobby shape exclusively. Additionally, when looking through the data we do see references to participants enjoying the combined aspect while critiquing its size, with comments such as it “was easier to get the approximate shape but the blob outline was too large”, “method was pretty good but maybe make the Blobby shape a bit smaller”, “Best. Would be nice to have a less wide shape”, “Overall I found it best”, and interestingly “favourite b/c I could see the ‘people’ and the boundaries …”.

Participants also came across as quite interested in the interaction, as many suggested possible changes to the shapes and to the game itself, with 26 mentions of changes to make the game(s) better and 42 mentions of how to make the shapes better within the comments section. These focused on the Blobby shape in particular as a source of ambiguous visual design: the Blobby shape was described as showing “little information on what the blob represents”, “not resembling much different human beings”, needing “more stimuli to represent the crowd”, and being “hard to distinguish who is where”. Fortunately, only 3 comments were made about either the shape or the exercise being confusing.

6 Discussion

Looking at the results, we can see that the Blobby shape seems less effective, less pleasant, less easy to use, and less suitable than the Combined shape in all dimensions where we found significance in Pattern-Match and Swarm-Chase. Additionally, the Precise shape is also found to be more effective, pleasant, easy to use, and suitable than the Blobby shape. There is no significant difference detected between the Precise and Combined shapes, although the median graphs of the Combined and Precise shapes suggest the Combined shape is generally preferred, and the comments received also seem to suggest this same preference for the Combined shape. Future studies would likely best focus here, with larger sample sizes.

It follows that participants would better enjoy experiences where they are presented not merely with an abstract shape but with a shape they can recognize as a group of persons. Wanting to see themselves in the shape is not too surprising, as Snibbe et al. write of the power of shadows in user experiences [7], and one of reality-based interaction’s main principles concerns “body awareness” [4]. Even in much larger groups, our participants seemed much more comfortable having some sort of visual feedback representing where they are, and where they are relative to others.

As for why the Ball-Catch experiment showed no significant differences, we hypothesize that it is a combination of the difficulty of becoming comfortable with the interaction, as all three groups started the experiment with that exercise, and technological issues, as many of the comments mentioned the low framerate, of the Combined shape in particular, as a deterrent to their enjoyment and the usability of the experience.

7 Conclusion

In this paper, we have presented three potential methods for visualizing a crowd, along with research results on their usability. We feel that these experiments help expose how important individual feedback still is, even in crowd interactions: participants still strongly want to see how they contribute to the whole. Furthermore, though the results are not strong enough to conclusively state that the combined Precise and Blobby shape is the preferred shape of the three for dynamic crowd-computer interactions, we feel that with further tests and larger sample sizes this would likely prove to be the case, as individual feedback is important, but so is a more approximate or “blobby” understanding of the entire crowd’s interaction with its surrounding environment.