Formative feedback from human experts is one of the most effective interventions for promoting learning (Hattie and Timperley 2007). Unfortunately, this feedback is difficult to organize because experts are in high demand. As the number of learners increases, orchestrating expert feedback becomes nearly impossible (Dillenbourg and Jermann 2010). The consequence is a failure to learn critical skills, knowledge, and attitudes.

To overcome this problem, Artificial Intelligence in Education (AIED) researchers have embraced a paradigm modeled on one-to-one human tutoring, in which intelligent tutoring systems, rather than human experts, provide feedback to learners on well-defined problems such as solving algebra equations (VanLehn 2006, 2011). While researching and developing an intelligent tutoring system is expensive, once developed, the system can be scaled to many learners, lessening the burden on human experts. However, intelligent tutoring systems are typically unable to provide the same level of feedback on ill-defined problems such as designing a new product, writing a short story, or developing a public policy intervention, especially when computers cannot interpret the problem-solver’s work or when problems and solutions are not known ahead of time (Lynch et al. 2006). This makes it difficult for intelligent tutoring systems to provide feedback on the most complex types of problem solving. A new approach is needed that provides intelligent formative feedback on complex problems to learners at scale while reducing the orchestration challenge for instructors.

Group critique, in which a networked system combines the contributions of multiple crowdcritiquers, offers a potential alternative paradigm for providing intelligent feedback to learners. The spread of usable, low-cost networked devices makes crowdsourcing approaches increasingly feasible. Rather than program expert intelligence into a single system, crowdsourcing systems organize the efforts of multiple individuals to make the group more effective (Bernstein et al. 2010), and unlike expert systems, crowdsourcing systems can potentially provide feedback on ill-defined problems. Crowdcritique is an exciting paradigm for a number of contexts, such as learning environments in which there are many more novices than experts, and for an increasingly large number of other domains, such as computer interface design (Kulkarni et al. 2015), venture formation (Hui et al. 2015), and writing encyclopedias (Giles 2005), in which many coordinated crowdworkers can perform tasks as well as experts. However, crowdcritique approaches scale only to the extent that there are available, mobilizable crowds whose cumulative efforts produce high-quality feedback. Therefore, the development of crowdsourcing systems faces challenges such as how to distribute work to produce an intelligent result and how to motivate crowds (Kittur et al. 2013). In this project, we ask: how might we design crowdcritique systems that provide useful feedback on complex, ill-defined problems and are sufficiently desirable to learners so as to reduce orchestration demands on instructors?

We investigate this question in the context of informal and formal studio-based learning environments (Sawyer 2012; Lawson and Dorst 2009), in which face-to-face groups provide feedback on project work supported by networked devices (Smirnov et al. 2016). Although crowdwork research has often focused on large groups of anonymous workers completing simple tasks in isolation for payment (as on Mechanical Turk) (Kittur et al. 2013), here we examine how online crowdwork systems can facilitate critique by mobilizing large groups of face-to-face novices in design studios. Crowdsourcing feedback in studio-based learning environments may benefit learners by increasing the amount of high-quality feedback when there are limited numbers of experts; benefit instructors by reducing the orchestration demands of managing studios; benefit critiquers by developing critique abilities; and benefit the studio community by building stronger ties among members. Crowdsourcing feedback thus has the potential to benefit a wide variety of learning environments including design and engineering courses; project-based learning environments; professional research, development, and design teams; and more.

Background

Formative Feedback in Project-Based Learning

To understand how crowdcritique systems might better provide formative feedback, we can look to design disciplines in which group critique is already well established (Barrett 2000; Dannels and Martin 2008; Klebesadel and Kornetsky 2009; Schrand and Eliason 2012). This includes disciplines such as architecture, art, graphic design, and industrial design, as well as newer design disciplines such as interaction design and service design. An understanding of group critique in these disciplines can be applied to almost any professional discipline because, as Simon points out:

Everyone designs who devises courses of action aimed at changing existing situations into preferred ones… Design, so construed, is the core of all professional training… Schools of engineering, as well as schools of architecture, business, education, law, and medicine, are all centrally concerned with the process of design (Simon 1996, p. 111).

Group critique can even be used in the natural sciences and liberal arts, for example, by critiquing the design of an experiment or critiquing an essay. Group critique is a general method of providing formative feedback that can be applied across disciplines.

In most cases, design disciplines are taught through project-based learning (PBL). In PBL, students work to create a solution to a problem that takes the form of “advice or a product that answer a—research or practical—question” (van Merriënboer and Kirschner 2012, p. 62). A number of domains use PBL to teach students how to build solutions to problems: scientific experiments in science classrooms (e.g. Blumenfeld et al. 2000; Kolodner et al. 1998, 2004), products and services in engineering (e.g. Gerber et al. 2012), curricula in education (e.g. Easterday et al. 2014), paintings, sculptures, videos, and games in art and design (e.g. Sawyer 2012), and communication in English language learning classes (e.g. Moss and Van Duzer 1998). Krajcik and Shin (2014) describe the key features of PBL as: (a) confronting learners with a guiding question or problem to solve, (b) developing important disciplinary knowledge and skills and engaging in disciplinary practices, (c) asking learners to create and share artifacts with peers, teachers, and community members as they work on the problem, and (d) using technological scaffolds to help learners solve the problem. Project-based learning works well for teaching authentic professional skills because educators can select challenges and contexts that mirror professional experiences.

In PBL, tackling ill-defined problems can be challenging for learners and feedback cannot currently be easily automated, so instructors build in multiple opportunities for peers, teachers and community members to give learners formative feedback throughout the process (e.g. Blumenfeld et al. 1991; Brown and Campione 1994; Edelson 2001; Kolodner et al. 2004).

Face-to-Face Group Critique

While there are many ways to provide formative feedback, face-to-face group critique, or “crit,” is by far the primary feedback vehicle of choice in professional design disciplines (Buxton 2007; Moggridge and Smith 2007) as well as the “signature pedagogy” of design education (Barrett 2000; Crowther 2013; Dannels and Martin 2008; Hokanson 2012; Klebesadel and Kornetsky 2009; Sawyer 2012; Shulman 2005; Lawson and Dorst 2009; Oh et al. 2013). Crits are useful to both design professionals and learners because design is a non-routine problem-solving activity that continually places the designer in the role of learner, who must understand the unique, particular context of the design problem and find ways to solve it. Designers use critique to benefit from others’ expertise and perspectives, to overcome barriers in the innovation process, and to test and improve designs (Kelley and Littman 2001; Lande and Leifer 2010). The diverse feedback from group critique helps highlight gaps between the intentions and perceptions of a design (Tohidi et al. 2006) and helps designers overcome the impasses and indecision common in design projects (Lande and Leifer 2010). Designers place enough importance on face-to-face group critique to organize their studio space to facilitate it, for example, through open meeting rooms, standing tables for impromptu critiques in the hallways (Kelley and Littman 2001), and corkboard walls by designers’ desks for pinning up drafts for comment (Buxton 2007). Face-to-face group critique is used across a wide variety of educational contexts, including design disciplines such as fashion design, architecture, industrial design, and graphic design (Dannels and Martin 2008; Schrand and Eliason 2012), the arts, including painting, sculpture, and theater (Barrett 2000; Dannels and Martin 2008; Klebesadel and Kornetsky 2009), and any discipline requiring design thinking in general (Lawson and Dorst 2009; Cross 2011).

In face-to-face, formative group critique in education, designers present work, followed by public discussion of the work with peers and experts (Barrett 2000; Crowther 2013; Dannels and Martin 2008; Hokanson 2012; Klebesadel and Kornetsky 2009; Schön 1987; Schrand and Eliason 2012). Discussion is a necessary part of the critique because, in ill-defined problems, the problem of design work is constantly shifting, requiring participants to understand and question the goals and assumptions of the design challenge itself (Hokanson 2012, p. 77; Cross 2006; Gero 2002). Discussion of the design work can be interpretive (what is the work about?), evaluative (is it good work?) (Barrett 2000, p. 30), and deliberative (what should be done next?). While instructors may run critiques in varied ways (Barrett 2000; Dannels and Martin 2008; Hokanson 2012; Klebesadel and Kornetsky 2009), they often provide critique multiple times over the course of a project (Dannels and Martin 2008; Hokanson 2012; Klebesadel and Kornetsky 2009; Schrand and Eliason 2012), invite guests from outside the classroom such as experienced students, practitioners, users, and project clients (Barrett 2000; Dannels and Martin 2008; Gerber et al. 2012; Schön 1987), and provide question prompts to guide feedback (Barrett 2000; Dannels and Martin 2008; Hokanson 2012). Instructors and students report that good critiques: (a) provide learners with a lot of feedback, (b) maintain a formative atmosphere, (c) provide a record of feedback and suggested changes, and (d) avoid instructor-dominated conversation (Barrett 2000; Klebesadel and Kornetsky 2009).

Researchers have made a number of arguments for group critique improving performance and learning. Group critique combines expert feedback from an instructor or senior designer, peer feedback from other students or designers, and reflection via comparison with peer solutions—all of which can improve learning. Feedback is a powerful learning intervention in general (Hattie and Timperley 2007). Students self-report that formative group critiques are more helpful for their learning and more comfortable than other kinds of feedback practices (Schrand and Eliason 2012). Groups of peers can also provide useful feedback. Reviews of the peer assessment (PA) literature have found that feedback from peers can improve performance as much as, and sometimes more than, instructor feedback (Topping 1998) and that peer assessment can be valid and reliable, improve domain skills, improve critique skills, and develop positive attitudes toward peer assessment (Van Zundert et al. 2010). In some cases, giving feedback can even improve learning more than receiving feedback (Lundstrom and Baker 2009). During critique, critiquers are exposed to other solutions and problem-solving approaches and gain additional practice evaluating work, both of which enhance learning (Ambrose et al. 2010; van Merriënboer and Kirschner 2012).

There are many possible explanations for why group critique provides effective feedback. First, many critiquers provide the wisdom of the crowd (Hui et al. 2014a; Surowiecki 2004), in which increasing the number of people identifies problems more quickly (Cho et al. 2008; Raymond 2001). Second, in peer critique, learners may be able to give feedback that is more easily understood by other novices than the instructor’s feedback (Cho et al. 2008) because they do not suffer from the expert blind spot, in which experts fail to explain what is obvious to them but obscure to novices (Ambrose et al. 2010, p. 99). Third, giving critique helps critiquers learn what aspects to focus on to increase critique quality, skills that they can apply to their own design work (Ambrose et al. 2010). Fourth, viewing other designers’ work provides critiquers with additional concrete examples of design solutions, strategies, and pitfalls (Crinon 2012; Klebesadel and Kornetsky 2009), a kind of cognitive feedback that helps learners develop schemas and skills for evaluating their own and others’ work (van Merriënboer and Kirschner 2012, section 7.5).

While group critique can be effective, the principles for designing group critique for specific contexts and outcomes remain unclear. Reviews of the peer assessment literature conclude that, given the enormous variety both in critique practices and in research, “…at present it is impossible to make claims about what exactly constitutes effective [group critique] and … under which conditions certain methods result in preferred outcomes remains unknown” (Van Zundert et al. 2010). Another review concludes that we know little about the interpersonal mechanisms that affect peer assessment outcomes (van Gennip et al. 2009).

Not only are the design principles for group critique unclear, but group critique also faces a number of challenges identified by research as well as by our collective experience practicing design over the last two decades. These include:

  • Lack of expertise. For feedback to be effective, novices need an appropriate level of challenge and scaffolding (Hattie 2009; Hattie and Timperley 2007; Hui et al. 2014a). Learners may be less likely than experts to understand feedback because of their lack of knowledge and the additional cognitive load imposed by working in an unfamiliar domain. Novices need help identifying problems and solutions in peer work (Patchan et al. 2009) and with critique precision (Cho et al. 2006a). Novices also experience high cognitive load during critique and are therefore less likely to understand and benefit from critique when receiving a large amount of information at once.

  • Evaluation apprehension. The expectations and norms of group meetings are critical for reinforcing the way the team works and learns, so group critique also depends on people feeling psychologically safe enough to receive feedback without fear of evaluation (Edmondson 1999). Novices have limited experience with giving and receiving feedback, and their unease about their competence and fear of evaluation compound this problem (Connolly et al. 1990).

  • Production blocking. Group critique is typically conducted in small, face-to-face groups so that designers and critiquers can establish the common ground (Clark and Brennan 1991) necessary for critique (Buxton 2007; Cross 2006; Moggridge and Smith 2007) (Fig. 1a). Unfortunately, face-to-face critique requires designers to listen to and record feedback from multiple critiquers, and it does not scale well to larger groups due to production blocking, because critiquers in large groups must wait their turn to speak (Gallupe et al. 1991).

    Fig. 1 Crowdcritique systems might provide the best of both face-to-face and computer support

  • Free-riding. Group work is also susceptible to free-riding, where each individual contributes less effort in a group setting than they would if working alone, because there is a diffusion of responsibility (Gallupe et al. 1991).

  • Recording. Novices often fail to record the feedback they are given. Students retrospectively report that critiques are more successful when they regularly record and reflect upon feedback (Barrett 2000), which they do not always do.

  • Synthesizing critique. In successful critiques, students receive a significant amount of feedback at once, which can be challenging for them to synthesize (Hokanson 2012).

  • Facilitation scripts. In addition, novices may not understand the importance of facilitating group critique or know how to do so, so they struggle to manage participation.

  • Attitudes toward critique. Novices may doubt the quality of peer critique (Cho et al. 2006b; Sue-Chan and Latham 2004), so they may ignore it.

In addition to these intrinsic challenges of face-to-face group critique, crits can also be run in non-productive, even abusive ways that offer little or no cognitive feedback and reduce motivation. It is relatively easy to find stories of instructors publicly humiliating students by declaring their lack of talent; laughing at a student’s work and walking away without comment; and “improving” student work by smashing, defacing, or throwing it out the window (Barrett 2000)—pedagogical tactics that likely decrease learning and motivation (Hattie and Timperley 2007). This is especially problematic in project-based learning environments where learners take on ill-structured problems and are sensitive to emotional shocks that lead to a “paralysis of inaction” (Lande and Leifer 2010, p. 275).

So while face-to-face group critique, when done well, appears to be an effective way of providing formative feedback, research does not provide clear guidance on how to make it most effective or how it might be improved. Given its relatively long history as an established pedagogy, some argue that: “this somewhat antiquated mode of dialogue is overdue for technological intervention” (Crowther 2013, p. 20).

Crowdfeedback

Researchers in computer-supported collaborative learning, computer-supported collaborative work, and human-computer interaction have long sought to understand how social technology can overcome the challenges of design group-work practices such as electronically brainstorming new ideas with peers (Gallupe et al. 1991), sharing reference material on digital corkboards (Buxton 2007), emailing experts for manufacturing advice (Hui et al. 2014a, b), and using social media to request critical resources for project completion (Gerber et al. 2012). This suggests that technology might also be able to support group critique through online systems in which designers share work with critiquers who provide feedback online. While this previous research has not focused specifically on crowdfeedback systems, it identifies issues in online systems (such as those for brainstorming) that may also affect crowdfeedback systems: production blocking (Gallupe et al. 1991), evaluation apprehension (Connolly et al. 1990), and free-riding (Gallupe et al. 1991).

In critique, the designer (whether an individual student or a project team) receives feedback from another group such as a classroom, studio, or even external coaches or reviewers. So online critique can be thought of both as a form of computer-supported group work and as a form of crowdsourcing, where a group (the critiquers) does work (gives feedback) on behalf of the crowdsourcer (the designer or instructor). Researchers have yet to agree on a precise definition of crowdsourcing, but most definitions involve one party making an open (online) call for contributions to multiple other parties who are not formally obligated to respond to the call (Estellés-Arolas and González-Ladrón-de-Guevara 2012; Howe 2006). Other definitions, including that by Howe, who coined the term, emphasize that crowdsourcing replaces activities traditionally done by a small “in house” group with activities done by a larger external network (Howe 2006). Critique qualifies as a form of crowdsourcing by the former set of definitions because each student has the opportunity to critique but is not obliged to do so (Barrett 2000). Critique also qualifies as crowdsourcing by the latter definitions if we consider that a larger group of critiquers does work that replaces (at least some of) the work done by an individual instructor. Note that not all ways of providing feedback count as crowdsourcing, including: feedback provided by an individual instructor (because one source does the work); requiring all students to give feedback (because there is a formal obligation to do the work); or asking a specific friend to give feedback (because there is a direct request to a specific party rather than an open call).

Crowdsourcing systems for providing feedback, or crowdfeedback systems (Fig. 1), may compensate for limited access to experts because feedback from a large number of novices can be as good as feedback from a small number of experts (Cho et al. 2008; Van Zundert et al. 2010). Crowdfeedback may also improve critique quality by providing scaffolds such as question prompts and common errors (Greenberg et al. 2015). They also force critiquers to record feedback for the designer.

However, if critique is a type of crowdsourcing, it faces the challenges of other computer-supported collaborative work and learning as well as additional challenges common to crowdsourcing systems, such as how to solicit work from the crowd (the classroom, studio, or network), how to motivate workers (critiquers), and how to filter, verify, and combine contributions (feedback).

Crowdreview

Researchers and practitioners have recently developed a number of crowdreview systems, by which we mean asynchronous, crowdfeedback systems (Fig. 1b), which may overcome some of the disadvantages of face-to-face group critique. In crowdreview systems, critiquers and designers are not required to be in the same place at the same time (cotemporal or copresent), so input from multiple critiquers avoids production blocking by allowing more critiquers to contribute over a longer period of time.

A number of online showcase systems, by which we mean crowdreview systems where feedback is public and voluntary, have recently emerged for web designers and other creative professionals (Bēhance n.d.; Dribbble n.d.; Zurb n.d.). These systems enable professionals to share images or code and provide a comment feed for peers to offer critique. As such, online showcase systems follow the norms of the “signature pedagogy” (Shulman 2005) of the design fields, as feedback is open to the larger community. However, online showcase systems do not provide the scaffolding or training needed by novices to improve the quality of their feedback. While some systems provide detailed guidelines on critique and invite members who demonstrate critiquing ability (Zurb), such systems have ignored support for the social process of critique.

In contrast to professional systems that ignore the struggles of novices, online peer assessment systems, by which we mean crowdreview systems where critiquers are anonymous and graded, such as SWoRD (Cho and Schunn 2007; Cho et al. 2008), Arope (Hamer et al. 2007), and Comrade (Goldin and Ashley 2012), have been designed to help novices critique writing. SWoRD is an online peer assessment system, chiefly for writing, where peers provide critique through feedback forms that include: rubrics with expert-provided criteria, 7-point Likert scales of overall ratings, a text box to write critique, and a ‘back-review’ rating for critiquing critiquers (Cho and Schunn 2007). While research on SWoRD shows that novices using the system view peer critique as less reliable than expert feedback (Cho et al. 2006a, b), it also finds that critique from multiple peers on SWoRD can be of the same or higher quality than that from a single expert (Cho and MacArthur 2010; Cho et al. 2008; Patchan et al. 2009).

Unfortunately, the purely online, asynchronous, anonymous, graded feedback in online peer assessment systems may not be ideal for providing formative feedback on design work. Unlike the learning environments where SWoRD and Comrade are deployed, learners in design contexts may not all be solving the same problem, creating the same type of product (a written paper), using the same medium, or using the same methods. After investigating an ill-defined design challenge, learners in design contexts might pick radically different approaches, such as creating a product, a service, an educational technology, or a curriculum. This requires designers to explain their work to critiquers and allow quick back-and-forth between designers and critiquers to clarify the work, the designers’ intentions, and their process, which in turn requires the grounding provided by face-to-face interaction. Most crowdreview systems do not support the communication benefits of face-to-face discussion (copresence, cotemporality, simultaneity, sequentiality, etc.) needed for grounding communication of the designer’s goals, context, and work (Buxton 2007; Cross 2006; Moggridge and Smith 2007).

Furthermore, online, asynchronous critique systems in education often force participation through external incentives such as participation and evaluation grades (which technically means peer assessment is not a form of crowdsourcing because grading places a formal obligation on peers to contribute). Peer assessment used for summative evaluation in turn requires anonymity of critiquers. Asynchronous, anonymous, and graded online peer assessment appears to violate the social, collaborative, formative norms of design critique (Schrand and Eliason 2012).

Crowdcritique

Critique with novice designers involves learners deliberating face-to-face about improving a complex work in progress with participants who may feel uncomfortable receiving or giving feedback. We may be able to support this kind of formative feedback with crowdcritique systems that combine face-to-face critique and online feedback to provide the advantages of technology (scaffolding, reduced production blocking, recording) and the advantages of face-to-face discussion (grounding, motivation, community support, perceived learning, and comfort). If designed properly, crowdcritique systems might increase the quality of critique in a sufficiently desirable way, thus reducing the orchestration burden of instructors.

As a first step toward designing crowdcritique systems, we might simply deploy existing crowdreview systems, such as those currently used for peer assessment of writing, in the context of a face-to-face critique. However, this obvious approach faces a number of design challenges with respect to grading, anonymity, and context. First, peer assessment systems for writing are typically based on very different motivational assumptions. Peer assessment systems typically assume, perhaps correctly, that writing students will not provide useful feedback to their peers unless they are graded for participation and critique quality. Grading in turn requires these peer assessment systems to anonymize work and critique, lest students fear reprisal from their peers for critical feedback or collude with peers to inflate their grades. Second, while anonymity can decrease evaluation apprehension, which should increase contributions (Gallupe et al. 1991), it also allows free-riding (Kraut and Resnick 2012), which should decrease contributions if grades are removed. Third, as noted earlier, peer assessment tools for writing also assume that students are evaluating work that should stand alone and that all peers are working on a similar problem; this shared context reduces the need for grounding, which is why peer assessment systems are typically asynchronous.

Some of these design challenges for using existing online peer assessment systems in face-to-face critiques reflect underlying conflicts between the feedback traditions of design studios and liberal arts writing courses. The public, oral feedback in design critique contrasts starkly with the private, written evaluation of student writing in the liberal arts, “…generally consisting of a letter grade or numerical score, plus some marginal and/or summative comments” (Schrand and Eliason 2012). Despite the success of online peer assessment systems for writing, we ignore the conventions of design studio critique at our own peril. A study by Schrand and Eliason (2012) of 373 students across 10 fields found that 92.5 % of students felt that formative design feedback was helpful, compared with 39.7 % who felt that peer writing assessment was helpful. Students’ reasons for preferring formative design critique include differential levels of student engagement, expertise, and trust and community. This research found that in some versions of peer assessment of writing: “student reviewers’ lack of engagement in the critique seems matched and influenced by the student author’s lack of confidence and interest in the reviewers’ comments. In these cases both parties in the peer review relationship have low expectations of the usefulness of the exercise” (2012, p. 58). This was opposed to the generally higher level of intrinsic motivation in design critique, “which is perhaps easier to activate with the creative freedom afforded by some design projects” (2012, p. 58). So while we might begin by deploying peer-assessment-like systems in design critiques as a starting point for the design of crowdcritique systems, the success of this approach is by no means assured.

Purpose

In this project, we ask: how might we design crowdcritique systems for project-based learning environments that provide learners with useful formative feedback on ill-defined problems, in ways that are sufficiently desirable for learners, so as to decrease the orchestration demands on instructors?

If successful, this project makes three contributions to research on intelligent formative feedback:

  1. It develops a grounded model of the mechanisms by which social context and technical features of systems interact to affect the quality of crowdcritique, which will provide a theoretical basis for the design of crowdcritique systems.

  2. It proposes and evaluates a set of design principles for crowdcritique systems that are consistent with the norms of design critique, are desirable to critiquers, and produce valuable feedback for designers.

  3. It illustrates how crowdsourcing approaches can complement expert systems as a means of providing intelligent formative feedback.

Even a preliminary, empirically grounded answer to this question addresses a major challenge of Artificial Intelligence in Education and Intelligent Tutoring Systems research by producing theory and a system for providing formative feedback on ill-defined problems in a scalable manner. The previous work described in the background section has identified myriad factors and interactions that could arise, but it is not clear which factors actually do arise in hybrid face-to-face/online crowdcritique environments. There are few guidelines for producing crowdcritique systems, and of the many possible features, it is not clear which are necessary, which are superfluous, which will be effective, or which should be chosen to mitigate unavoidable tradeoffs. In short, existing theory provides many possibilities for where to shine the spotlight of our research focus, but does not provide sufficient clarity to reduce the difficult work of building and testing crowdcritique systems. There is a critical need for effective, scalable, inexpensive systems to teach students to effectively solve ill-defined problems.

General Method

Design Research

This project followed a design research methodology in which we iteratively designed and tested a crowdcritique system to produce an effective educational intervention and empirically grounded theory that can be used to guide future interventions (Brown 1992; Collins 1992; Collins and Bielaczyc 2004; Easterday et al. 2014). This theory takes the form of design arguments that specify how features of the intervention achieve the design goals in a given context according to theoretical principles (Easterday et al. 2015; Van den Akker 1999). For example, the design argument of Corbett & Anderson’s (2001) classic study is that intelligent tutors that provide immediate feedback improve learning (more than those that provide on-demand feedback or only error-flagging).

In a design research project, researchers conduct qualitative and quantitative empirical studies as part of the design process to develop novel interventions (Easterday et al. 2014; McKenney and Reeves 2013). To mitigate the risk of spending too much time building and testing ineffective approaches, design research studies employ a stage-dependent search (Bannan-Ritland 2003; Easterday et al. 2014; Kelly 2004, 2006) that uses empirical methods appropriate to the maturity of a given design and its related theories. For example, in the early stages of a new educational intervention, when the features that lead to learning are unknown and success is unlikely, researchers should conduct lower-cost studies that quickly identify failures and study the interactions and mechanisms that will support redesign. In early stages, the emphasis is on theory building and discovering a plausible design argument. In later stages of design, when theory is better developed and has a greater body of empirical support, it makes more sense to conduct randomized controlled studies that rigorously validate the intervention. In later stages, the emphasis is on theory validation. Conversely, conducting randomized controlled trials is inappropriate for early-stage work because they are highly resource intensive and would not help identify the key features and mechanisms needed to develop theory. Likewise, conducting user tests or qualitative work on later-stage interventions, in which the theoretical principles and mechanisms are well understood, would be redundant.

A crowdcritique system for formative feedback is an early-stage intervention for which design arguments for successful systems are unknown. Therefore, in this project, we employ open-ended data collection methods appropriate to early-stage interventions in three exploratory studies to develop and describe the mechanisms underlying crowdcritique systems.

Learner Contexts

The three studies here take place in three different, but related learning contexts: an undergraduate, PBL design class; a graduate, PBL educational design class; and a student-led extracurricular PBL program. We briefly describe each context.

The formal design classroom contexts consisted of 11-week, project-based design courses. In the undergraduate course, students worked in teams of four to design digital media (videos, online content, etc.) to explain policy issues to their peers. In the graduate course, master’s and doctoral students worked in teams of 1–2, often for a client, to design real-world learning environments such as web-based tools for documentary journalists, online user-research guides for product designers, and a videogame about public policy. Participants attended weekly 3-h studio sessions and were expected to spend a further 10 h/week on their design project.

The third learning context was at one site within the Design for America (DFA) network. Design for America consists of a network of extracurricular, student-led chapters located at 21 different universities (Gerber et al. 2012). DFA students are drawn from majors across the university and form teams to solve real-world design challenges such as reducing bed bugs in low-income housing, reducing childhood illiteracy rates, and reducing hospital-acquired infections. Teams are not restricted to a particular solution; rather, they are expected to create an original solution based on their investigations and understanding of the problem and context. In this project, participants were enrolled in DFA’s Summer Studio program held at a local chapter. In Summer Studio, new DFA teams were formed over the summer and worked 40 h/week for 6 weeks, along with a community partner client and a professional design coach, to solve a real-world design challenge. Participation in Summer Studio was voluntary and students did not receive credit or pay. The program was designed by design professors and professionals, but was led by a DFA student facilitator.

These learning contexts allowed us to test crowdcritique systems across university student types, with different PBL curricula, and in different domains. We moved from the more controlled environments of the classroom in studies 1 and 2 (undergraduate and graduate classrooms) to a less controlled informal learning environment in study 3 (the DFA extracurricular program), where educators had less formal power over students because no course credit was offered and the program was student led.

Testing crowdcritique in an informal, student-led learning environment presents significant challenges but also opportunities for designing crowdfeedback systems that might eventually be scaled across studio-based learning environments. Scaling, or rather the process of diffusion and adoption of innovations (Rogers 2003), presents many challenges, including the marketing challenge of persuading others to try and eventually adopt the innovation, and the implementation challenge of successfully putting that innovation into practice.

In this paper, we focus on the implementation challenge to first determine whether crowdcritique can feasibly provide learning benefits in studio-based learning environments. Crowdcritique systems must provide feedback that is useful and sufficiently desirable to learners so that studio leaders can easily mobilize their crowd. Crowdwork systems deploy a wide range of incentives appropriate to a particular context. For example, in systems such as Mechanical Turk, a public television funding drive, or Wikipedia, crowdworkers are incentivized through rewards such as pay, a public good, or personal satisfaction (Quinn and Bederson 2011). In crowdfunding systems (Hui et al. 2014b) such as Kickstarter or the church collection plate, there are additional social pressures created by network ties or community norms that encourage participation. In other crowdwork systems, such as corporate help-seeking on intranets or schoolyard clean-up coordination with online sign-up sheets, members may be explicitly sanctioned for social loafing. While none of these crowdwork systems formally mandate participation, they each use various combinations of rewards and sanctions to encourage contributions.

For crowdcritique to overcome the implementation challenge in a studio-based learning environment, we can rely on studio leads (such as teachers, peer facilitators, or managers) to encourage norms of participation, but the crowdcritique system must be sufficiently desirable to participants that they will contribute critique without explicit sanctions (such as chiding or poor grades) from the studio lead. If studio leads must sanction members to ensure participation, then the system imposes too great an orchestration burden on the studio lead; participants will attempt to game the system in a way that produces poor feedback; and learners will not continue crowdcritique practices outside the studio-based learning environment. In other words, the crowdcritique system must be desirable enough to require only a light touch by the studio lead.

Overcoming the implementation challenge in these learning contexts lays the foundation for overcoming the marketing challenge of diffusing innovations. Successfully implementing crowdcritique systems that mobilize studios to provide feedback helps us to eventually examine how crowdfeedback systems can spread across studios. For example, Design for America is a rapidly growing network that includes thousands of students at scores of studios across the US. Spreading crowdcritique practices across this network requires first demonstrating success of crowdcritique within a local studio to provide the credibility needed for network leaders to advocate the practice across the network. The focus of this paper on within-studio implementation is thus a first step toward spreading practices across studios.

Research Design

To better understand the complex social and technical mechanisms important to crowdcritique and to develop empirically grounded principles for the design of a crowdcritique system, we conducted 3 observational studies to examine the use of different iterations of the crowdcritique system. Figure 2 shows a student design team (not part of the study) using the system with an interface like that in Fig. 3.

Fig. 2 In crowdcritique sessions, designers present and discuss work verbally and online with critiquers

Fig. 3 The (simplified) crowdcritique system v1 interface featured a rubric-like form modeled on existing peer-assessment systems

Study 1: Crowdcritique Pilot

As a first step toward developing principles for the design of a crowdcritique system, we deployed a simple online system, similar to those used for peer assessment of writing, in the context of a face-to-face critique. The system provided a baseline platform for continuing study, like the MovieLens system for recommender systems (Rashid et al. 2002). We designed the system to be representative of existing approaches, so that we could test them in a face-to-face critique setting, and to be as simple as possible for theoretical parsimony.

Design Argument

To create a system that was representative of current approaches, we designed the critique system with basic features used in crowdreview systems, adapted to the face-to-face critique setting. This basic crowdcritique system provided each critiquer with a guide, or rubric, implemented as an online form (Fig. 3). Each rubric typically included 3–5 criteria for a given design activity. For each criterion, critiquers had to provide a rating (from 1 to 4, each level accompanied by a text definition), an explanation of their rating, and actionable advice for how the designer could improve their work. Designers receiving critique could click on the icon of an identifiable critiquer to view their feedback.
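To make the scaffold concrete, the sketch below models the rubric-based critique form described above as a simple data structure. This is our own illustration rather than the deployed system's code, and the names (Criterion, CritiqueEntry, Critique) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Criterion:
    """One rubric criterion for a design activity (rubrics typically held 3-5)."""
    name: str
    rating_definitions: List[str]  # one text definition per rating level (1-4)

@dataclass
class CritiqueEntry:
    """A single critiquer's response to one criterion."""
    criterion: Criterion
    rating: int        # 1-4, matching the rubric's rating definitions
    explanation: str   # why the critiquer chose this rating
    advice: str        # actionable advice for improving the work

    def __post_init__(self):
        if not 1 <= self.rating <= len(self.criterion.rating_definitions):
            raise ValueError("rating must correspond to a defined rubric level")

@dataclass
class Critique:
    """All entries from one identifiable (non-anonymous) critiquer on a presented work."""
    critiquer: str
    entries: List[CritiqueEntry] = field(default_factory=list)

# Example: one critiquer's feedback on a single, hypothetical criterion
clarity = Criterion("Clarity of problem statement",
                    ["unclear", "partially clear", "mostly clear", "clear and compelling"])
feedback = Critique("P3", [CritiqueEntry(clarity, 3,
                                         "The user need is stated but the context is thin.",
                                         "Add a sentence describing who the user is.")])
```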

Thus the design argument for the first iteration of the system was that: to provide formative feedback for novice designers, crowdcritique systems should include a combination of critique training and critique scaffolds (justifications in Table 1).

Table 1 The first crowdcritique system provided critique training, write-first scripts and critique scaffolds

In addition to the critique tool, learners used a variety of other tools for their design projects. These included a website with design method tutorials and a cloud-based file-sharing website, Box, for saving and sharing work. While critiquers could view designers’ work on Box and submit critiques asynchronously, learners were given class time for face-to-face critique, usually in small groups and occasionally as a whole class. Learners always had access to the critique tool during their face-to-face critiques.

Participants & Setting

This study was conducted in an 11-week, project-based, undergraduate design course designed to teach novice communication designers and critiquers at a private Midwestern university how to use digital media to explain policy issues. Sixteen participants worked in teams of 4. Participants attended weekly 3-h studio sessions and were expected to spend a further 10 h/week on their project outside of studio time. Studio time was used for peer critiques, short lectures, and reading discussions.

Intervention

Critiques in the pilot proceeded as follows: first, 2 student critiquers were selected by the instructor to lead the whole-class critique; then designers presented work while critiquers listened and wrote comments using the crowdcritique software; after the presentation, leaders offered initial commentary while other critiquers finished writing; finally, face-to-face discussion began. In early sessions, the instructor demonstrated critiquing by commenting on issues not addressed by the critiquers.

Data Collection and Analysis

In addition to field observations (Spradley 1980), an external evaluator conducted a small-group analysis (Brennan and Williams 2004; Diamond 2004). In a small-group analysis, participants discuss in small groups their answers to three questions: what aspects of this course/instruction enhance your learning? how could this course be improved to enhance your learning? and what could you, as a student, do to enhance your learning in this course? The class then discussed their answers as a whole group and decided by discussion on the top answers. The entire group was then surveyed on their agreement, disagreement, or neutrality toward each of the top answers, providing quantitative agreement scores for each. Minority answers were also collected along with the survey. The course instructor was not present for these discussions.

Results

Field observations suggested the system reduced production blocking—whereas only a few students typically comment in face-to-face critique, with many not paying attention at all, far more students participated in the group critique by posting online. Furthermore, because all critiquers were encouraged to reflect by commenting online before the discussion, critiquers seemed to have more to say, which appeared to increase the number and thoughtfulness of contributions during face-to-face discussion.

In the small-group analysis conducted by an external evaluator, students rated class feedback and the critique system as the top two factors promoting their learning. Students gave an average rating of 7.33 (out of 9) to the importance of class feedback in improving their learning, with 73 % agreeing (11 students giving a rating between 7 and 9), 27 % neutral (4 students giving a rating between 4 and 6), and 0 % disagreeing. Students gave an average rating of 6.87 to the importance of the critique system for improving their learning, with 66 % of students agreeing (10 students giving a rating between 7 and 9), 33 % neutral (5 students giving a rating between 4 and 6), and 0 % disagreeing. Students reported that class feedback, responding to another person’s post, seeing everyone else’s projects, and getting feedback on each step (of the problem-solving process) helped improve their learning.
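For readers following the numbers, the sketch below (our own illustration, not the evaluator's instrument) shows how such summary figures can be derived from 9-point ratings, assuming the bucketing implied above (7–9 agree, 4–6 neutral, 1–3 disagree):

```python
from statistics import mean

def summarize_ratings(ratings):
    """Summarize 9-point ratings into a mean and agree/neutral/disagree percentages."""
    n = len(ratings)
    agree = sum(1 for r in ratings if r >= 7)
    neutral = sum(1 for r in ratings if 4 <= r <= 6)
    disagree = n - agree - neutral
    return {
        "mean": round(mean(ratings), 2),
        "agree_pct": round(100 * agree / n),
        "neutral_pct": round(100 * neutral / n),
        "disagree_pct": round(100 * disagree / n),
    }

# Hypothetical ratings: 11 in the 7-9 range and 4 in the 4-6 range (n = 15)
example = [9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 5, 4]
print(summarize_ratings(example))  # ~73 % agree, ~27 % neutral, 0 % disagree
```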

When asked how the course might be improved to enhance learning, students also identified feedback as the top factor, asking for more feedback from the instructor, with 100 % of students (n = 15) agreeing with that statement. By more feedback, students meant several things. First, that they wanted critique on every assignment throughout the course, not just the milestone assignments for which the class used group critique. Second, that they also wanted more summative feedback about their grades. Third, that they “learned how to do things after they were due” and that it “would have been helpful to do that before” but that it was “nice [that it] didn’t have a negative effect on our grades.”

These comments indicate some of the challenges of framing formative feedback for undergraduate, novice designers and critiquers. In the vast majority of courses students take, they do not have the opportunity to revise work after receiving formative feedback. So while they greatly valued formative feedback for improving their learning and wanted more of it, they were confused that this feedback would come “after the assignment was due” because they were not accustomed to the idea that they can, and will, revise each assignment multiple times. Furthermore, building a culture of formative feedback by de-emphasizing grades may simply make students more apprehensive if they fear that their summative evaluation will be negative.

Discussion

The pilot study tested the application of a simple online critique system integrated into face-to-face, whole-group critiques facilitated by the instructor. The results of the pilot, though tentative, indicated that even a relatively simple crowdcritique system provided feedback that students overwhelmingly reported enhanced their learning. Furthermore, the students’ main complaint was that, if anything, they wanted more feedback. The pilot also showed the importance and challenge of developing formative feedback norms that run counter to students’ experiences in other courses.

The similarity of the critique system to existing systems and its success in the pilot suggested that the critique system would provide a valid platform for studying crowdcritique. This study allowed us to test the efficacy of a basic critique system in novice, face-to-face group critique, to observe how novice designers use such a critique system and to suggest design principles for crowdcritique systems.

These tentative results also indicated that the feedback from even a relatively simple crowdcritique system was valuable to learners, which, for this stage of development, suggested that the design argument was worth further investigation in Study 2.

Study 2: Crowdcritique Challenges

The purpose of the second study was to build a richer theoretical model of crowdcritique, so we deployed the same crowdcritique system again in a formal design classroom setting and gave learners more choice in how they used the system. Ultimately, we are interested in crowdcritique systems that are sufficiently desirable to learners to require only modest encouragement from studio leads (such as instructors, managers, or the extracurricular peer facilitators in study 3) and that develop dispositions to use critique that transfer to other contexts.

Thus in study 2, we collected more in-depth observations (Spradley 1980) and interviews that we analyzed using grounded theory (Corbin and Strauss 2007). Studying the system under more varied conditions and with more in-depth data collection and analysis was part of the design research stage-dependent search strategy (Bannan-Ritland 2003; Easterday et al. 2014; Kelly 2004, 2006) of selecting data collection and analysis methods appropriate to the fidelity of the design and theory.

Design Argument

The results of study 1 indicated that students valued the feedback from the first version of the crowdcritique system, so we tested the same design argument in study 2.

Participants

Fifteen novice graduate education designers at a private Midwestern university participated in this study. The participants were between 22 and 30 years old and included master’s students in education (6), PhD students in education (7), mechanical engineering (1), and Human Computer Interaction (1). Twelve had at least 2 years of professional work experience. Fourteen had less than 1 year of formal training in interaction design and design critique. Participants worked closely together in shared co-working spaces throughout the year. All used the Internet daily.

Setting

This study was conducted in an 11-week, project-based, interaction design course to prepare students for professional careers in designing learning environments. Participants designed solutions to real-world interaction design challenges such as web-based tools for documentary journalists, online user-research guides for product designers, and a videogame about public policy. Participants worked in project teams of 1–2 members and each project team was paired with a client with a design need. Participants attended weekly 3-h studio sessions and were expected to spend a further 10 h/week on their project outside of studio time. Studio time was used for peer critiques, short lectures, and reading discussions.

Intervention

Participants engaged in regular crowdcritique sessions. During sessions, project teams talked about their work and received critique for 15–25 min. To better understand how the crowdcritique system worked across a wider range of activities, teams presented to the whole studio for critique on 2 occasions, while other critiques occurred in small groups of 2–3 teams (3–4 participants). There were no facilitators in any group, and critiquers were able to engage in discussion during the presentations.

Data Collection and Procedure

As is typical with grounded theory, we began with open data collection. We observed the weekly, 3-h studio session for 11 weeks, taking notes on how designers and critiquers interacted with each other and with the system. We conducted 56 semi-structured, face-to-face interviews of 10–25 min with participants about their project work and system use. We interviewed each participant at least 2 times and transcribed all interviews. Our semi-structured interview protocol focused on aspects of the system that helped or hindered design work. We amended the protocol as themes emerged from the interviews. In each interview, we asked participants about the work they had undertaken that week, the opportunities and drawbacks of using the system, and themes that had arisen in previous interviews. We used the observation notes and transcriptions to crosscheck the emerging themes from the interviews.

Analysis

The grounded theory research methodology (Corbin and Strauss 2007) is used to generate, rather than validate, new theoretical models grounded in data that can explain some phenomenon. Following grounded theory, we began with the research questions: what are the process and causal mechanisms of crowdcritique? and what are the grounded principles for designing crowdcritique systems? We reviewed the data as we collected it, tagged data with codes that we then grouped into concepts and then into broad categories. Throughout the process, we wrote memos to identify the substantive concepts and the relations between them. The emerging theoretical model guided further data collection, which continued until additional data ceased to add new information to the model. The resulting theoretical model provided an explanation of the phenomenon grounded in empirical data.

First-level coding identified 1,338 instances of 71 codes describing how the system succeeded or failed in providing needed support. We then iterated between first-level coding and open coding to cluster these codes into conceptual categories pertinent to how systems can support novice designers. The theme of critique process emerged as one of the most salient for understanding group critique systems, so we selected it as our central category and created an analytical storyline describing how different constructs relate to this theme. We formalized our findings as process and causal diagrams with descriptive narrative. To refine our theory and resolve disagreements within the research team, we checked the model against interview data, observation data, and existing research. Finally, we presented our model to 3 participants to verify its accuracy.

Results

Group critique systems must support a complex interaction between designers, critiquers, software, and practices that can break down in a number of ways. Figure 4 displays a graphical representation of the findings. The following subsections describe each element in Fig. 4.

Fig. 4 Traditional systems break down in several ways because they ignore important social practices

  • Set Goals—Designer Understands Goals. Before beginning work, designers must understand the criteria for judging work. Novice designers are often unaware of these criteria and may not attempt to develop this awareness early enough (Fig. 4a). For example, P3 said, “[t]his feedback was helpful, but I wish I had these rubrics to look at when making my goals.” P3 identified that understanding crucial design criteria would improve their work even before group critique.

  • Work—Designer Shares Work. If designers want their work critiqued asynchronously, or if the work requires significant time to review before face-to-face critique, then designers must make their work available for critique. Furthermore, critiquers might want to inspect work, such as an interface design or an interview protocol, on their own screen during the presentation. Making the work available requires the designer to establish a convention for sharing the work, notify critiquers that the work is available, and provide sufficient information and context for evaluating the work. Designers used a third-party file-sharing system, which created friction for critiquers navigating across different interfaces (Fig. 4b): “It will be nice I guess if we didn’t have to have multiple interfaces if you could like upload your own stuff here” (P14). Furthermore, designers often did not provide enough context for their work to be intelligible, as critiquers noted: “having to go through [the file sharing system]… and read every single document to try to get a feel for what they were doing just so that you can give feedback was an issue” (P6). Note that this situation arose in the few cases where critiquers spontaneously attempted to critique other projects asynchronously.

  • Group Size—Form Critique Groups and Size. Designers and critiquers formed either large groups for formal presentations or small, informal groups. Group size affected later steps of the process, including Talk & Write, Record, Recall/Review, and Use Feedback: larger groups experience more production blocking, in which case it is easier to provide written feedback online, which in turn increases the amount of critique captured (Fig. 4c), as discussed shortly.

  • View—Critiquers View Work from Library. Like designers, critiquers had difficulty managing the materials to critique when they attempted to critique only online (Fig. 4b). P10 noted that it took “five minutes to figure out where to go and then probably another 5 minutes to figure out what I was supposed to be looking at” because designer-provided materials were not on the same site as the critique tool. Even when critiquers were able to access materials, critiquers reported not having enough context to give feedback. Observational data found that studio members overcame this problem by having designers present their work in person on a computer display.

  • Analyze—Critiquers Analyze Work. In crowdcritique, a large number of novices may potentially compensate for a lack of experts. Unfortunately, novice designers face a number of challenges analyzing work, including not feeling that they have the necessary expertise, not understanding rubric criteria, and not wanting to be critical (Fig. 4d). Critiquers were hesitant about their critique performance in general, saying things like, “I don't feel I'm good at it or I'm not always constructive” (P12). Critiquers also had difficulty understanding rubric criteria, making them feel that they do not “have enough background or understanding to be able to judge effectively” (P4). And while the rubric scaffolds quantitative and qualitative feedback, some critiquers resisted quantitative evaluation. As P14 explains, “…is there ever going to be a time when I’m going to give somebody a … lower score..? Probably not. I’m probably just going to give advice and explanation just because I don’t want to make people feel bad.”

  • Talk, Write—Critiquers’ Decision to Talk or Write. Crowdcritique offers both face-to-face and online modes for providing feedback, the former providing an easy means of grounding discussion, the latter offering recording, scaffolding, and decreased production blocking. However, critiquers expressed many reasons for not writing (Fig. 4f). First, some reported that it was too difficult to write during conversation: “…being able to look at somebody’s stuff and type the feedback and talk, it’s all overwhelming. So we usually just do stuff like ‘great, awesome good job’ not really helpful” (P6). Second, some critiquers did not see the value of recording comments already expressed verbally. P11 explained that in face-to-face discussion designers and critiquers would consider many issues and possible solutions, but then treated recording the discussion as a chore. As a result, their comments often consisted of a short sentence, such as “good job.” Third, some reported that the rubrics were too constraining. For example, P1 reported not knowing where to place a comment in the critique system and not knowing how to address a critique system prompt to provide a design suggestion: “So a lot of times, that falls outside of whatever kind of format it is, so I’d just to try just think about, you know, here, just put it in this box. So I don’t know… it was too constraining” (P1). Fourth, critiquers expressed uncertainty about how to critique in general. P12 acknowledged that the rubrics helped novice critiquers notice issues that experts would address but that novice designers would miss. However, the rubrics alone did not necessarily help novices respond with better feedback. P12 reported that even if the rubric called attention to an issue, the novice critiquer still might not be able to assess the issue or provide advice to the designer. Fifth, critiquers were often concerned about providing redundant feedback: “I was pretty minimal [laughter]. I feel like I just repeat what other people say” (P12).

    Given critiquers’ reasons for not wanting to provide written critique, when left on their own, small groups of critiquers typically provided only verbal feedback: “It’s easier for me to ask questions and have more of a conversation back and forth with somebody to express my thoughts or to answer certain questions that someone might have about my projects rather than following the rubric of the feedback outline on the [system]” (P12). In face-to-face verbal discussion, critiquers did not have to use rubric criteria that they did not understand (see Analyze) and could comment on other aspects not included in the rubric.

    Unlike in the pilot, crowdcritique in this study often gave way to face-to-face (only) critique. Fortunately, the failures to contribute to critique often observed in design education settings did not occur in the small-group setting. In the pilot, the tool was used in large-group critique with a critique script (routine) in which designers presented work and critiquers wrote comments before face-to-face discussion; there, the tool seemed to decrease production blocking and increase contributions during critique. In the current study, designers were given more choice about how to use the tool, and when in small groups without a facilitator, novices used a talk-only script. Unfortunately, this led to failures to record critiques (discussed shortly).

    On the other hand, some critiquers noted that writing provided advantages: designers did not always record feedback, writing promoted reflection, and writing might be necessary for large-group critique. Critiquers like P1 and P13 thought multiple channels were beneficial because they could ensure their feedback came across by both saying it and writing it.

  • Record—Designer Records Feedback. While designers and critiquers in small groups preferred face-to-face discussion because to “look at somebody’s stuff and type the feedback and talk, [is all] overwhelming” (P6), they nevertheless wanted feedback recorded. Although small face-to-face groups did not experience failure to contribute to critique, they did experience failures to record critiques. P1 identified the tool as a useful place for a critiquer to record critiques that the designer may not have captured in his or her note-taking. With critiquers not utilizing computer support in offering feedback, designers often took responsibility for capturing feedback. Some designers, like P6, P11, and P14, took notes based on the discussion. P4 suggested that some other critiquer be given the responsibility of note-taking.

  • Recall/Review—Designer Reviews Feedback. Designers faced different obstacles reviewing critique depending on the critique group size. In small groups that relied only on face-to-face discussion, the few written comments were seldom worth reviewing because they were of lower quality than feedback from discussion (Fig. 4g). For example, P14 stated: “I haven’t really actually looked at this very much, to be honest … everybody basically tells me that they’re just filling things in arbitrarily because we have so much better results from our conversations.”

    When designers received critiques in large groups, the large number of written comments introduced the challenges of synthesizing comments, reconciling conflicts, and deciding whether to trust the critiquer. Designers had difficulty making sense of the feedback, which was organized by critiquer on separate pages: “…but I don’t know. It seems like getting so much, it’s hard to know how to make sense of this yourself. I can imagine him getting these five reports and being like, uh, now he has to do the work of synthesizing everybody’s comments…” (P2).

  • Use—Designers Use Feedback. When feedback was available, designers sometimes had difficulty knowing which feedback to prioritize and how to act on it (Fig. 4h). Though the rubrics prompted critiquers to give prescriptive advice, critiquers often did not. Designers then had to deal with descriptive feedback that did not provide actionable next steps: “So figuring out a way to prioritize the feedback I think is really important, because right now, it’s giving equal weight to all the feedback. Even if I give you a low score here, that might not be the thing that’s going to change the direction of your project” (P7).

Discussion

In study 2, we field-tested the same basic crowdcritique system used in study 1, this time in both large-group and small-group critiques, giving students the choice of using the system, and with more in-depth data collection and analysis. Unlike study 1, in which we saw an overwhelmingly positive response to crowdcritique, upon closer inspection in study 2, we found some significant limitations of crowdcritique using peer-assessment-like systems.

We identified 7 major challenges that crowdcritique systems must overcome to provide formative feedback (Table 2).

Table 2 Challenges for crowd critique systems

Of these challenges, by far the most concerning was that, when given the option, students elected not to use the basic peer-assessment-inspired online system, primarily because they were averse to the constrained, grade-like nature of the system (challenge 3 in Table 2). Previous studies of online peer-assessment systems do not seem to have identified these challenges, perhaps because they assume that students will not engage in any practice unless motivated through external incentives and hence did not give students the option.

However, forcing students to use crowdcritique systems is neither desirable in terms of encouraging students to adopt formative feedback practices nor feasible in extracurricular environments in which learners do have a choice about whether to adopt crowdcritique practices. Even if more formal facilitation scripts solve this problem, crowdcritique must still be desirable for designers and critiquers.

Study 3: A More Social Crowdcritique

Despite students’ positive reactions to the whole-group crowdcritique in study 1, when students were given more choice in study 2, they preferred face-to-face discussion rather than crowdcritique, losing the benefits of scaffolding, reduced production blocking and recording. The purpose of study 3 was to determine how to design crowdcritique systems that overcome the challenges identified in study 2, specifically, to design a system that both provides useful critique and that is sufficiently desirable to learners.

Desirability is critical for both formal and extra-curricular learning environments. If a crowdcritique system can only elicit critique by anonymizing critiquers’ contributions, grading critiquers for their participation, and back-grading them to ensure that they provide more than minimal effort, then it is unlikely that the crowdcritique system will be used by people who have signed up for an informal learning environment (like DFA) to “hang out with friends”, “have fun” or “make a difference.” Desirability is critical for formal learning environments as well. If a crowdcritique system only works with external incentives like grades, then it is unlikely that learners will continue to support, initiate, and practice crowdcritique in their professional lives.

Design Argument

To overcome the challenges identified in study 2, in study 3 we redesigned the crowdcritique system by adding support for near-synchronous, public conversation; using social media conventions; removing “grading”; and teaching a more explicit facilitation script. Specifically, the design argument for the second iteration of the system was that, to provide formative feedback for novice designers, crowdcritique systems should provide a combination of formative framing, critique training, a prep/write-first script, interactive critiquing, critique scaffolds, and upvoting (justified in Table 3) (Fig. 5).

Table 3 Crowdcritique systems should combine scaffolding with social-media conventions
Fig. 5
figure 5

The (simplified) crowdcritique system v2 interface used social media conventions to better support design critique norms
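To make the design argument concrete, the sketch below shows one hypothetical way the components of version 2 could be expressed as a session configuration, with the facilitation script encoded as an ordered list of phases. The names and structure are illustrative assumptions, not the system's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch (not the deployed system): the v2 design argument as a
# session configuration. Phase order encodes the facilitation script; boolean
# flags encode the social features that replaced rubric-style grading.

@dataclass
class CritiqueSessionConfig:
    script_phases: List[str] = field(default_factory=lambda: [
        "prep",          # designers explain prompts and the feedback they want
        "presentation",  # critiquers may post written comments at any time
        "write-first",   # short written-critique period before discussion
        "discussion",    # verbal critique; the online channel stays open
    ])
    formative_framing: bool = True     # social-media-style stream, no grades
    critique_training: bool = True     # brief introduction to giving critique
    interactive_critique: bool = True  # public, near-synchronous comments
    upvoting: bool = True              # lightweight "me-too" agreement
    numeric_scoring: bool = False      # rubric-style grading removed in v2
```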

Participants

Twelve novice undergraduate members of Design for America (DFA), aged 19–23, participated in this study. Participants were all DFA members from the same university, drawn from different majors (10 engineering, 2 social sciences, and 2 humanities); across all grades (5 freshmen, 5 sophomores, 1 junior, and 1 senior); and were predominantly female (11 female and 1 male). None of the participants had professional design experience or formal experience with design critique. All participants used the Internet daily.

Setting

The study took place during the Design for America Summer Studio program, organized by a DFA chapter and housed in a university design institute. Summer Studio was a 6-week extra-curricular program created by design professors and professionals for DFA. Participants received no course credit or pay for participating in the program.

DFA student design teams worked with real-world clients to solve social problems. The teams partnered with (a) a public library to reduce youth illiteracy, (b) a homeless shelter to increase safety for homeless youth, and (c) a senior living residence to improve the quality of life for dementia patients. Clients provided teams with feedback and access to users who directly experienced the problem design teams were working to solve. Each design team was paired with a professional coach with 10 or more years of design experience who mentored the team in face-to-face meetings for 2 h a week.

Each 4-person design team worked at least 40 h per week, using the design process to design an original solution to their real-world problem. Teams often worked additional hours and one team continued working on their project at least 8 months after Summer Studio.

An undergraduate DFA student who had participated in Summer Studio in a previous year facilitated the program. The undergraduate facilitator organized workshops led by professors or local design professionals on different design techniques such as conducting user research, ideation, prototyping and testing.

Intervention

At the beginning of Summer Studio, the undergraduate facilitator and one member of the research team led a 20-min session introducing critique, explaining it as a low-stakes way to get feedback for improving teams’ projects. Over the 6-week program, teams held 5 weekly critique sessions lasting 75–90 min. At each critique session, teams presented their work, which covered a wide range of design methods including: research plans, user research syntheses, design concepts, prototypes, testing plans and client communication plans.

Before each critique session, the undergraduate facilitator and one member of the research team discussed which critique prompts should be used for the session and entered these prompts into the system. The facilitator also asked teams if they wanted additional critique prompts and entered these 0–5 additional prompts into the system as well.

Each critique session was attended by: Summer Studio project teams, the undergraduate facilitator, 1–6 additional experienced design students, and 0–3 design professors.

During the critique, each team presented for 10–15 min and received verbal feedback for 10–15 min after the presentation. The facilitator encouraged critiquers to write comments using the online system and to ask clarification questions at any time during the presentation.

Data Collection

Over the course of the 6-week Summer Studio program, we collected field notes and interviews to better understand the process of crowdcritique and the effect of the crowdcritique system.

This included field notes and observations of the 5 weekly, 75–90 min critique sessions, focusing on the interactions between designers, critiquers, and the system during critique, as well as close work with facilitators to understand the challenges that arise in crowdcritique.

Data collection also included 48 semi-structured, individual, face-to-face interviews (15–25 min each) with participants about their project work and use of the crowdcritique system. Interviews were scheduled within 5 days of the critique, and, while it was not always possible to interview each participant each week, each participant was interviewed at least 3 times. The semi-structured interview protocol focused on participants’ critique experience, their use of the crowdcritique system, and their reactions to the critique facilitation. During the interview, the interviewer directed participants to the online written critique comments to clarify their responses and aid recall. The research team transcribed interviews immediately and modified the protocol based on emergent themes.

The combined observation notes, interview transcripts, and critique system records were used to cross-check themes.

Analysis

We analyzed the interview transcriptions and field notes again using a grounded theory approach (Corbin and Strauss 2007) to better understand the interactions between designers, critiquers and the crowdcritique system.

Our first-level coding identified 1,161 instances of 22 codes about system use and critique. We initially open coded to capture whether a significantly different process emerged, but study 2 provided us with many of the right sensitizing concepts (Blumer 1954) that we could apply to the data. For example, the finding from study 2 that critiquers did not record feedback led us to code for writing first during the presentation. This approach also allowed us to identify new codes that did not occur in study 2, such as critiquers revoicing written feedback during the verbal critique. As expected, given that we had a more complete understanding of the process, fewer codes emerged in study 3 than in study 2. We then iterated between first-level coding and open coding to cluster the codes into 10 conceptual categories about how the system supported critique. Other conceptual categories that were less central to our research question, or for which there was insufficient data to reach saturation, were not pursued (Miles et al. 2013), for example, categories relating to how designers planned their presentation.

Based on these categories, we then formalized our findings as process and causal diagrams with narrative description, which are described in the following results section.

To refine our theory and resolve disagreements, we checked the formal models against the interview, observational, and online system’s data and compared the theory to existing research. We also verified the findings by presenting and discussing them with the Summer Studio student facilitator.

Results

The more social crowdcritique system used in this study led to an extended dual-channel critique. This included face-to-face discussion and multiple simultaneous online conversations that extended over the discussion and presentation phases of the critique, with critiquers moving across channels through revoicing and reposting comments. The process model (Fig. 6) summarizes the interactions between designers, critiquers and system. We now describe each component of the model and its supportive qualitative data in detail.

Fig. 6
figure 6

The second version of the critique system facilitated dual-channel discussion that increased critique value, quality and community

Scaffolding Critique

Prompts

When designers began work on a design activity or were ready to share their work (Fig. 6a), the critique tool provided prompts written by instructors specific to each activity, such as: “Given the way we tested [our prototype], is it fair to draw the conclusions that we did?”; “To what extent are our findings [from user research] supported by evidence?”; or “Is the prototype the right fidelity to test what we want to test?” These prompts were intended to scaffold novice critiquers by helping them ask the kind of questions that expert designers ask.

Critiquers reported that the prompts helped guide their critiques, which they liked (P7, P11, P12). Designers also felt that, when critiquers answered all the prompts, “It covered everything we wanted to talk about” (P11).

It was also important for crowdcritique systems to allow designers to customize the critique prompts. The expert-written critique prompts guided novices’ attention to general issues, but designers reported needing to tailor critique to issues idiosyncratic to their project (P12, P5). Designers found the ability to customize critique questions to get “feedback on what we wanted to ask…[that] was really nice” (P2). Like designers, critiquers also sometimes wanted to raise issues outside the general prompts (P1, P12), although this wasn’t always productive. For example, critiquers would go outside the prompts to comment on the designers’ presentation rather than their work, which did not help the designers overcome their immediate design challenges (P1, P10, P2, P5, P4).
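As a rough illustration of this prompt structure (an assumption about data organization, not the system's documented implementation), the sketch below combines instructor-written prompts with the 0–5 designer-requested prompts entered before each session.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch: activity-specific critique prompts. Instructor-written
# prompts guide attention to general issues; designer-added prompts (0-5 per
# session in this study) cover project-specific questions.

@dataclass
class CritiquePrompt:
    text: str
    source: str  # "instructor" or "designer"

@dataclass
class CritiqueActivity:
    name: str  # e.g., "prototype testing"
    instructor_prompts: List[CritiquePrompt] = field(default_factory=list)
    designer_prompts: List[CritiquePrompt] = field(default_factory=list)

    def session_prompts(self) -> List[CritiquePrompt]:
        # General expert-written prompts first, then project-specific ones.
        return self.instructor_prompts + self.designer_prompts
```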

Prep

It was important for critique facilitators to begin the critique with a short prep period (Fig. 6b) in which designers explained to critiquers the kind of feedback they were looking for and the meaning of the critique prompts. Critiquers found reviewing the prompts before the critique useful (P9, P11). Scaffolding critique with prompts was meant to guide novice critiquers’ attention to the same issues that experts attend to, which is inherently challenging. As expected, critiquers didn’t always understand the prompts (P7). Critiquers also didn’t spontaneously take the time to study or ask about the prompts before the critique, stating that they “didn’t actually take advantage of it” (P7). So it was critical that facilitators did not skip the prep period.

Dual-Channel Critique

The greatest difference in version 2 of the crowdcritique system was the use of social-media conventions. Rather than an anonymous, asynchronous system modeled on other peer assessment systems, this version of the crowdcritique system provided a public, near-synchronous system more similar to Facebook than an online grading rubric. This, combined with the facilitation script, created a dual-channel discussion across face-to-face and online channels (Fig. 6d).

Extended dual channel conversation

The greatest impact of the social-media interface (and facilitation scripts) was to create an extended dual-channel conversation in which designers and critiquers simultaneously discuss in face-to-face and online channels. In traditional face-to-face critique, designers present while critiquers (hopefully) listen, and critiquers then take turns giving verbal critiques, which creates production blocking as other critiquers wait to make a comment.

Version 2 of the crowdcritique system’s Facebook-like interface conveyed a friendlier, more formative framing than the rubric-like interface in version 1, so critiquers were far more willing to use the online written channel.

Version 2 of the crowdcritique system also made it possible for critiquers to post public, near-simultaneous comments at any time. Over time, with the encouragement of the “write-first” step of the facilitation script (Fig. 6e), critiquers began posting comments earlier, during the presentation itself (P1), decreasing production blocking and extending the amount of time for giving critique. Furthermore, design team members reported monitoring and responding to the near-synchronous online comments while their teammate presented (P10). These simultaneous online conversations continued during the verbal critique (P4).
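The sketch below illustrates one plausible shape for such a public, near-synchronous comment stream; the class and field names are assumptions for illustration only. Recording the phase and time of each comment is what later supports temporal grounding.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Hypothetical sketch of a public, near-synchronous comment stream. Every post
# is immediately visible to all participants (public, not anonymous), and its
# phase/timestamp lets readers relate it to the part of the presentation just given.

@dataclass
class Comment:
    author: str
    text: str
    phase: str                    # "presentation", "write-first", or "discussion"
    posted_at: datetime = field(default_factory=datetime.now)
    prompt: Optional[str] = None  # the scaffold prompt it responds to, if any

class CommentStream:
    def __init__(self) -> None:
        self.comments: List[Comment] = []

    def post(self, author: str, text: str, phase: str,
             prompt: Optional[str] = None) -> Comment:
        comment = Comment(author=author, text=text, phase=phase, prompt=prompt)
        self.comments.append(comment)
        return comment
```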

Coordinating channels by revoicing and reposting

How did critiquers coordinate simultaneous discussion across two channels? We found that they: (a) made choices about which channel to use, (b) reposted verbal comments to the online channel, and (c) revoiced comments from the online channel aloud (Fig. 6f).

Critiquers chose to state important comments and comments requiring discussion verbally (P10, P1). They used the online comment stream both for shorter, clearer, or minor comments, such as referrals to resources (P10, P1), and for extended comments that would take too long to describe verbally (P10, P1, P3).

Critiquers reported reposting verbal comments. This included verbal comments that critiquers wanted to emphasize (P2) and to capture for the designer (P2, P11). Critiquers would also capture other critiquers’ verbal comments online (P12), which could go beyond simply recording comments, such as when a self-described “listen-and-analyze person” would repost a comment along with his/her own opinion (P12).

Critiquers reported verbally revoicing online comments they felt were important (P8). Critiquers would also revoice other critiquers’ online comments, especially on behalf of shyer critiquers, a phenomenon recognized by other critiquers (P4).

Adapting to dual-channel conversation

Multiple high-volume conversations distributed across both face-to-face and online channels could potentially create an unreasonable cognitive load if critiquers had to spend significant amounts of time monitoring and up/down voting the large number of online comments (P1, P2, P5, P10, P14). Fortunately, this potential cognitive load was of minor concern because critiquers quickly adapted to the system and it was not necessary for every critiquer to read every online comment to produce useful critique.

In early sessions, critiquers reported that the dual-channel conversation felt “confusing,” “weird,” and that “people didn’t really use it or used it at the wrong time” (P1, P11, P12). But after 4 sessions, critiquers found it easy to use the system to critique (P6, P11), stating that it felt “a lot more natural” (P1, P11), that they “liked it” (P2), and that “I was all into it” (P12).

For designers, the dual-channel conversation provoked varied reactions. One designer (P11) felt that the typing during the presentation was distracting because the “at-the-back-of-your-head” awareness of critiquers’ evaluation felt “judgmental,” which is consistent with the evaluation apprehension that we expect novices to experience when receiving critique. On the other hand, other designers on the same team “didn’t even notice” the typing during the presentation. Designers expressed positive reactions as well, including that they “really like the idea of doing real-time critique.” Even P11 noted that in other academic courses, hearing people type means that “they're not paying attention, and you know people are paying attention here because they’re here to give you good feedback” (P11).

With simultaneous written critique, there is the danger that critiquers would attend more to the written comments rather than the presentation itself (P8). However, the primary benefit of online written critique was to provide a larger volume of feedback to designers from an increased number of critiquers, not for critiquers to monitor the comments added to online conversation. Designers seemed to think the benefits of real-time critique outweighed the costs: “it’s more important to get real-time feedback than it is to… you’re still giving them your full attention, so people should just feel free to leave as many comments as you can. Within the quantity of feedback you’ll get good answers too” (P10).

Upvoting decreases evaluation apprehension

The crowdcritique system included an up/down-voting feature similar to Reddit. This was originally intended as a means to anonymously filter and promote useful comments for designers. However, designers did not pay much attention to the number of up/down votes because other information was more important, such as: the person critiquing, whether the critique raised an issue the designer hadn’t considered (P11), and whether the critique was also voiced verbally. The primary effect of upvoting was to increase novice critiquers’ confidence in their critiquing ability (Fig. 6g).

Critiquers used upvotes to express their agreement (P6) but rarely downvoted. Critiquers liked to upvote because it allowed them to express a “me-too” critique without being repetitive (P4). They did not like to downvote because they did not want to make other critiquers feel bad (P4, P5) and, because downvotes warranted additional explanation, they simply added a written comment instead (P4, P5, P8).

While upvotes did not seem important to designers, they were very important to other critiquers. Novice critiquers are uncertain about their design and critique skills, so upvotes signaled to critiquers that they had made a useful comment, which increased their confidence (P6, P12).
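A minimal sketch of this voting behavior, under the assumption of a simple per-comment tally, is shown below; it also illustrates how aggregate upvotes could serve as the confidence signal critiquers described, rather than as a ranking signal for designers.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical sketch: upvotes as lightweight "me-too" agreement. Disagreement
# was usually expressed as a new written comment rather than a downvote.

class VoteTally:
    def __init__(self) -> None:
        self.upvoters: Dict[str, Set[str]] = defaultdict(set)  # comment_id -> voter ids

    def upvote(self, comment_id: str, voter: str) -> int:
        self.upvoters[comment_id].add(voter)          # idempotent per voter
        return len(self.upvoters[comment_id])

    def confidence_signal(self, critiquer_comment_ids: Set[str]) -> int:
        # Total upvotes a critiquer's comments received; in this study such a
        # signal mattered more to critiquers' confidence than to designers.
        return sum(len(self.upvoters[cid]) for cid in critiquer_comment_ids)
```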

Critique Outcomes

Designers and critiquers reported that the dual-channel discussion affected critique in a number of ways, including grounding, volume of comments, number of new ideas, and reflection.

  • Increased grounding. The combination of verbal critique with near-simultaneous, publicly visible, written critique led to more grounded critique in several ways.

    First, the near-simultaneity of online critique created a temporal grounding. Design team members reading the online critiques during the presentation could more easily understand the comments because they knew that the comment referred to the part of the presentation just presented (P6, P5). Furthermore, these team members could respond to the critique even before the verbal discussion began (P4, P10, P11).

    Second, the 1–2 min period between the presentation and the verbal discussion, allotted for writing additional comments, let designers read the online comments and gave them time to “think about their answers” to the critique questions before beginning discussion (P4). This period also allowed designers to figure out what critiquers had understood from the presentation, helping the designer better communicate during the verbal discussion. It helped designers and critiquers “get on the same page” and “see what was coming from the in-person feedback” (P4, P5).

    Third, the redundancy of comments presented both online and verbally helped designers better interpret written critiques (P5) and make the critiques stick (P4, P7), in addition to recording the critiques for later use by designers (P11).

  • More critiques. One of the reasons why peer assessment can improve student work more than feedback from a single expert is that there is more of it (Cho and MacArthur 2010). For peer critique, the benefits of having a greater number of useful comments appear to outweigh the distractions caused by having a greater number of less useful comments.

    The dual channel crowdcritique system allowed more critiquers to contribute (by reducing production blocking) over a longer duration (during both presentation and discussion) leading to a greater volume of written critiques (Fig. 6i). The written channel also allowed less vocal critiquers to contribute (P3, P1, P5). The quantity of critique also increased across sessions as critiquers learned to use the system. Upvoting also encouraged additional contributions by decreasing evaluation apprehension.

  • More ideas. Public written critiques led to an increased number of new ideas. Designers reported that reading the comments generated new ideas that they hadn’t thought of before (P5, P3, P10), gave them new perspectives (P5), and gave everyone “…a chance to kind of build off of each other’s ideas, which I think is the biggest merit to discussion” (P3). These new ideas led to additional critiques.

  • Increased reflection. Helping novice designers develop more effective strategies and mental models requires cognitive feedback that fosters reflection. Group critique can be an effective means of providing cognitive feedback because it requires learners to compare their process and solutions to those of other learners (van Merriënboer and Kirschner 2012, ch 7.). Designers described how critiquing other projects helped improve their own projects (P1, P6): “because like it’s nice to like remind yourself about something you saw was so great that you wanted to write about, then you’d like to incorporate it in your own project” (P6). Facilitators encouraged this comparison by reminding critiquers to follow their own advice. Group critique was essential for fostering this reflection because, outside of critiques, designers reported learning little about other teams’ activities even though they worked in close physical proximity to each other daily for several months.

Reflection

Designers reported being affected by the critique in a number of ways.

  • Usefulness of critique. Designers found the critique feedback useful. They reported that it was “definitely helpful” (P8) for generating new ideas and perspectives (P12, P8, P3, P5, P6, P10) and that it helped them decide how to solve problems (P5), improve their products (P7), iterate and revise early stages of their process (P6, P7, P8), think critically about their solutions (P3), generate insights that completely changed how they thought about their project (P11, P12, P8, P5), present to clients (P9), and iterate quickly (P1). Likewise, critiquers found that giving feedback also helped them improve their own project and understand their problem in new ways (P1, P6).

  • Building reciprocity. An additional benefit of peer critique may have been to build reciprocity amongst designers (Fig. 6h). Designers reported that the studio offered “a larger support system” and that they were motivated to provide feedback to other teams because “we just want them to do well” (P11). They felt that other groups helped them by providing helpful feedback and that they should do the same (P6): “they were commenting for me, so I feel like, if they were commenting for me then I should be giving them the same attention…” (P2).

    This sense of comfort and reciprocity did not exist in initial critique sessions, where teams were unfamiliar with each other and relatively quiet (P10), but rather developed over time. Designers described a number of ways in which cooperative attitudes developed. Giving critique itself made critiquers feel good; they reported that it was nice to be able to help other teams improve their project (P6). Having one’s critique upvoted made critiquers feel that their comment was useful to others (P6), made them feel good (P2, P6), made them feel confident in their critique ability, and made them more likely to comment again (P1, P6). Critiquers reported being aware of who had given them feedback online (P1).

    Other aspects of the crowdcritique system and social setting may have hindered the development of reciprocity. The anonymous up/downvoting system did not allow designers to see some of the people who had participated only by upvoting. For some critiques, the design studio was joined by professional experts, whom novice designers liked but also found intimidating (P9, P11), which led to fewer contributions because novices felt they had less to contribute.

Discussion

In stark contrast to study 2, in which students opted not to use the critique system, study 3 found that a crowdcritique system with a more interactive, social-media-like interface appeared to deliver formative feedback better than the rubric-based system. Version 2 of the crowdcritique system combined (a) formative framing; (b) critique training; (c) a prep/write-first/write-last script; (d) interactive (public, near-synchronous, online) critique; and (e) a like system. Together, these features created a dual-channel critique that produced useful cognitive feedback, reciprocity amongst critiquers, and willingness to use the system.

Figure 7 summarizes the causal mechanisms by which the social and technical features of the crowdcritique system appeared to impact critique. The public, near-simultaneous commenting system, combined with the write-first script, generated a larger number of ideas (by reducing production blocking, sharing others’ ideas, and making it easier for shyer participants to contribute), leading to a greater volume of written critiques. Written critiques were often revoiced as verbal critiques, and verbal critiques were often reposted as written critiques. This greater volume of critique led to more useful critique overall. Furthermore, the public, near-simultaneous online system increased grounding, which, along with the prompts guiding discussion, increased the quality of critique, resulting in more useful critique for designers. Upvoting, while it did not directly benefit designers, did increase critiquers’ confidence in their critique ability and increased participation. The public visibility of contributions also increased critiquers’ feelings of obligation to contribute to critiques, although the anonymous upvoting hid many of these contributions.

Fig. 7
figure 7

Social and technical features of the crowdcritique system lead to more useful critique

General Discussion: Designing Crowdcritique Systems

This design research project asked: how might we design crowdcritique systems for project-based learning environments that provide learners with useful formative feedback on ill-defined problems and that are sufficiently desirable to learners so as to decrease the orchestration demands on instructors?

In answering this question, this project makes 3 contributions to research on intelligent systems for formative feedback by: (a) identifying the challenges that crowdcritique systems must overcome; (b) providing empirically supported design principles for crowdcritique systems; and (c) illustrating how crowdsourcing approaches can complement existing expert systems technologies.

Contribution 1: The Challenges of Hybrid Crowdcritique

The first contribution of this project is to identify the challenges of hybrid group critique. Face-to-face and online critique provide different benefits for supporting formative feedback on design work for complex, ill-structured problems. However, crowdcritique systems that provide the benefits of both face-to-face and online critique must overcome many challenges.

Studies 1 and 2 examined the use of a rubric-based crowdcritique system like those used in previous studies of peer assessment systems, in which students grade each other anonymously. Although the system produced useful feedback when its use was mandated by the instructor in study 1, learners chose not to use the system when given the choice in study 2, which does not bode well for building technology-supported formative feedback practices that transfer outside of formal learning contexts or that work at all in voluntary learning contexts.

Study 2 identified some of the core challenges in designing crowdcritique systems for formative feedback in voluntary contexts. Crowdcritique systems must: (a) help students understand the goals/criteria for quality work; (b) facilitate sharing of work; (c) avoid overscaffolding; (d) overcome evaluation apprehension, and (e) aid review and synthesis of feedback.

When these challenges are not overcome, learners resort to face-to-face discussion and lose the advantages of online critique (scaffolding, recording, reduced production blocking). While existing online peer assessment systems can evade these issues through compulsion, they forgo the advantages of face-to-face critique, which is not ideal for voluntary contexts or for transfer.

Contribution 2: Design Principles for Crowdcritique

The second contribution of this project is to provide grounded principles for crowdcritique systems that overcome the core challenges of hybrid critique.

To support formative feedback on work on complex, ill-structured problems in voluntary settings, crowdcritique systems should provide: (a) quick invites; (b) formative framing; (c) a prep/write-first/write-last script; (d) interactive critiquing; (e) suggested and customizable scaffolds; (f) like systems; (g) community hashtags; (h) analysis tools; and (i) to-do items. Study 3 showed that a system with formative framing, a write-first script, interactive critiquing, scaffolds, and upvoting better supported the social conventions of a design critique. This in turn leads to the desired dual-channel face-to-face and online critique in which critiques are scaffolded and recorded with reduced production blocking, and also grounded through verbal discussion. This produces a higher volume of critique and cognitive feedback, and also creates an atmosphere of psychological safety (Edmondson 1999) in which learners develop attitudes of reciprocity toward giving each other feedback.

The crowdcritique features deployed in study 3 overcame the main limitations of the rubric-based systems. We can also derive additional principles, based on the same approach, that address challenges not resolved by study 3. Specifically, while the social crowdcritique system promoted dual-channel critique, it did not provide quantitative information (useful for data analysis) or support review and synthesis of feedback. However, we can approach these issues with a similar social media approach, as described by the interface in Fig. 8. Rather than asking students to quantitatively score each other, we can allow them to define a variety of hashtags that indicate good work, work with minor flaws, and work with major flaws. These hashtags then quantitatively summarize which critique topics learners must focus on to improve their work. Initial pilot tests show that students are quite willing and able to use this approach, which fits with existing social media conventions (Easterday et al. 2015). Furthermore, these leveled hashtags allow for greater nuance in critiquers’ scoring while maintaining a formative framing. More work is needed to further validate these principles, but, taken together, the principles in Table 4 provide a comprehensive approach to designing social crowdcritique systems.

Fig. 8
figure 8

The (simplified) crowdcritique interface v3 supports a round-trip feedback cycle and quantitative assessment that is still consistent with design critique norms

Table 4 Crowdcritique design principles for supporting formative feedback on design work
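To make the leveled-hashtag idea concrete, the sketch below shows one hypothetical way such tags could be aggregated into a quantitative summary; the tag syntax (e.g., #major-evidence) and the severity weights are assumptions for illustration, not the interface's actual implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple
import re

# Hypothetical sketch: critiquers tag comments with leveled hashtags such as
# #solid-evidence, #minor-evidence, or #major-evidence. Counting tags by topic
# and level yields a quantitative summary of where work most needs attention.

LEVEL_WEIGHTS = {"solid": 0, "minor": 1, "major": 2}  # higher = more urgent

def summarize_hashtags(comments: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count (topic, level) pairs across all critique comments."""
    counts: Counter = Counter()
    for text in comments:
        for level, topic in re.findall(r"#(solid|minor|major)-(\w+)", text):
            counts[(topic, level)] += 1
    return dict(counts)

def priorities(summary: Dict[Tuple[str, str], int]) -> List[Tuple[str, int]]:
    """Rank topics by severity-weighted tag counts so designers know what to address first."""
    scores: Counter = Counter()
    for (topic, level), n in summary.items():
        scores[topic] += n * LEVEL_WEIGHTS[level]
    return scores.most_common()
```

Under these assumptions, several #major tags on the same topic from different critiquers would push that topic to the top of a team's list, while #solid tags keep the summary formative rather than grade-like.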

Contribution 3: Crowdsystems for Providing Intelligent Formative Feedback

The third contribution of this work is to illustrate how crowdfeedback techniques, which synthesize the contributions of large groups of novices to produce formative feedback on ill-structured problems in a scalable, generalizable way, can complement existing AIED approaches.

The paradigm of AIED has been a single expert system providing feedback to a single learner. Expert systems work well for well-defined domains but cannot be designed to provide formative feedback on ill-structured problems with unknown answers. But as networked devices become ubiquitous, other approaches to designing intelligent systems in education have become possible (Dow et al. 2013; Hui et al. 2014a). Crowdfeedback offers one such approach. Instead of encoding expert schema into a stand-alone program, crowdfeedback distributes a shared schema across multiple novices whose collective behavior provides intelligent feedback in a way that is not possible by one novice alone. Whereas expert systems provide scale through replication, crowdsourcing provides scale through the prevalence of networked crowds (Howe 2006).

Furthermore, in the case of crowdcritique, the work done by the novice crowd not only provides useful feedback to novice designers but also promotes learning amongst the novice critiquers; it therefore neither increases the orchestration demands on instructors nor squanders learners’ time.

Crowdfeedback may also create new opportunities and applications for expert systems approaches in mixed-initiative systems. For example, by converting critiquers’ comments into quantitative ratings across iterations of design projects, expert systems could monitor students’ progress and suggest additional design problems or next steps, combining crowd intelligence with machine intelligence.
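As a minimal sketch of this mixed-initiative idea, assuming the leveled-hashtag summaries described above and an illustrative 0–1 rating scale, the code below converts per-topic tag counts into ratings for each project iteration and flags topics that are not improving; the threshold and names are hypothetical.

```python
from typing import Dict, List

# Hypothetical sketch: crowd critique converted into per-topic ratings per
# iteration, so a machine component (or instructor) can flag stalled topics
# and suggest next steps. The rating scale and threshold are assumptions.

LEVEL_VALUE = {"solid": 1.0, "minor": 0.5, "major": 0.0}

def iteration_ratings(tag_counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Turn {topic: {level: count}} into a 0-1 rating per topic (1 = strong)."""
    ratings: Dict[str, float] = {}
    for topic, counts in tag_counts.items():
        total = sum(counts.values())
        if total:
            ratings[topic] = sum(LEVEL_VALUE[lvl] * n for lvl, n in counts.items()) / total
    return ratings

def flag_stalled_topics(history: List[Dict[str, float]], min_gain: float = 0.05) -> List[str]:
    """Flag topics whose rating did not improve between the last two iterations."""
    if len(history) < 2:
        return []
    prev, curr = history[-2], history[-1]
    return [t for t in curr if t in prev and curr[t] - prev[t] < min_gain]
```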

While some may feel that crowdcritique does not conform to conventional notions of artificial intelligence, this crowdfeedback technology allows us to solve educational problems that conventional approaches do not, and it is therefore an approach that is here to stay.

Limitations and Future Work

The three exploratory design-based research studies reported here helped develop an initial theoretical model of the process and causal mechanisms affecting crowdcritique in studio-based learning environments. This process of theory generation tested different interventions in different settings, allowing us to quickly uncover challenges and iteratively improve the model and intervention. As the model and intervention continue to hold across exploratory cases, future work must turn to validation. For example, quasi-experimental designs with pre/post assessments of work and learning will allow us to determine the extent to which peer feedback impacts performance and learning. Future studies will also include additional quantitative measures in the form of surveys and learning assessments that are quicker to measure and less subjective.

These studies focused on how crowdcritique systems can mobilize peers within studio-based learning environments to provide useful feedback to designers without increasing orchestration demands on the studio leads (such as instructors, peer facilitators or managers). This potentially changes the role of the studio lead in providing feedback. Future work should examine the new role for the studio lead, which might include exploring how best to combine studio lead and peer feedback, how studio leads can best scaffold and facilitate peer critique, or whether the studio lead’s time is best spent on other aspects of the studio-based learning environment.

These studies focused on the implementation of crowdcritique in particular studio-based learning environments, which is an important first step in designing useful, usable, and desirable crowdcritique systems. However, to effectively diffuse crowdcritique innovations in learning networks such as Design For America, we must also investigate the “marketing” challenge of how studios learn about, are persuaded to try, and finally decide to adopt crowdcritique practices.

Conclusion

This design research project provides both an initial theoretical model of the process and causal mechanisms affecting crowdcritique in studio-based learning environments and empirically grounded design principles for crowdcritique systems that provide intelligent feedback on complex problems. Such systems require not only technical features, such as invite tools, critique scaffolds, and community hashtags, but also social features, such as a prep/write-first/write-last facilitation script. Together these create a dual-channel conversation that increases the volume of quality critique by grounding comments, scaffolding and recording critique, and reducing production blocking. By blending the benefits of face-to-face critique and computer support, we can increase the formative feedback critical for learners and reduce the orchestration burden on instructors. This work shows how we can design desirable feedback systems, allowing us to harness the power of crowds to radically increase the availability of formative feedback in formal and informal learning environments.