
1 Introduction

Clear communication of assessment results to the intended audience contributes to the appropriate use of assessment information. The Standards for Educational and Psychological Testing [1] contain several guidelines on score reporting issues including the need to provide clear explanations of assessment results, evidence to support interpretations for intended purposes, information about recommended uses, and warnings about possible misuses.

The literature in this area includes guidelines and iterative development frameworks for designing score reports [2,3,4,5]. These iterative frameworks usually include activities such as gathering assessment needs from and evaluating score reports with the intended audience. Zapata-Rivera and Katz [6] apply audience analyses to design score reports based on the needs, knowledge and attitudes of the audience.

Score reports for traditional assessment types typically include assessment results at the individual level (e.g., total scores, subscores, performance levels, task-level results, and recommendations for follow-up activities), class level (e.g., roster of individual results, distribution of scores and performance levels), school level (e.g., distribution of scores per grade and subject) and district level (e.g., aggregate data across schools, and subgroups of students). These results are usually accompanied by introductory and ancillary materials aimed at helping the intended audience make sense of the assessment results embedded in the score report.
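To make the layered structure of these reports concrete, the following is a minimal Python sketch of how individual results might be aggregated into a class-level roster and performance-level distribution. The class and field names (e.g., IndividualResult, performance_level) are hypothetical illustrations, not the data model of any particular reporting system.

from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class IndividualResult:
    # One student's results as they might appear on an individual-level report
    student_id: str
    total_score: float
    subscores: Dict[str, float]   # e.g., {"algebra": 12.0, "geometry": 9.0}
    performance_level: str        # e.g., "Basic", "Proficient", "Advanced"

def class_level_summary(results: List[IndividualResult]) -> dict:
    # Aggregate individual results into a class-level view: a roster of
    # individual results plus a distribution of performance levels
    return {
        "roster": [(r.student_id, r.total_score, r.performance_level) for r in results],
        "level_distribution": dict(Counter(r.performance_level for r in results)),
    }

Analogous aggregation over classes and schools would yield the school- and district-level views described above.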

New assessments, such as video-based, simulation-based, and conversation-based assessments, can provide users with a variety of novel assessment information. New assessments can gather evidence of students’ knowledge, skills and abilities (KSAs) derived from several sources including student responses to predefined questions and process data [7]. In general, the effective communication of assessment results to the intended audience is critical to the validity of the assessment [6]. Therefore, it is important to carefully consider the types of feedback information that would be most appropriate and useful for these new assessment types. Moreover, it is important to carefully consider the feedback needs for formative assessment tasks, which tend to be woven into instruction, and are therefore intended to provide teachers with ongoing feedback about their students’ current level of understanding [8, 9].

In this paper, we describe two audience-specific score reporting systems for highly interactive assessments that are being developed to provide formative feedback for teachers. These two score reporting systems provide good examples of the types of communication and interaction issues that arise in the development of new types of assessments. We discuss commonalities between these systems, focusing mainly on design and evaluation processes, and elaborate on the implications for future work in this area.

2 Formative Feedback for Preservice and Teacher Educators Using a Simulated Classroom

The first system [10] provides formative feedback to preservice (i.e., student) teachers (PSTs) based on their performance teaching a group of virtual student avatars in a simulated performance-based task [11]. A user-based needs assessment constitutes the first step in the iterative multistep process typically recommended for score report development [2,3,4,5]. Following this recommendation, the first system was designed specifically to cater to the feedback needs of teacher educators and PSTs when simulated performance-based tasks are formatively embedded into science and mathematics elementary methods courses. In these simulated tasks, administered multiple times during a methods course, PSTs are provided an opportunity to learn to facilitate high-quality discussions among virtual student avatars within a simulated classroom environment. For feedback to be effective in a formative context, teacher educators (the teachers, in this context) should be able to diagnose gaps in PSTs' (the students, in this context) current learning in order to modify instruction, and PSTs should be able to understand their strengths and weaknesses in order to improve their performance [8].

Therefore, with the specific goal of developing a system that is primarily intended to inform instructional practice and guide ongoing learning, we designed a preliminary prospective score reporting (PSR) system [3] and used it to identify audience-specific score reporting needs from teacher educators and PSTs. The feedback elements incorporated within this PSR system were informed by previous research [5, 10] and by pilot testing. Among other features, the PSR system developed for teacher educators included the ability to interactively score videos of PSTs' performances, provide written and annotated video-based feedback to PSTs, and view summary-level reports of whole-class and individual student performance. The PSR system developed for PSTs included the ability to view teacher feedback for each dimension, and the ability to view and annotate their own videos in order to respond to and discuss the feedback with the teacher educator. Example screenshots of the scoring and reporting functionalities within this PSR system are presented in Figs. 1, 2, 3 and 4. Focus groups with the relevant stakeholder groups (i.e., teacher educators, N = 8, and PSTs, N = 5) were carried out to identify additional stakeholder-specific needs. With this goal in mind, focus group participants responded to usability questions that included comprehension and preference questions, and identified a prioritized list of user-specific needs.
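As an illustration of the kind of record such a reporting system might store for each scored dimension, consider the minimal sketch below. The structures and names (e.g., DimensionFeedback, VideoAnnotation) are hypothetical and are not taken from the actual PSR system.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoAnnotation:
    # A comment anchored to a moment in the recorded performance; the author
    # may be the teacher educator or the PST responding to feedback
    timestamp_sec: float
    author: str
    comment: str

@dataclass
class DimensionFeedback:
    # Feedback on one scoring dimension of a PST's simulated-classroom performance
    dimension: str              # e.g., "facilitating discussion"
    score: Optional[int]        # may be withheld when only qualitative feedback is given
    written_feedback: str
    annotations: List[VideoAnnotation] = field(default_factory=list)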

Fig. 1. Example screenshot of scoring functionality with video snapshots of preservice teacher performance that serve as evidence for the score assigned for each behavior.

Fig. 2. Example screenshot of detailed class-level feedback on one dimension for one administration that shows the distribution of students across the score points.

Fig. 3. Example screenshot of preservice teacher report that shows total score and dimension scores across three test administrations.

Fig. 4. Example screenshot of preservice teacher report that allows the PST to annotate their own video and use it as a discussion tool.

Results [8] from the focus groups indicated that, in general, all participants reacted positively to the preliminary mockups and offered insightful suggestions for revisions (e.g., include benchmark performances; annotate visual representations; include the ability for PSTs to self-evaluate and respond to feedback). As suggested in the literature (e.g., [12, 13]), teacher educators reiterated that during the first two administrations they favored qualitative annotated feedback aimed at focused improvement rather than quantitative scores. One interesting suggestion from these focus groups was the idea of providing support for self-reflection: both teacher educators and PSTs thought it would be a good idea for the PSTs to first evaluate and annotate their own performance using a teacher-provided checklist before the teacher educator scores and provides feedback on these performances.

Following the focus groups, the needs generated by both groups (teacher educators and PSTs) were reviewed and duplicate needs were consolidated. Ultimately, 32 needs were identified for the teacher educators and 19 for the PSTs; of these, 16 were common to both stakeholder groups and were therefore included on both lists. The relative importance of these identified needs was then confirmed through a post-meeting survey. The top 10 needs that emerged for both teacher educators and PSTs will be considered first in our prototype revisions (if they were not already incorporated in the preliminary mockups) during subsequent phases of the iterative report development cycle.
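The consolidation step amounts to deduplicating needs within each list and identifying the needs that both stakeholder groups share, as in the minimal Python sketch below; the example entries are invented placeholders, not the actual needs identified in the focus groups.

# Needs gathered from each group, deduplicated by representing them as sets
teacher_educator_needs = {
    "score individual performances from video",
    "annotate video with written feedback",
    "view whole-class summaries",
}
pst_needs = {
    "view feedback by dimension",
    "annotate video with written feedback",
    "self-evaluate before being scored",
}

shared_needs = teacher_educator_needs & pst_needs   # needs appearing on both lists
all_needs = teacher_educator_needs | pst_needs      # consolidated set, no duplicates
print(f"{len(shared_needs)} shared need(s), {len(all_needs)} consolidated need(s)")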

3 Feedback for Teachers on Students’ Interaction with Conversation-Based Assessments

The second system prototype provides teachers with information relevant to how students interact with conversation-based assessments [14]. This system also includes a dialogue-based tutorial aimed at teaching teachers about measurement error and how to make informed decisions based on this concept [15].

The PSR system was created following an iterative design process (see Fig. 4). Specifically, we created mock score reports for teachers to showcase unique features of conversation-based assessment (CBA), such as students' ability to discriminate between correct and incorrect answers, the number of words generated, and the amount of scaffolding received. These features were chosen because they are positively correlated with learning [16,17,18]. We created score reports for various domains, including English language learning and assessment (ELLA) and science.
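To illustrate how such process-data features might be computed from a CBA interaction log, the following is a minimal Python sketch. The CBATurn structure and its field names are assumptions made for this example, not the system's actual data format, and the answer-discrimination feature (which requires item-level keys) is omitted.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CBATurn:
    # One student turn in a conversation-based assessment log
    student_text: str
    scaffold_given: bool   # did the system provide a scaffold after this turn?

def extract_features(turns: List[CBATurn]) -> Dict[str, int]:
    # Derive simple process-data features of the kind surfaced in the mock reports:
    # total words produced and amount of scaffolding received
    return {
        "number_of_words": sum(len(t.student_text.split()) for t in turns),
        "scaffolds_received": sum(t.scaffold_given for t in turns),
    }

# Invented example log
log = [CBATurn("The plant needs light to make food", False),
       CBATurn("Maybe water too", True)]
print(extract_features(log))   # {'number_of_words': 10, 'scaffolds_received': 1}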

After developing the initial prototype, we refined the score reports across three iterations of teacher focus groups, following the cycle of creating a mockup prototype, gaining teacher feedback, reviewing the feedback, and refining the mockup prototype. In total, we conducted two focus groups for each domain. Teachers participating in the focus groups included 7 ELLA-mathematics teachers and 5 science teachers. Three versions of the prototypes were created based on the teachers' feedback. Questions for the focus groups addressed comprehension and preference issues.

The focus groups revealed representations that could be more helpful to teachers, such as using color bars rather than levels in some instances, providing links to items, and incorporating a conversational tutor. Overall, we found that teacher feedback was extremely important in creating a score report that meets the needs of teachers and thus may be used to help students.

Figures 5, 6 and 7 show screenshots of the mockup prototypes at a late stage of the process. Assessment results based on both student responses and process data were included in the reports. Figures 5 and 6 correspond to individual student reports for teachers in the ELLA domain, and Fig. 7 to a report in the science domain. Figure 6 shows a description of one of the features (number of words); these descriptions are aimed at facilitating teachers' understanding and appropriate use of the information in the report.

Fig. 5. Example screenshot of an individual student report for teachers in the context of an English language CBA. The report includes performance level information, overall scores and features extracted from process data at different levels (student-, class-, and school-level).

Fig. 6. Example screenshot of an individual student report for teachers showing an explanation for number of words.

Fig. 7. Example screenshot of an individual student report for teachers in the context of a science CBA.

We then developed a conversation-based tutor to better help teachers understand measurement error, because teachers who participated in the above-mentioned focus groups agreed that this type of support would be helpful in interpreting overall score information. Initial results from a pilot study (N = 6) suggest that the tutor was well received by teachers but may need alterations to the dialogic framework to account for teacher answers that differ from common student responses [15]. Specifically, in a pilot study (N = 8), teachers answered 62.8% of questions about the system in a positive fashion. With regard to the dialogic framework, teachers gave elaborate responses that were extremely close to the correct answer; these responses were judged to be correct by the tutor but only partially correct by two human raters (with interrater reliability of 90.8% agreement). This phenomenon of providing extremely close but not completely correct answers may be specific to teachers, as the framework has been successful in categorizing student responses in other systems.
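For reference, the agreement figure above can be read as simple percent agreement between the two raters' categorical judgments; a minimal Python sketch of that computation, using invented example judgments rather than the actual study data, is shown below.

def percent_agreement(rater_a, rater_b):
    # Inter-rater reliability as simple percent agreement between two lists of
    # categorical judgments (e.g., "correct", "partially correct", "incorrect")
    assert len(rater_a) == len(rater_b) and rater_a
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Invented example judgments, not the actual study data
rater_1 = ["partially correct", "correct", "partially correct", "incorrect"]
rater_2 = ["partially correct", "correct", "correct", "incorrect"]
print(f"{percent_agreement(rater_1, rater_2):.1f}% agreement")   # 75.0% agreement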

4 Discussion

In this section we discuss several design and evaluation aspects of score report systems in the context of new assessments.

  • An iterative, audience-centered approach. These systems followed an audience-centered development and evaluation approach. Various mockups of the system were used to gather information about the assessment results needed in reports. These assessment needs are usually captured in a prospective score report (PSR) [3] that is used throughout the assessment development process by an interdisciplinary group of experts. The PSR can take the form of a paper-based mockup report or a reporting system mockup (or PSR system). As these PSR systems are iteratively refined based on the results of studies with experts and the intended audience, they can be used as communication tools to show different stakeholders the changes made to the original report design and to refine the information that would be included in the operational reports.

  • Communicating assessment results based on student response and process data. A clear alignment between the purpose of the reporting system and the types of claims and assessment information it presents is essential for creating reporting systems that (a) provide the right type of information needed by the intended audience, and (b) support appropriate use of assessment information. The examples presented in this paper show how the purpose of the assessment (i.e., providing formative feedback for teachers) guided design and evaluation decisions. Results of the studies carried out as part of each project (e.g., focus groups, cognitive labs and usability studies) showed that teachers appreciated the type of information provided by the system. This information was not limited to total or subscores; it also included feedback for PSTs from teacher educators based on a predefined rubric, in the case of the simulated classroom, and general linguistic features based on student performance across tasks, in the case of the CBA system.

  • Evaluating comprehension and preference aspects. Both comprehension and preference aspects of the reports should be part of the evaluation plan. Preferred representations are not necessarily better at supporting comprehension of assessment information [19,20,21]. Both comprehension and preference questions were included in the questionnaires used to evaluate the reporting systems described above.

  • Supporting mechanisms. When evaluating reporting systems, needs for additional support can be identified. In some cases, providing additional information about the meaning of particular features and how they can be used is enough to help the audience understand and appropriately use assessment results. In other cases, additional supporting mechanisms such as video tutorials or dialogue-based tutors are necessary to teach challenging concepts [15, 21].

5 Future Work

Future work in this area includes designing and evaluating new feedback features. These features may involve the implementation of a continuous feedback loop in which teacher educators can support PSTs as they improve their teaching skills using the simulated classroom, as well as providing teachers with video segments of students interacting with characters in the CBA.

In addition, we would like to improve the conversation-based tutor based on the data collected and to use this tutor to help teachers understand the concept of measurement error.