
1 Introduction

Clear communication of assessment results to the intended audience contributes to the appropriate use of assessment information. The Standards for Educational and Psychological Testing [1] contain several guidelines on score reporting issues including the need to provide clear explanations of assessment results, evidence to support interpretations for intended purposes, information about recommended uses, and warnings about possible misuses.

The literature in this area includes guidelines and iterative development frameworks for designing score reports [2,3,4,5]. These iterative frameworks usually include activities such as gathering assessment needs from and evaluating score reports with the intended audience. Zapata-Rivera and Katz [6] apply audience analyses to design score reports based on the needs, knowledge and attitudes of the audience.

Score reports for traditional assessment types typically include assessment results at the individual level (e.g., total scores, subscores, performance levels, task-level results, and recommendations for follow-up activities), class level (e.g., roster of individual results, distribution of scores and performance levels), school level (e.g., distribution of scores per grade and subject) and district level (e.g., aggregate data across schools, and subgroups of students). These results are usually accompanied by introductory and ancillary materials aimed at helping the intended audience make sense of the assessment results embedded in the score report.
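To make the layered structure of these reports concrete, the following is a minimal Python sketch of how individual results might be aggregated into a class-level roster and performance-level distribution. The class and field names (e.g., IndividualResult, performance_level) are hypothetical illustrations, not the data model of any particular reporting system.

from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class IndividualResult:
    # One student's results as they might appear on an individual-level report
    student_id: str
    total_score: float
    subscores: Dict[str, float]   # e.g., {"algebra": 12.0, "geometry": 9.0}
    performance_level: str        # e.g., "Basic", "Proficient", "Advanced"

def class_level_summary(results: List[IndividualResult]) -> dict:
    # Aggregate individual results into a class-level view: a roster of
    # individual results plus a distribution of performance levels
    return {
        "roster": [(r.student_id, r.total_score, r.performance_level) for r in results],
        "level_distribution": dict(Counter(r.performance_level for r in results)),
    }

Analogous aggregation over classes and schools would yield the school- and district-level views described above.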

New assessments, such as video-based, simulation-based, and conversation-based assessments, can provide users with a variety of novel assessment information. New assessments can gather evidence of students’ knowledge, skills and abilities (KSAs) derived from several sources including student responses to predefined questions and process data [7]. In general, the effective communication of assessment results to the intended audience is critical to the validity of the assessment [6]. Therefore, it is important to carefully consider the types of feedback information that would be most appropriate and useful for these new assessment types. Moreover, it is important to carefully consider the feedback needs for formative assessment tasks, which tend to be woven into instruction, and are therefore intended to provide teachers with ongoing feedback about their students’ current level of understanding [8, 9].

In this paper, we describe two audience-specific score reporting systems for highly interactive assessments that are being developed to provide formative feedback for teachers. These two score reporting systems provide good examples of the types of communication and interaction issues that arise in the development of new types of assessments. We discuss commonalities between these systems, focusing mainly on design and evaluation processes, and elaborate on the implications for future work in this area.

2 Formative Feedback for Preservice and Teacher Educators Using a Simulated Classroom

The first system [10] provides formative feedback to preservice (i.e., student) teachers (PSTs) based on their performance teaching a group of virtual student avatars in a simulated performance-based task [11]. A user-based needs assessment constitutes the first step in the iterative multistep process typically recommended for score report development [2,3,4,5]. Following this recommendation, the first system was designed specifically to cater to the feedback needs of teacher educators and PSTs when simulated performance-based tasks are formatively embedded into science and mathematics elementary methods courses. In these simulated tasks, administered multiple times during a methods course, PSTs are provided an opportunity to learn to facilitate high-quality discussions among virtual student avatars within a simulated classroom environment. For feedback to be effective in a formative context, teacher educators (the teachers, in this context) should be able to diagnose gaps in PSTs' (the students, in this context) current learning in order to modify instruction, and PSTs should be able to understand their strengths and weaknesses in order to improve their performance [8].

Therefore, with the specific goal of developing a system that is primarily intended to inform instructional practice and guide ongoing learning, we designed a preliminary prospective score reporting (PSR) system [3] and used it to identify audience-specific score reporting needs from teacher educators and PSTs. The feedback elements incorporated within this PSR system were informed by previous research [5, 10] and by pilot testing. Among other features, the PSR system developed for teacher educators included the ability to interactively score videos of PSTs' performances, provide written and annotated video-based feedback to PSTs, and view summary-level reports of whole-class and individual student performance. The PSR system developed for PSTs included the ability to view teacher feedback for each dimension, and the ability to view and annotate their own videos in order to respond to and discuss the feedback with the teacher educator. Example screenshots of the scoring and reporting functionalities within this PSR system are presented in Figs. 1, 2, 3 and 4. Focus groups with the relevant stakeholder groups (i.e., teacher educators, N = 8, and PSTs, N = 5) were carried out to identify additional stakeholder-specific needs. With this goal in mind, focus group participants responded to usability questions that included comprehension and preference questions, and identified a prioritized list of user-specific needs.
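As an illustration of the kind of record such a reporting system might store for each scored dimension, consider the minimal sketch below. The structures and names (e.g., DimensionFeedback, VideoAnnotation) are hypothetical and are not taken from the actual PSR system.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoAnnotation:
    # A comment anchored to a moment in the recorded performance; the author
    # may be the teacher educator or the PST responding to feedback
    timestamp_sec: float
    author: str
    comment: str

@dataclass
class DimensionFeedback:
    # Feedback on one scoring dimension of a PST's simulated-classroom performance
    dimension: str              # e.g., "facilitating discussion"
    score: Optional[int]        # may be withheld when only qualitative feedback is given
    written_feedback: str
    annotations: List[VideoAnnotation] = field(default_factory=list)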

Fig. 1. Example screenshot of scoring functionality with video snapshots of preservice teacher performance that serve as evidence for the score assigned for each behavior.

Fig. 2. Example screenshot of detailed class-level feedback on one dimension for one administration that shows the distribution of students across the score points.

Fig. 3. Example screenshot of preservice teacher report that shows total score and dimension scores across three test administrations.

Fig. 4. Example screenshot of preservice teacher report that allows the PST to annotate their own video and use it as a discussion tool.

Results [8] from the focus groups indicated that, in general, all participants reacted positively to the preliminary mockups and offered insightful suggestions for revisions (e.g., include benchmark performances; annotate visual representations; include the ability for PSTs to self-evaluate and respond to feedback). As suggested in the literature (e.g., [12, 13]), teacher educators reiterated that during the first two administrations they favored qualitative annotated feedback aimed at focused improvement rather than quantitative scores. One interesting suggestion from these focus groups was the idea of providing support for self-reflection: both teacher educators and PSTs thought it would be a good idea for the PSTs to first evaluate and annotate their own performance using a teacher-provided checklist before the teacher educator scores and provides feedback on these performances.

Following the focus groups, the needs generated by both groups (teacher educators and PSTs) were reviewed and duplicate needs were consolidated. Ultimately, 32 needs were identified for the teacher educators and 19 for the PSTs; of these, 16 were common to both stakeholder groups and were therefore included on both lists. The relative importance of these identified needs was then confirmed through a post-meeting survey. The top 10 needs that emerged for both teacher educators and PSTs will be considered first in our prototype revisions (if they were not already incorporated in the preliminary mockups) during subsequent phases of the iterative report development cycle.
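The consolidation step amounts to deduplicating needs within each list and identifying the needs that both stakeholder groups share, as in the minimal Python sketch below; the example entries are invented placeholders, not the actual needs identified in the focus groups.

# Needs gathered from each group, deduplicated by representing them as sets
teacher_educator_needs = {
    "score individual performances from video",
    "annotate video with written feedback",
    "view whole-class summaries",
}
pst_needs = {
    "view feedback by dimension",
    "annotate video with written feedback",
    "self-evaluate before being scored",
}

shared_needs = teacher_educator_needs & pst_needs   # needs appearing on both lists
all_needs = teacher_educator_needs | pst_needs      # consolidated set, no duplicates
print(f"{len(shared_needs)} shared need(s), {len(all_needs)} consolidated need(s)")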

3 Feedback for Teachers on Students’ Interaction with Conversation-Based Assessments

The second system prototype provides teachers with information relevant to how students interact with conversation-based assessments [14]. This system also includes a dialogue-based tutorial aimed at teaching teachers about measurement error and how to make informed decisions based on this concept [15].

The PSR system was created following an iterative design process (see Fig. 4). Specifically, we created mock score reports for teachers to showcase unique features of conversation-based assessment (CBA), such as students' ability to discriminate between correct and incorrect answers, the number of words generated, and the amount of scaffolding received. These features were chosen because they are positively correlated with learning [16,17,18]. We created score reports for various domains, including English language learning and assessment (ELLA) and science.
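To illustrate how such process-data features might be computed from a CBA interaction log, the following is a minimal Python sketch. The CBATurn structure and its field names are assumptions made for this example, not the system's actual data format, and the answer-discrimination feature (which requires item-level keys) is omitted.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CBATurn:
    # One student turn in a conversation-based assessment log
    student_text: str
    scaffold_given: bool   # did the system provide a scaffold after this turn?

def extract_features(turns: List[CBATurn]) -> Dict[str, int]:
    # Derive simple process-data features of the kind surfaced in the mock reports:
    # total words produced and amount of scaffolding received
    return {
        "number_of_words": sum(len(t.student_text.split()) for t in turns),
        "scaffolds_received": sum(t.scaffold_given for t in turns),
    }

# Invented example log
log = [CBATurn("The plant needs light to make food", False),
       CBATurn("Maybe water too", True)]
print(extract_features(log))   # {'number_of_words': 10, 'scaffolds_received': 1}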

After developing the initial prototype, we refined the score reports across three iterations of teacher focus groups, following the cycle of creating a mockup prototype, gaining teacher feedback, reviewing the feedback, and refining the mockup prototype. In total, we conducted two focus groups for each domain. Teachers participating in the focus groups included 7 ELLA-mathematics teachers and 5 science teachers. Three versions of the prototypes were created based on the teachers' feedback. Questions for the focus groups addressed comprehension and preference issues.

The focus groups revealed representations that could be more helpful to teachers, such as using color bars rather than levels in some instances, providing links to items, and incorporating a conversational tutor. Overall, we found that teacher feedback was extremely important in creating a score report that meets the needs of teachers and thus may be used to help students.

Figures 5, 6 and 7 show screenshots of the mockup prototypes at a late stage of the process. Assessment results based on both student responses and process data were included in the reports. Figures 5 and 6 correspond to individual student reports for teachers in the ELLA domain, and Fig. 7 to a report in the science domain. Figure 6 shows a description of one of the features (number of words); these descriptions are aimed at facilitating teachers' understanding and appropriate use of the information in the report.

Fig. 5. Example screenshot of an individual student report for teachers in the context of an English language CBA. The report includes performance level information, overall scores and features extracted from process data at different levels (student-, class-, and school-level).

Fig. 6. Example screenshot of an individual student report for teachers showing an explanation for number of words.

Fig. 7. Example screenshot of an individual student report for teachers in the context of a science CBA.

We then developed a conversation-based tutor to better help teachers understand measurement error, because teachers who participated in the above-mentioned focus groups agreed that this type of support would be helpful in interpreting overall score information. Initial results from a pilot study (N = 6) suggest that the tutor was well received by teachers but may need alterations to the dialogic framework to account for teacher answers that differ from common student responses [15]. Specifically, in a pilot study (N = 8), teachers answered 62.8% of questions about the system in a positive fashion. With regard to the dialogic framework, teachers gave elaborate responses that were extremely close to the correct answer; these responses were judged to be correct by the tutor but only partially correct by two human raters (with interrater reliability of 90.8% agreement). This phenomenon of providing extremely close but not completely correct answers may be specific to teachers, as the framework has been successful in categorizing student responses in other systems.
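For reference, the agreement figure above can be read as simple percent agreement between the two raters' categorical judgments; a minimal Python sketch of that computation, using invented example judgments rather than the actual study data, is shown below.

def percent_agreement(rater_a, rater_b):
    # Inter-rater reliability as simple percent agreement between two lists of
    # categorical judgments (e.g., "correct", "partially correct", "incorrect")
    assert len(rater_a) == len(rater_b) and rater_a
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Invented example judgments, not the actual study data
rater_1 = ["partially correct", "correct", "partially correct", "incorrect"]
rater_2 = ["partially correct", "correct", "correct", "incorrect"]
print(f"{percent_agreement(rater_1, rater_2):.1f}% agreement")   # 75.0% agreement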

4 Discussion

In this section we discuss several design and evaluation aspects of score report systems in the context of new assessments.

  • An iterative, audience-centered approach. These systems followed an audience-centered development and evaluation approach. Various mockups of the system were used to gather information about the assessment results needed in reports. These assessment needs are usually captured in a prospective score report (PSR) [3] that is used throughout the assessment development process by an interdisciplinary group of experts. The PSR can take the form of a paper-based mockup report or a reporting system mockup (or PSR system). As these PSR systems are iteratively refined based on the results of studies with experts and the intended audience, they can be used as communication tools to show different stakeholders the changes made to the original report design and to refine the information that would be included in the operational reports.

  • Communicating assessment results based on student response and process data. A clear alignment between the purpose of the reporting system and the types of claims and assessment information it presents is essential for creating reporting systems that (a) provide the right type of information needed by the intended audience, and (b) support appropriate use of assessment information. The examples presented in this paper show how the purpose of the assessment (i.e., providing formative feedback for teachers) guided design and evaluation decisions. Results of the studies carried out as part of each project (e.g., focus groups, cognitive labs and usability studies) showed that teachers appreciated the type of information provided by the system. This information was not limited to total or subscores; it also included feedback for PSTs from teacher educators based on a predefined rubric, in the case of the simulated classroom, and general linguistic features based on student performance across tasks, in the case of the CBA system.

  • Evaluating comprehension and preference aspects. Both comprehension and preference aspects of the reports should be part of the evaluation plan. Preferred representations are not necessarily better at supporting comprehension of assessment information [19,20,21]. Both comprehension and preference questions were included in the questionnaires used to evaluate the reporting systems described above.

  • Supporting mechanisms. When evaluating reporting systems, needs for additional support can be identified. In some cases, providing additional information about the meaning of particular features and how they can be used is enough to help the audience understand and appropriately use assessment results. In other cases, additional supporting mechanisms such as video tutorials or dialogue-based tutors are necessary to teach challenging concepts [15, 21].

5 Future Work

Future work in this area includes designing and evaluating new feedback features. These features may involve the implementation of a continuous feedback loop in which teacher educators can support PSTs as they improve their teaching skills using the simulated classroom, as well as providing teachers with video segments of students interacting with characters in the CBA.

In addition, we would like to improve the conversation-based tutor based on the data collected and to use this tutor to help teachers understand the concept of measurement error.