1 Introduction

In our daily life, we encounter many difficult social issues that admit various viewpoints, which makes it hard to find a proper solution. To enable a person to think deeply about such problems and arrive at a logical decision under any circumstances, it is important to enhance the meta-cognitive skill of "internal self-conversation." Self-conversation requires creating new knowledge by identifying conflicts between one's own thinking and that of others. To train this tacit process, Chen et al. proposed a thinking training environment for learners' internal self-conversation [1]. The tool acts as a set of training wheels so that learners eventually come to reflect on their self-conversation on their own. Previous research includes continuously practiced educational programs using Sizhi for undergraduate university students and hospital nurses [2, 3], and it has been reported that the tool effectively cultivates learners' meta-cognitive skills. However, while the quality of the externalized thought (i.e. the collected statements and their logic) varies across learners, the differences between the externalization processes of each learner's self-conversation, such as what the learner saw, thought, and verbalized, remain unexplored.

The objective of this study is to propose a learning environment for analyzing the internal self-conversation process based on a learner's gazing behavior together with their thinking externalization actions. Eye-gazing behavior can be used to interpret the learner's otherwise hidden thinking process between actions; as the famous proverb goes, the eyes are as eloquent as the tongue. In addition to the learner's thinking process, the system also analyzes the thinking process of the trainers who assess and adjust the learner's output. To date, eye-trackers have mostly been used to analyze the process of viewing advertisements or reading text (e.g. Web searching [4], the difference between normal reading and mindless reading [5]). Although some research has analyzed the verbalization process using an eye-tracker (e.g. stimulated retrospective think-aloud [6]), to the best of our knowledge there is no study of eye-gazing behavior during an internal self-conversation process. If this process could be interpreted and modeled at a human-understandable level, the system could judge whether the learner is thinking logically in the self-conversation, and could provide intelligent support to foster an effective self-conversation process.

In the following sections, we first introduce the importance of internal self-conversation and explain our approach. We then describe the internal self-conversation training system, which has a function to trace the sequence of the user's eye-gazing information during the thinking process. Finally, we discuss the results of an initial analysis to validate the availability of eye-gazing data in the thinking process.

2 Sizhi: Thinking Training Environment for Self-conversation

In general, it is important to externalize one's logical thinking, yet we cannot always express our thought process precisely. Therefore, prior work focused on the effect of the "verbalization" of thoughts as a learning strategy and proposed a model of verbalization [7] to meet the learning goals. The model describes a sequence of three phases: description (a cyclic state of verbalizing one's thoughts based on one's own experiences), cognitive-conflict (a state of facing conflicts through the verbalization of one's thoughts and interaction with others), and knowledge-building (a cyclic state of resolving the conflicts). Along this process, if learners actively think deeply about the problem they face as an internal self-conversation, their thinking becomes clearer and more sophisticated.

To enable learners to train this thinking process, Chen et al. proposed a thinking training environment called "Sizhi" [1], shown in Fig. 1. They claim that the most important aspect in designing the tool is to have learners clearly verbalize their own thinking (thinking-A) and that of others (thinking-B) by reflecting on their own thinking process in a logical manner, and then reflect on that process to find meaningful conflicts. In the tool, a learner verbalizes his/her thinking by switching among Sizhi tabs that represent the three thinking phases based on [7], and by selecting Sizhi tags to add descriptive statements expressing the type of thought (details are described in Sect. 3.1).

Fig. 1. Sizhi interface proposed by [1]

They also proposed a replay tool called "Sizhi player" to capture a learner's thinking reflection process [8]. The tool can display the internal self-conversation process by reading the Sizhi log. However, since the replay is based only on a learner's externalized actions (e.g. mouse clicks and keyboard presses), the reason for each action remains implicit, and the interpretation depends on the viewer.

Our research aims to use eye-gazing information as a key to shed light on part of the learner's meta-cognitive monitoring and control process in the context of self-conversation.

3 Proposed System

3.1 System Interface

Figure 2 shows the interface of the system we developed. The system follows the learning design concept of Sizhi explained in the previous section. The interface comprises four thinking areas: "A's-thinking" denotes one's own thinking, "B's-thinking" denotes the opponent's thinking, "conflict" denotes the difference between A's-thinking and B's-thinking, and "knowledge-building" denotes resolving the root of the conflict. In the A's-thinking, B's-thinking, and knowledge-building areas, the user can add/delete statements using the statement edit buttons and input statement text after selecting a pre-defined Sizhi tag such as fact, hypothesize, decision, assumption, or policy/principle. The user can also attach other statements as references to express the reason for adding a statement. To help the learner gain deep insight into conflicts, in the conflict area the user may select only one policy/principle statement from each of the statements described in the A's-thinking and B's-thinking areas, and express the root of the conflict in the text area.

Fig. 2. Gaze-aware Sizhi interface
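As a concrete illustration of the statement/conflict structure just described, the following is a minimal Python sketch of how statements and the conflict could be represented (all names are hypothetical; the actual implementation is not published):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Statement:
    id: int                    # unique statement id
    area: str                  # "ThinkingA", "ThinkingB", or "Construct"
    tag: str                   # Sizhi tag, e.g. "fact", "hypothesize", "policy/principle"
    text: str                  # verbalized statement text
    reasons: List[int] = field(default_factory=list)  # ids of statements cited as reasons

@dataclass
class Conflict:
    a_policy_id: int           # the one policy/principle statement chosen from A's-thinking
    b_policy_id: int           # the one policy/principle statement chosen from B's-thinking
    root_text: str = ""        # verbalized root of the conflict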

To track a user's gazing behavior during the internal self-conversation process, we introduced a screen-based eye-tracker (Tobii Pro X2-30 [9]) that provides gaze data at 30 Hz. The system identifies which part of the interface the user is looking at by assigning area of interest (AOI) regions to areas and objects. Thus, even if objects move (e.g. statements shift position via the scroll bar or the up/down buttons in the interface), the system still detects the correct target by judging, at each frame, whether the gaze coordinates fall within the corresponding AOI. Currently, it recognizes the four thinking areas (A's-thinking, B's-thinking, conflict, and knowledge-construct), each statement area and its components (the Sizhi tag, reference, and text areas), the conflict text area, and the edit buttons. The system records the user's activity in detail, including gazing events and thinking externalization actions (i.e. keyboard and mouse events).
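A minimal sketch of this AOI hit-testing, assuming the system refreshes a table of object rectangles every frame (names and signatures are our own illustration, not Tobii's API):

from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (x, y, width, height) in screen pixels

def hit_test(gaze: Tuple[float, float], aois: Dict[str, Rect]) -> Optional[str]:
    # aois maps an object id to its *current* rectangle; because rectangles
    # are refreshed when the user scrolls or reorders statements, moved
    # objects are still matched correctly.
    gx, gy = gaze
    hits = [(w * h, aoi_id) for aoi_id, (x, y, w, h) in aois.items()
            if x <= gx <= x + w and y <= gy <= y + h]
    # AOIs are nested (a statement contains tag/reference/text sub-areas),
    # so the smallest matching rectangle is taken as the most specific target.
    return min(hits)[1] if hits else None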

We assume that the proposed system can be used to analyze the meta-cognitive thinking process under the following conditions:

  • Learners externalizing their internal self-conversations: differences between the sequences of learners' gazing data, collated against their critical thinking skills.

  • Skillful trainers correcting a learner's output: the types of verbalized thoughts the trainers tend to focus on, and how they adjust them to expose the root of the conflict, using gazing data as a clue.

In addition, the system has the potential to support interactive learner-learner/trainer situations (e.g. showing the sequence of the learner's gazing targets to the trainer).

3.2 Output Data for Analysis

To analyze the internal self-conversation process, the system must record the user's behavior throughout a session as exhaustively as possible. Table 1 shows the specification of the system log file format. The file is generated in comma-separated value (CSV) format, and each line corresponds to one detected event. Fields 1 and 2 hold the time of the event; for analysis purposes, the system records each event on the millisecond time scale. Field 3 holds the meta-event name: operations on statement objects (STATEMENT_EVENT), operations in the conflict area (CONFLICT_EVENT), eye-gazing target changes (GAZE_EVENT), and the user's operations on the system itself (SYSTEM_EVENT). Field 4 holds the specific event name within the meta-event, and the remaining fields (from field 5 onward) hold the detailed information of the event.

Table 1. System log file format
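Assuming the field layout described above, a log file can be parsed along the following lines (a sketch, not the authors' code):

import csv
from dataclasses import dataclass
from typing import List

@dataclass
class LogEvent:
    unix_ms: int        # field 1: Unix time in milliseconds
    timestamp: str      # field 2: readable time, e.g. 2015-12-16-18-08-59-878
    meta_event: str     # field 3: STATEMENT_EVENT / CONFLICT_EVENT / GAZE_EVENT / SYSTEM_EVENT
    event: str          # field 4: specific event name, e.g. FOCUS_TEXT_AREA, IN, OUT
    details: List[str]  # fields 5..n: event-specific data

def read_log(path: str) -> List[LogEvent]:
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 4:  # skip malformed lines
                events.append(LogEvent(int(row[0]), row[1], row[2], row[3], row[4:]))
    return events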

For example, when a user clicks a statement (id: 4, the 3rd statement from the top in the A's-thinking area, reason: none), the following record is output on the millisecond time scale:

1450256939878,2015-12-16-18-08-59-878,STATEMENT_EVENT,FOCUS_TEXT_AREA,ThinkingA,4,3,Tag,Statement Text

In particular, when the eye-gazing target changes, IN/OUT records for the target object appear in pairs as the gaze coordinates enter and leave the target's AOI:

1450256431512,2015-12-16-18-00-31-512,GAZE_EVENT,IN,Statement,Construct,2,2,Tag,Statement text,4 5,7 4

1450256431815,2015-12-16-18-00-31-815,GAZE_EVENT,OUT,Statement,Construct,2,2,Tag,Statement text,4 5,7 4

The records show the data when a user looks at a statement (id: 2, the 2nd statement from the top in the knowledge-construct area, reason: statements 4 and 5) and then looks away from the same statement 303 ms later. Thus, the log data allows us to trace not only the time a user takes to act on each statement but also the kinds of objects a user looks at during the internal self-conversation process.
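Pairing the IN/OUT records makes such dwell times directly computable; a sketch reusing the LogEvent parser above:

from collections import defaultdict
from typing import Dict, List

def dwell_times(events: List["LogEvent"]) -> Dict[str, List[int]]:
    # Collect per-object gaze dwell durations (ms) from paired IN/OUT events.
    open_in: Dict[str, int] = {}                    # object key -> time of last IN
    dwells: Dict[str, List[int]] = defaultdict(list)
    for e in events:
        if e.meta_event != "GAZE_EVENT":
            continue
        key = "|".join(e.details)                   # e.g. "Statement|Construct|2|2|..."
        if e.event == "IN":
            open_in[key] = e.unix_ms
        elif e.event == "OUT" and key in open_in:
            dwells[key].append(e.unix_ms - open_in.pop(key))
    return dwells

# For the two records above: 1450256431815 - 1450256431512 = 303 ms.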

4 Initial Analysis

4.1 Data Collection

To validate the availability of eye-gazing data in the thinking process, we conducted an experiment collecting data on trainers' correction processes. Two trainers (T1 and T2), who have experience correcting learners' cases through thinking-method workshops, corrected three hospital nurses' cases (C1, C2, and C3).

Before using the system, to precisely detect what a trainer was looking at during the correction process, we asked the trainers to calibrate the eye-tracker by looking at a series of displayed points. They then opened the case files in the proposed system and started to correct the case, continuing until they were fully satisfied. As a result, we obtained six log files (two trainers × three cases) in the format described in Sect. 3.2.

In internal self-conversation, the important factors are to write one's own case clearly, reflecting on the individual thinking process using Sizhi tags, and to find the meaningful conflicts of the case [1]. Based on this concept, as a first step in analyzing the trainers' gazing behaviors, this initial analysis focuses on the trainers' correction of the conflict. We analyzed the following features (a sketch of how they can be derived from the log follows the list):

  • Eye-gazing process in each thinking area: the sequence of gaze intervals alternating among the A's-thinking, B's-thinking, conflict, and knowledge-construct areas throughout the correction process.

  • Timing of setting the conflict statements: the number of clicks on the select/release button in the conflict area.

  • Timing of verbalizing actions: keypresses on the statements in the A's-thinking and B's-thinking areas.
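Under the same log format, these features can be counted roughly as follows (the event names KEY_PRESS, SELECT, and RELEASE are assumptions; only the event names shown in Sect. 3.2 are confirmed):

from collections import defaultdict
from typing import Dict, List

def feature_counts(events: List["LogEvent"]) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)
    for e in events:
        if e.meta_event == "GAZE_EVENT" and e.event == "IN":
            # area name sits at details[1] for statement objects, details[0]
            # for area-level AOIs (layout assumed from the log excerpts)
            area = e.details[1] if len(e.details) > 1 else e.details[0]
            counts["gaze_" + area] += 1
        elif e.meta_event == "CONFLICT_EVENT" and e.event in ("SELECT", "RELEASE"):
            counts["conflict_" + e.event] += 1      # select/release of conflict statements
        elif e.meta_event == "STATEMENT_EVENT" and e.event == "KEY_PRESS":
            counts["keypress_" + e.details[0]] += 1 # keypresses per thinking area
    return counts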

4.2 Result

Quantitative Analysis.

Table 2 shows the total eye-gazing time in each area; times are given in milliseconds. The results show that the correction time differed between trainers. The total eye-gazing time is shorter than the total session time because some frames of gaze coordinates were not detected by the eye-tracker (e.g. during blinks, or while looking at the keyboard rather than the display). The worst session in this respect was C2&T2, where eye-gazing data was missing for 16.3% of the total session time; the best was C1&T1, where only 3.4% of the data was lost.

Table 2. Result of total gazing time in each area
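The lost-data rate can be estimated from the log as the share of the session not covered by any area-level gaze interval (a sketch building on dwell_times above; the area key names are assumptions based on the excerpts in Sect. 3.2):

from typing import List

AREAS = ("ThinkingA", "ThinkingB", "Conflict", "Construct")

def gaze_loss_rate(events: List["LogEvent"]) -> float:
    # Only top-level thinking-area AOIs are summed, to avoid double-counting
    # the nested statement/tag/text AOIs they contain.
    session = events[-1].unix_ms - events[0].unix_ms
    covered = sum(sum(ds) for key, ds in dwell_times(events).items() if key in AREAS)
    return 1.0 - covered / session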

Figure 3 shows bar graphs of the total gazing time in each thinking area. The gazing rates for the knowledge-construct area (green) are very small in all cases, suggesting that both trainers gave their full attention to unearthing the meaningful conflicts of the case being corrected. On the other hand, in the conflict area, T1 spent relatively more time than T2, especially in C1 and C2, which suggests that correction policy depends on the trainer.

Fig. 3. Bar graphs of total eye-gazing time in each area (Color figure online)

Table 3 shows the number of keypresses in each area. Table 4 shows the number of select/release actions on the A's/B's statement displayed as the policy/principle root of conflict. From these results, both trainers modified the conflict text of the cases very little (only twice, in C1&T2). In particular, T1's keypress actions for C1 and C2 occurred only on the statements in the conflict area, while in the other four results they occurred on the statements in the A's-/B's-thinking areas (Table 3). This indicates that T1 largely agreed with the policy/principle conflict statements originally described by the nurses of C1 and C2 and edited only their statement texts in the conflict area. In fact, as shown in Table 4, T1 did not select/release the conflict statements during these corrections. The other results suggest that the trainers did not agree with the original conflict statements and instead tried to pick policy/principle statements from the A's-/B's-thinking areas and edit them into a root of conflict. This hypothesis is supported by the select/release data in Table 4: except for T1&C1 and T1&C2, the trainers replaced the original conflict statements.

Table 3. Result of the number of keypresses in each area
Table 4. Result of the number of select/release actions on A's/B's statements in the conflict area

Timeline Sequences of Eye-Gazing Process.

Figures 4, 5 and 6 show the timelines of the correction process for each case. Each timeline graph has three sub-timelines, shown from left to right. Although the total session times shown in Table 2 differ across correction processes, the timelines are normalized to the same time scale. The upper timeline shows the eye-gazing sequence across the four thinking areas; the middle timeline shows the eye-gazing sequence over the policy/principle statements and the conflict text within the conflict area; and the lower timeline shows the keypress actions throughout the correction process. From the visualized results, we can see that the sequence of eye-gazing across thinking areas is not chaotic but forms contiguous blocks of time. This gives us an important clue for inferring the succession of the trainers' monitoring and control processes.
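The normalization simply rescales each session onto [0, 1]; a sketch producing the gaze intervals for one thinking area, reusing the LogEvent parser above:

from typing import Iterator, List, Tuple

def normalized_intervals(events: List["LogEvent"], area: str) -> Iterator[Tuple[float, float]]:
    # Yield (start, end) gaze intervals in `area`, rescaled to [0, 1] session
    # time so that timelines of different lengths can be compared side by side.
    t0, span = events[0].unix_ms, events[-1].unix_ms - events[0].unix_ms
    start = None
    for e in events:
        if e.meta_event != "GAZE_EVENT" or area not in e.details:
            continue
        if e.event == "IN":
            start = e.unix_ms
        elif e.event == "OUT" and start is not None:
            yield ((start - t0) / span, (e.unix_ms - t0) / span)
            start = None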

Fig. 4. Result of timeline: case C1 (Color figure online)

Fig. 5. Result of timeline: case C2 (Color figure online)

Fig. 6. Result of timeline: case C3 (Color figure online)

In the case of T1&C1, for example, the trainer first looked at the conflict area to understand the nurse's original root of conflict (purple area on the left of Fig. 4 (T1)). The trainer then focused on understanding the statements in the A's-/B's-thinking areas (blue and orange areas on the left of Fig. 4 (T1)). The trainer took particular time over the statements in the A's-thinking area; hence, we assume that the trainer tried not only to understand the statements but also to confirm whether they were consistent with the nurse's root of conflict. After that, the trainer appears to have started correcting the policy/principle statements in the conflict area (purple area in the middle of Fig. 4 (T1)). From the visualized process, we find that the trainer first revalidated the conflict text and then, as shown by the keypress data, modified the statement texts to improve them. The following part of the timeline shows that the trainer spent little or no time confirming the corrected conflict; instead, the trainer devoted all his attention to revalidating the logic of A's-/B's-thinking (blue and orange areas in the last half of Fig. 4 (T1)). The log data shows that, although the trainer did not edit the statement texts themselves, he pressed the statement up/down buttons several times. As a whole, we speculate that the trainer confirmed, corrected, and settled his correction in the first half of the session (the most important part of the correction objective, finding the meaningful conflicts of the case), and spent the latter half polishing minor parts of the original statements (e.g. changing the display order of the statements).

We also observe that both trainers consistently started by focusing on the conflict area, which convinces us that their correction policy was first to understand the original root of conflict. In addition, in the case of C3, the keypress sequences of the two trainers did not converge; this may indicate that the logical structure of the original case was not clearly verbalized. In this manner, by considering the eye-gazing process between the trainer's actual correction actions, it is possible to interpret the context of the trainer's monitoring and control process to some extent.

As an initial analysis, we mainly focused on comprehensive features such as the amount of eye-gazing time in the thinking areas. As described in Sect. 3.2, the log data also lets us analyze how the trainers modified the target statement tags/texts while comparing statements, as well as the display order of statements in the thinking areas. As future work, we must examine the detailed eye-gazing process, such as the types of Sizhi tags the trainers particularly focused on and their correction processes.

5 Conclusion

In this paper, we proposed a novel gaze-aware internal self-conversation system that can record the sequence of the user's eye-gazing information. To validate the availability of the system, we conducted an initial analysis of the trainers' correction data. The results confirm that it is possible, to some extent, to interpret the context of a trainer's monitoring and control process.

Unlike the trainers' correction process, which starts from understanding the learner's output, learners need to verbalize their thoughts from scratch. We believe the log data will reveal different tendencies in the thinking composition process depending on the learner's thinking ability, e.g. critical thinking skills. To clarify this difference, we plan to conduct an experiment collecting learners' internal self-conversation data.