1 Introduction

Recently, data collection tools have made it easy to gather learning logs for learning analysis. For example, a digital textbook system can be seen as a kind of educational data collection tool. Digital textbook systems are used on both traditional and online platforms. Many countries, such as Korea and Japan, plan to replace traditional textbooks in schools with digital textbooks (Yin et al. 2014), and the traditional textbook is gradually being replaced (Yin et al. 2018). By using digital textbooks, it is possible to collect a significant amount of logged educational data. These log data record learning practices such as marking and writing memos. Recently, researchers have begun to pay attention to the utilization of the learning logs of digital textbook systems (Yin and Hwang 2018).

In our previous research (Yin et al. 2018), we applied k-means clustering to digital textbook reading logs to group students into clusters, and analyzed the learning behavioral patterns of each group by using features such as the number of pages read, reading time, and backtrack reading rate. Students were grouped into four clusters and their learning behavioral patterns were analyzed. We examined whether the learning behavioral patterns are related to the learning outcomes. More specifically, we used the students’ grade achieved at the end of the semester to evaluate their performance.

However, some features have a stronger association with learning outcomes than others. Therefore, it is essential to assign a reasonable weight to each clustering feature (Nie et al. 2017). In this paper, the features are weighted differently from other research: we carry out a correlation analysis between the features and the grade achieved, and a feature with a larger correlation coefficient receives a larger weight, whereas a feature with a smaller correlation coefficient receives a smaller weight.

After weighting the features, we grouped students into five clusters and analyzed their learning behavioral patterns. We found some interesting patterns; for example, the students who frequently use the MEMO or UNDERLINE function are the ones who achieve better grades.

2 Literature Review

2.1 Learning Analytics (LA)

LA can positively influence learning effectiveness (Archer et al. 2014; Hrastinski 2009; Yin et al. 2017; Yin and Hwang 2018). LA can be used to optimize learning and teaching processes (Colvin et al. 2015). Yin and Hwang (2018) indicated that LA can benefit different roles:

  • LA can help students to improve and share their learning experience;

  • LA can help teachers to get feedback from learners and identify the learning strategies;

  • LA can help to evaluate the structure of course content, and to evaluate teaching materials;

  • LA can help to evaluate teachers and students.

LA based on digital textbook reading logs is an emerging topic. Yin and Hwang (2018) proposed several potential research directions of LA for digital textbooks:

  1. Prediction.

    • Analyzing digital textbook-based learning logs to provide support and make predictions.

  2. Structure Discovery.

    • Discovering learning behavioral patterns.

    • Identifying the impacts of different learning strategies.

    • Investigating the factors affecting students’ performances.

  3. Relationship Mining.

    • Investigating the correlations between students’ behaviors and performances.

Much research has focused on using LA for Structure Discovery. Clustering, factor analysis, knowledge inference, and network analysis are common analysis methods for structure discovery. This paper also focuses on Structure Discovery and attempts to find learning patterns by using e-book reading data.

2.2 Clustering

k-means clustering groups similar records together in a large multidimensional data set (Aggarwal and Yu 1999). The records within a cluster exhibit high similarity to each other but are dissimilar to records in other clusters. Similarity and dissimilarity are assessed from the attribute values that describe the records and generally involve distance metrics (Han et al. 2011).

Cluster analysis is widely used as a tool for analyzing multivariate data. Utilizing k-means clustering, Cheng and Tsai (2014) reported their success in identifying the features of reading behaviors in an augmented reality picture book. Our research purpose is to identify the features of reading behaviors on digital textbooks, in a more educational context. Therefore, k-means clustering analysis is conducted in the current work.
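The k-means procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the toy records, and the fixed initial centroids are assumptions chosen only to make the assignment and update steps visible (in practice, the initial centroids are usually chosen at random and the procedure is run several times).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(records, initial_centroids, max_iterations=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in records
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(records, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of two-dimensional records.
records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(records, initial_centroids=[records[0], records[3]])
print(labels)
```

The distance metric here is the squared Euclidean distance, so features measured on larger scales dominate the grouping unless they are rescaled beforehand.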

2.3 Previous Work

In our previous research (Yin et al. 2017), we examined whether learning behavioral patterns were related to learning outcomes and identified learning patterns from e-book log data. To achieve these objectives, we used visualization technologies to identify unobserved learning behaviors. We then analyzed the students’ learning behavior patterns by using k-means clustering.

The students were grouped into four clusters of varying learning behavior patterns: the “preview and diligent group,” “efficient group,” “diligent group,” and “poor performance group.” The following observations emerged from the analysis results:

  1. Backtrack reading learning behavior has its merit because it can aid students in saving time while studying. The results reveal that the learning behavior of “backtrack” style reading exerts a significant positive influence on learning effectiveness, which can aid students to learn more efficiently.

  2. A reasonable learning behavior complemented by sufficient learning time can yield excellent learning results.

The current research differs from the previous research in three respects:

  1. In the current research, the e-book reading log data come from a commercial law course. In the previous research, the e-book reading log data were collected from programming courses.

  2. The previous research did not weight the clustering features.

  3. The previous research used fewer features than the current research.

3 DITeL

3.1 System

Collecting data is the first step in learning analysis (Yin et al. 2013; Yin et al. 2013b). To collect educational data, we built a digital textbook system named digital textbook for improving teaching and learning (DITeL). Teachers upload all the necessary teaching materials to DITeL. The DITeL system was developed to collect textbook reading data such as “turning to next/previous page,” “memo,” “zoom in/out,” and “adding marker.” These reading actions are termed events in this study. The DITeL system can be used not only on a personal computer but also on mobile devices such as smartphones and tablets, thereby making it usable anywhere and anytime.

The learning logs of the students were collected to analyze their learning behaviors for improving the DITeL system (Yin et al. 2017; Li et al. 2018).

Figure 1 shows the interface of DITeL. The interface has some buttons at the top. Students can interact with the e-book to navigate through its contents and to add notes and highlights. All of these actions are stored in the database. These data were used to analyze the learning behaviors of the students. More specifically, the available events are the following:

Fig. 1. Interface of DITeL

Turning to next/previous page: Students can read the contents multiple times; they navigate to the next page by clicking the “Next” button, and backtrack to the previous page by clicking the “Prev” button.

Memo: A user can click the “Memo” button to write something in the learning content. After the user finishes writing the memo in the textbox provided, the action name will be saved as “Memo.”

Zoom in/out: The zoom in/out function can expand and reduce font size so that it can help students read the contents clearly.

Adding marker: A user can click the “Highlight” or “Underline” button when s/he wants to mark text in the learning content, and the action name will be saved as “Highlight” or “Underline.”

3.2 Participants

The data used in this study were collected during a commercial law course held from March 2017 to July 2017 and from March 2018 to July 2018 at Jinan University in China. The learning goal of the course was to teach students basic commercial law.

The students were provided with the teaching materials for the subsequent class and were asked to prepare the lesson before that class. A total of 234 graduate students (aged 18 to 19) attended the course. Nine students, including students who dropped out, were removed from the study sample after data processing. The data from the remaining 225 students were used. Among the participants, 22.4% were female and 77.6% were male. Among these students, 6% were from the School of Education, 10% were from the School of Letters, 13% were from the Faculty of Science, three were from the School of Medicine, and 69% were from the Faculty of Engineering.

Prior to entering the class, they had no previous experience of using DITeL, an e-book system that records students’ learning behaviors when they read e-books.

The experiments were conducted following the ethics criteria suggested by an authorized ethics committee in Japan in order to protect the participants. Further, the personal information of the participants was hidden.

3.3 Data Collection

Collecting data is the first step in learning behavior analysis (Yin et al. 2013b, 2013; Yin et al. 2018). Table 1 presents a sample of reading action logs. Each data log contains date, time, user ID, Content ID, page number, user action, devices, and other data. A total of 200,000 records were gathered from 2017 to 2018.

Table 1. Sample of records

The reading action logs for “Action Time” listed the students who engaged in this behavior, and we calculated the number of times this occurred.

The following e-book features were used in this research:

  • Login times (LGI): The number of times a student logs in to the DITeL system.

  • Number of “Memo” (NM): The number of times a student writes a memo.

  • Number of “Underline” (NUL): The number of times a student underlines text.

  • Number of “Highlight” (NHL): The number of times a student highlights text.

  • Number of “Next” (NN): The number of times a student turns to subsequent pages.

  • Number of “Prev” (NP): The number of times a student returns to previous pages.

  • Reading Pages (RP): The total number of pages that a student read. The reading action logs for “Page No.” and “Action Time” listed the number of pages the students read. Many of them repeatedly read specific pages.

  • Read Time (RT): The total time spent reading the learning contents. The reading action logs for “Action Time” listed the lengths of time students spent reading the learning content. RT was calculated on an hourly basis.

  • Backtrack reading rate (BRR): BRR is a hidden measure that is calculated using NN and NP. This will be described in the following section.
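As an illustration of how the count-based features above could be derived from the reading action logs, the following sketch tallies events per student. It is an assumption-laden example rather than the DITeL processing pipeline: the record fields (`user_id`, `action`), the “Login” action name, and the sample log entries are hypothetical, and RP, RT, and BRR are left out because their exact computation is not specified in this section.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from logged action names to feature abbreviations.
# "Memo", "Underline", and "Highlight" follow the saved action names described
# for DITeL; "Login", "Next", and "Prev" are assumed names for illustration.
ACTION_TO_FEATURE = {
    "Login": "LGI",
    "Memo": "NM",
    "Underline": "NUL",
    "Highlight": "NHL",
    "Next": "NN",
    "Prev": "NP",
}


def count_features(log_records):
    """Count, per student, how often each tracked action occurs in the logs."""
    counts = defaultdict(Counter)
    for record in log_records:
        feature = ACTION_TO_FEATURE.get(record["action"])
        if feature is not None:  # ignore actions that are not tracked features
            counts[record["user_id"]][feature] += 1
    # Report every feature for every student, using 0 for actions never seen.
    return {
        user: {name: tally[name] for name in ACTION_TO_FEATURE.values()}
        for user, tally in counts.items()
    }


# Made-up log entries, not data from the study.
logs = [
    {"user_id": "s01", "action": "Login"},
    {"user_id": "s01", "action": "Next"},
    {"user_id": "s01", "action": "Next"},
    {"user_id": "s01", "action": "Prev"},
    {"user_id": "s01", "action": "Memo"},
    {"user_id": "s02", "action": "Login"},
    {"user_id": "s02", "action": "Highlight"},
    {"user_id": "s02", "action": "ZoomIn"},
]
features = count_features(logs)
print(features["s01"])
```

Each student then corresponds to one record of feature values, which is the form of input expected by a clustering method.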

4 Method

4.1 Correlation Analysis

The Grade Achieved (GA) was used to measure the learning outcomes. We analyzed a number of variables that could affect performance, including the behaviors and their related variables (time spent reading pages, etc.). SPSS (IBM SPSS Statistics, New York, USA) was used to determine the partial correlation of GA with these variables.

Table 2 lists the results, which indicate that the variable GA exhibits a significant positive correlation with LGI, NM, NHL, NUL, BTR, RP, and RT. In addition, based on the results of partial correlation, a k-means clustering analysis was conducted to cluster students into groups to analyze the features of the learning behaviors in each group.

Table 2. Correlation of GA with the features

The partial correlation values are used to weight the features for k-means clustering.
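One possible way to turn correlation values into feature weights is sketched below. Several points are assumptions rather than details given in this paper: the sketch uses the plain Pearson correlation as a stand-in for the partial correlation computed in SPSS, it takes the absolute correlation as the weight, it standardizes each feature before scaling it, and the function names and toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / math.sqrt(var_x * var_y)


def standardize(xs):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [0.0 if sd == 0 else (x - mean) / sd for x in xs]


def weight_features(columns, grades):
    """Scale each standardized feature by |correlation with the grades|.

    Assumed weighting rule: a larger correlation gives a larger weight.
    """
    weights = {name: abs(pearson(values, grades)) for name, values in columns.items()}
    weighted = {
        name: [weights[name] * z for z in standardize(values)]
        for name, values in columns.items()
    }
    return weights, weighted


# Made-up values for four students, not data from the study.
columns = {"NM": [0, 1, 2, 3], "NN": [5, 1, 4, 2]}
grades = [0.2, 0.4, 0.6, 0.8]
weights, weighted = weight_features(columns, grades)
print(weights)
```

Because k-means relies on squared distances, multiplying a feature by a weight w scales its contribution to the distance by w squared, so features that are strongly correlated with GA influence the grouping more.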

5 Learning Behavior Patterns

After clustering, we obtained five clusters: C1, C2, C3, C4, and C5. Comparing these five groups, C4 and C3 obtained similar learning achievement (there is no difference between the GA of C4 and C3), and both exhibited higher learning achievement than the other groups. The learning achievement of C4 and C3 is higher than that of C2, that of C2 is higher than that of C1, and that of C1 is higher than that of C5 (GA: 3 > 2 > 1 > 5; 4 > 2 > 1 > 5) (Table 3).

Table 3. The result of clustering

Compare C1 with Other Groups:

C1 (n = 52) (Table 4): C1 students obtained higher learning achievement (GA: mean = 0.56192, SD = 0.1062) than C5 students, whereas C5 students exhibited a significantly higher tendency to go back to previous pages of the learning content than C1 students (BTR: 5 > 1).

Table 4. Compare C1 with other groups

The previous research found that backtrack reading, measured by the backtrack rate (BTR), is a good learning strategy that can help students save time; however, even when students learn with the BTR learning strategy, they still need to spend more time reading the learning content to ensure higher learning achievement (Yin et al. 2018). Comparing the students in C5 and C1, the BTR learning strategy does not work for C5 because these students did not spend more time reading the learning content.

Compared with the students of C2, C3, and C4, those of C1 exhibited lower learning achievement (GA: 1 < 2, 1 < 3, 1 < 4) as well as lower login counts, reading pages, and reading time (LGI, RP, RT: 1 < 2, 1 < 3, 1 < 4). Therefore, owing to their behavior of only reading the pages of the learning content in sequence, C1 is identified as the “poor performance group.”

Compare C2 with Other Groups:

C2 (n = 53) (Table 5): C2 students exhibited a significantly higher tendency to write memos and make highlights than C5 and C1 students (NM, NHL: 2 > 5; 2 > 1). C2 students also obtained significantly higher learning achievement than C5 and C1 students (GA: 2 > 5; 2 > 1). The BTR of C5 students is significantly higher than that of C2 students (BTR: 5 > 2), whereas there is no difference in BTR between C2 and C1. Therefore, Memo and Highlight exert a significant positive influence on learning effectiveness and are “good” reading strategies, whereas BTR does not influence learning effectiveness when the use of Memo and Highlight differs significantly.

Table 5. Compare C2 with other groups

Compare C3 with Other Groups:

Comparing these five groups (Table 6), C4 and C3 obtained similar learning achievement (there is no difference between the GA of C4 and C3), and both exhibited higher learning achievement than the other groups. The learning achievement of C4 and C3 is higher than that of C2, that of C2 is higher than that of C1, and that of C1 is higher than that of C5 (GA: 3 > 2 > 1 > 5; 4 > 2 > 1 > 5).

Table 6. Compare C3 with other groups

6 Discussion and Conclusion

Yin and Hwang (2018) indicated several potential research issues of LA for e-books, such as identifying students’ behavioral patterns from e-book-based learning logs, investigating the impacts of different learning strategies on students’ behavioral patterns, and using LA approaches to investigate the factors affecting students’ learning performances.

This paper focused on using LA to investigate the factors affecting students’ learning performances by using a clustering method. To further understand students’ possible behavior patterns, cluster analysis was employed. Students were clustered into groups according to the similarity of their learning behaviors. We then analyzed the learning behavior features of each group.

In this research, the k-means clustering method is improved by weighting the clustering features, and the number of features is increased by adding features such as the counts of MEMO, UNDERLINE, and HIGHLIGHT. According to the analysis results, we found some interesting patterns; for example, the students who frequently use the MEMO or MARKER function are the ones who achieve better GA.