Abstract
A typical medical curriculum is organized as a hierarchy of learning outcomes (LOs), where each LO is a short text that describes a medical concept. Machine learning models have been applied to predict the relatedness between LOs. These models are trained on examples of LO relationships annotated by experts. However, medical curricula are periodically reviewed and revised, resulting in changes to the structure and content of LOs. This work addresses the problem of model adaptation under curriculum drift. First, we propose heuristics to generate reliable annotations for the revised curriculum, thus eliminating the dependence on expert annotations. Second, starting with a model pre-trained on the old curriculum, we inject a task-specific transformation layer to capture nuances of the revised curriculum. Our approach makes significant progress towards reaching human-level performance.
S. Mondal and T. I. Dhamecha—Contributed equally.
1 Introduction
The LO-relationship extraction task, recently introduced in [8], seeks to predict the degree of relatedness between learning outcomes (LOs) in a curriculum. The authors examine the curriculum of the Lee Kong Chian School of Medicine, which spans five years of education and covers about 4000 LOs; each LO is a short statement describing a concept that students are expected to master. A hierarchy, designed by curriculum experts, groups these LOs at different levels of granularity. A successful clinical encounter requires students to conceptually relate and marshal knowledge gained from several LOs, spread across years and across distant parts of the curriculum hierarchy. This underscores the need for an automatic LO-relationship extraction tool (hereafter called LReT).
In our earlier work [8], this is abstracted as a classification task, where a pair of LOs is categorized as being strongly related (high degree of conceptual similarity), weakly related (intermediate conceptual similarity), or unrelated (no conceptual similarity). An LReT is trained on annotated data obtained from subject matter experts (SMEs), who are both faculty and doctors.
However, this curriculum is periodically reviewed and revised. Modifications are made both to content (emphasising some LOs, dropping others, merging a few) and to organization (grouping LOs differently, re-evaluating the classroom hours dedicated to each). Table 1 compares an old LO with its revised counterpart. Note that the textual formulation (and hence the underlying concept) of the LO has been modified. Additionally, the LO has been re-grouped under a separate set of verticals: Longitudinal Course, Module, and Assessment Type, doing away with Clinical Block, the only vertical in the previous version.
As the curriculum drifts, so do relationships between its constituent LOs. An LReT trained on one version of the curriculum may not perform well on the revised version. Re-obtaining SME annotations carries appreciable cognitive and cost overheads, making it impractical to train an LReT from scratch.
We present a systematic approach towards LO-relationship extraction under curriculum drift. Beginning with the SME-labelled dataset on the old curriculum, we employ heuristics to create a pseudo-labelled dataset for the revised curriculum. With some supervision now available, we tune the existing pre-trained model to the nuances of the revised curriculum, and compare its efficacy against human performance.
This aligns with existing work on domain adaptation and transfer learning [6, 10]; both study scenarios where training and test data do not derive from the same distribution. In contrast, not only do we adapt the model to a modified domain, but also generate data pertinent to this domain, thus eliminating the need for human intervention. This bridges the gap between building a reliable LReT, and deploying it against a changing curriculum landscape.
2 Silver Standard Dataset Generation
Starting with the SME-annotated old LO pairs, which constitute the gold-standard dataset, we proceed in two steps. First, we define a mapping that links an LO from the old curriculum (OC) to its closest matching counterpart in the revised curriculum (RC):

\(M(p) = \arg \max _{r \in RC} \ sim(p, r),\)

where sim is an appropriate semantic textual similarity metric. Intuitively, the mapping score sim(p, M(p)) captures the extent of semantic drift in the content of an LO.
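As an illustration, the mapping M can be computed as in the following sketch. Cosine similarity over sentence embeddings is only an assumed instantiation of sim, and the embedding function is left abstract; any suitable semantic textual similarity metric can be substituted.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_mapping(old_los, revised_los, embed):
    """Map each old-curriculum LO text to its closest revised-curriculum LO.

    `embed` is any sentence-embedding function (e.g., averaged word vectors
    or a sentence encoder); its choice is an assumption for illustration.
    Returns {old_lo: (best_revised_lo, mapping_score)}.
    """
    rev_vecs = {r: embed(r) for r in revised_los}
    mapping = {}
    for p in old_los:
        p_vec = embed(p)
        scores = {r: cosine_sim(p_vec, r_vec) for r, r_vec in rev_vecs.items()}
        best = max(scores, key=scores.get)
        mapping[p] = (best, scores[best])  # sim(p, M(p)): extent of semantic drift
    return mapping
```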
Fig. 1. (A) Base-model trained on gold-standard data from OC. (B) Model trained from Scratch on silver-standard data from RC. (C) Manually map features (MF) from RC to OC, and then use the base-model. (D) Learn a feature transform (FT) from RC that approximates OC-like features by leveraging the weak correspondence between RC and OC; the base-model can be further smoothed (FT-S).
Thereafter, we rely on pruning. Recall that the gold-standard dataset (\(\mathcal {D}_{old}\)) consists of old LO pairs (p, q), along with an SME-annotated class label c. A silver-standard dataset for the revised curriculum (\(\mathcal {D}_{rev}\)) is derived by pruning the mapping scores of an old LO pair at a pre-defined threshold (\(\tau \)), while retaining its class label. Formally,

\(\mathcal {D}_{rev} = \{\, (M(p),\, M(q),\, c) \mid (p, q, c) \in \mathcal {D}_{old},\ sim(p, M(p)) \ge \tau \ \wedge \ sim(q, M(q)) \ge \tau \,\}.\)

Effectively, we propagate the SME label of an LO pair in the old curriculum to its corresponding maps in the revised curriculum, only if both mapping scores exceed the threshold. These pseudo-labelled instances constitute the silver-standard dataset.
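The pruning step can then be sketched as follows; the threshold value and the (LO, LO, label) triple representation of \(\mathcal {D}_{old}\) are illustrative assumptions.

```python
def build_silver_dataset(gold_pairs, mapping, tau=0.8):
    """Derive the silver-standard dataset D_rev from the gold-standard D_old.

    gold_pairs : iterable of (p, q, label) triples from the old curriculum
    mapping    : {old_lo: (revised_lo, mapping_score)}, as built above
    tau        : pre-defined pruning threshold (0.8 is an illustrative value)
    """
    silver = []
    for p, q, label in gold_pairs:
        p_rev, p_score = mapping[p]
        q_rev, q_score = mapping[q]
        # Propagate the SME label only if both mapping scores clear the threshold.
        if p_score >= tau and q_score >= tau:
            silver.append((p_rev, q_rev, label))
    return silver
```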
3 Proposed Model Adaptation Approaches
The base-model (Fig. 1(A)), trained on gold-standard LO pairs of the old curriculum, predicts posterior probabilities for the Strong, Weak, and None classes. As a comparative baseline, we train a model from scratch on the silver-standard dataset, without leveraging the base-model. We then explore three approaches to adapt the base-model:
1. Manual Feature Mapping (MF): We manually map features from the revised curriculum to the old curriculum, and drop features that cannot be mapped (Fig. 1(C)). The resultant feature set can be fed to the base-model to predict LO relatedness in the revised curriculum.

2. Feature Transformation (FT): In this novel approach (Fig. 1(D)), we inject a fully connected layer that transforms the revised feature set into an approximation of the old feature set, which can then be fed to the base-model. The silver-standard dataset is used to train only this transformation layer, i.e., the base-model layers are frozen (a minimal sketch follows this list).

3. Feature Transformation with Smoothing (FT-S): Once the transformation weights have largely converged, we unfreeze the base-model parameters and train for a few more epochs to allow fine-grained updates to the entire network.
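A minimal PyTorch-style sketch of FT and FT-S is given below; the feed-forward form of the base-model, the feature dimensions, the optimizer, and the training schedule are illustrative assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn

class AdaptedLReT(nn.Module):
    """FT: a fully connected layer maps revised-curriculum features into the
    old-curriculum feature space expected by the (frozen) base-model."""
    def __init__(self, base_model, rev_dim, old_dim):
        super().__init__()
        self.transform = nn.Linear(rev_dim, old_dim)  # injected transformation layer
        self.base_model = base_model                  # outputs Strong/Weak/None logits

    def forward(self, rev_features):
        return self.base_model(self.transform(rev_features))

def adapt(model, loader, epochs=20, lr=1e-3, smooth_epochs=0):
    """Train only the transformation layer (FT); optionally unfreeze the
    base-model for a few extra epochs of fine-grained updates (FT-S)."""
    for p in model.base_model.parameters():
        p.requires_grad = False                       # freeze base-model layers
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.transform.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                           # silver-standard batches
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    if smooth_epochs:                                 # FT-S smoothing phase
        for p in model.base_model.parameters():
            p.requires_grad = True
        optimizer = torch.optim.Adam(model.parameters(), lr=lr / 10)
        for _ in range(smooth_epochs):
            for x, y in loader:
                optimizer.zero_grad()
                criterion(model(x), y).backward()
                optimizer.step()
    return model
```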
4 Experiments and Analysis
Table 2a compares the model adaptation techniques outlined in Sect. 3. All approaches that leverage the base-model outperform training from Scratch, to varying degrees. Feature transformation with smoothing (FT-S) yields the highest macro-F1, thus establishing that (a) the base-model encodes some task-specific information independent of the specific curriculum, (b) the revised feature-set can be adequately modeled as a linear transformation of the old feature-set, and (c) additional smoothing over the parameters of the base-model allows it to learn curriculum-specific nuances.
Furthermore, as shown in Table 2b, the high variance in model performance stems from the small size of the training and test sets in each cross-validation split; the macro-F1 score is sensitive to the samples in the specific test split. We perform a paired t-test to ascertain that, except for two pairs, FT vs. MF (\(p = 6.8\times 10^{-2}\)) and FT vs. FT-S (\(p = 6.6\times 10^{-2}\)), the differences between all other technique-pairs are statistically significant at the 95% confidence level.
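The significance test corresponds to a standard paired t-test over per-split macro-F1 scores; a sketch (using SciPy, as one possible tool) is:

```python
from scipy.stats import ttest_rel

def compare_techniques(f1_a, f1_b, alpha=0.05):
    """Paired t-test on per-split macro-F1 scores of two techniques.

    f1_a, f1_b : macro-F1 scores of techniques A and B on the same CV splits.
    Returns the p-value and whether the difference is significant at the
    chosen level (alpha=0.05 corresponds to the 95% confidence level).
    """
    _t_stat, p_value = ttest_rel(f1_a, f1_b)
    return p_value, p_value < alpha
```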
Finally, for a small held-out set (\(n=229\)), we obtain annotations separately from two SMEs and compute the inter-annotator agreement (71.7% macro-F1), which serves as a skyline. As shown in Table 2d, treating one SME as ground-truth and comparing against FT-S's predictions, the human-machine agreement turns out to be 64.4%. Compared to human performance, our reported results are moderately high, with some further scope for improvement.
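Both the skyline and the human-machine agreement are macro-F1 scores over the held-out set; a sketch using scikit-learn (an illustrative choice, with hypothetical variable names) is:

```python
from sklearn.metrics import f1_score

def macro_f1_agreement(reference, predictions):
    """Agreement as macro-F1, treating one annotator's labels as reference."""
    return f1_score(reference, predictions, average="macro")

# sme1, sme2, model_preds: Strong/Weak/None labels for the n=229 held-out pairs
# skyline = macro_f1_agreement(sme1, sme2)         # inter-annotator agreement
# machine = macro_f1_agreement(sme1, model_preds)  # human-machine agreement
```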
References
1. Bjerva, J., Kouw, W., Augenstein, I.: Back to the future-sequential alignment of text representations. arXiv preprint arXiv:1909.03464 (2019)
2. Chan, J., Bailey, J., Leckie, C.: Discovering correlated spatio-temporal changes in evolving graphs. Knowl. Inf. Syst. 16(1), 53–96 (2008)
3. Chen, Y., Wuillemin, P.H., Labat, J.M.: Discovering prerequisite structure of skills through probabilistic association rules mining. International Educational Data Mining Society (2015)
4. Gravemeijer, K., Rampal, A.: Mathematics curriculum development. In: Cho, S.J. (ed.) The Proceedings of the 12th International Congress on Mathematical Education, pp. 549–555. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-12688-3_57
5. Käser, T., Klingler, S., Schwing, A.G., Gross, M.: Beyond knowledge tracing: modeling skill topologies with bayesian networks. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 188–198. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_23
6. Kouw, W.M., Loog, M.: A review of domain adaptation without target labels. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
7. Kumar, I., Balakrishnan, S.: Beyond basic: a temporal study of curriculum changes in a first-year communication course. Int. J. Res. Bus. Stud. 4, 14 (2019). ISSN 2455-2992
8. Mondal, S., et al.: Learning outcomes and their relatedness in a medical curriculum. In: Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 402–411 (2019)
9. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics resources for biomedical text processing (2013)
10. Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766 (2007)
11. Reis, S.: Curriculum reform: why? what? how? and how will we know it works? Isr. J. Health Policy Res. 7, 30 (2018). https://doi.org/10.1186/s13584-018-0221-4
12. Stankov, S., Rosić, M., Žitko, B., Grubišić, A.: Tex-sys model for building intelligent tutoring systems. Comput. Educ. 51(3), 1017–1036 (2008)
13. Zouaq, A., Nkambou, R.: Building domain ontologies from text for educational purposes. IEEE Trans. Learn. Technol. 1(1), 49–62 (2008)