Information extraction from syllabi for academic e-Advising
Introduction
The globalization of society, increasing migration between countries, and the popularization of international study, student exchanges, and adult learning have dramatically increased the workload of academic advisors and academic credential evaluation services that offer learners advice on furthering their education. Issues constantly dealt with by academic advisors include identification of equivalent courses between institutions and allocation of transfer credits, recommendation of courses for continued study based on a learner’s academic history, and international degree recognition based on program similarity. These tedious tasks involve intense analysis of learning objects (LOs) such as university programs, academic calendars, course outlines (syllabi), transcripts, and other academic credentials. While the advent of e-Learning and the use of the Internet as an information delivery tool have given academic advisors (and students) the ability to access many LOs on the Web, academic advising remains a time-consuming and cumbersome undertaking. However, recent breakthroughs in knowledge engineering and the Semantic Web have uncovered exciting new prospects, making the concept of a semi-automatic academic e-Advisor a reality.
The proposed e-Advising system is a conceptual expert system for continued learning that is intended to automate the process of transferring course credits between institutions and to recommend courses for further study. Such a system would use a learner’s academic history (based on transcripts and other records) and university profiles (based on academic calendars and syllabi) to semi-automatically determine equivalent courses between institutions and suggest the best solution for continuation of study.
The initial idea of academic e-Advising and the drive to automate the process was proposed by Kamarthi, Valbuena, Velou, Kumara, and Enscore (1992), but the described expert system, ADVISOR, assumed the existence of an extensive internal course database containing course descriptions, schedules, prerequisites, corequisites, substitutions, credits, and weights for programs at different institutions. Presently, no such database exists and creating one with an entry for every course in every program offered by every institution would be a long, tedious process; however, the information needed to populate such a database is readily available on most institutions’ websites, in the form of an academic calendar and course syllabi. Therefore, if these existing materials could be used to automatically create a multi-institution course database, the prospect of making a semi-automatic academic e-Advisor could be realized.
This paper presents an approach to extracting information from LOs with the goal of automatically building a course database. More specifically, this paper describes the course outline data extractor (CODE) application, a tool capable of automatically transforming syllabi from semi-structured, human-readable HTML stored on an institution’s website to structured, machine-readable XML (Biletskiy & Scribner, 2005). Many of the information extraction (IE) and classification techniques described in the paper could be adapted to extract information automatically from academic calendars, transcripts, other LOs, or documents from entirely different domains.
Section 2 presents an overview of the proposed e-Advising system and other fundamental background information needed for an understanding of the rest of the paper. Section 3 describes the work related to this paper, including potential applications of the CODE approach in other domains. Section 4 describes the CODE approach and methodologies. Section 5 presents details of the HTML to XML conversion. Section 6 evaluates the application and Section 7 concludes the paper and discusses potential future work.
Section snippets
e-Advising and background information
This section will describe the e-Advising system and the role played by the CODE application within this system. Additional background information will be presented as well.
The following scenario details one intended use of the semi-automatic e-Advisor:
- (1) A learner provides a transcript describing his/her educational background. The courses listed in the transcript are used as references to corresponding course outlines (and/or calendar descriptions).
- (2) An academic advisor provides a desired target
Applications and related work
Although CODE is described in the context of the proposed e-Advising expert system, there are other possible applications for its approach. Given the diversity of institutions and rapid growth of globalization, the recognition of international degrees has become an important issue. Automating this task is similar to e-Advising and requires extraction and analysis of information from learning objects, such as transcripts, university calendars, and syllabi. Using the CODE tool as a core
The CODE approach
This section gives a high-level overview of the approach employed in the CODE application to extract information from semi-structured HTML documents and store the information in a machine-readable XML representation (shown in Fig. 2). The inputs to the application are a semi-structured course outline in the form of an HTML file, a predefined XML template, and several libraries of patterns and key terms. The HTML to XML logic parses the HTML file into a document object model (DOM), then applies a
HTML to XML conversion
The HTML to XML conversion procedure developed for the CODE system consists of four major phases:
- (1) Preprocessing.
- (2) HTML parsing and DOM building.
- (3) Information extraction.
- (4) Sub-domain classification.
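The four phases listed above can be illustrated with a minimal sketch in Python. It is not the CODE implementation: the key-term libraries (FIELD_TERMS, SUBDOMAIN_TERMS), the XML element names, the heading-based section parser (a simplified stand-in for a full DOM), and the sample outline are all hypothetical, chosen only to show how key-term matching over parsed HTML might yield a structured XML record.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical key-term library: XML field name -> heading terms that signal it.
FIELD_TERMS = {
    "instructor": ["instructor", "professor", "lecturer"],
    "textbook": ["textbook", "required text"],
    "description": ["description", "overview"],
}

# Hypothetical sub-domain library: label -> indicative terms.
SUBDOMAIN_TERMS = {
    "databases": ["sql", "relational", "database"],
    "networks": ["tcp", "routing", "network"],
}


def preprocess(html):
    """Phase 1: drop HTML comments and collapse runs of whitespace."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", html).strip()


class SectionParser(HTMLParser):
    """Phase 2: reduce the markup to [heading, body] pairs.

    This is a simplified stand-in for building a full DOM.
    """

    HEADINGS = {"h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.sections = []  # each entry is [heading text, body text]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append(["", ""])

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored in this sketch
        index = 0 if self._in_heading else 1
        self.sections[-1][index] += data


def extract_fields(sections):
    """Phase 3: map sections to template fields by matching heading key terms."""
    fields = {}
    for heading, body in sections:
        heading_lower = heading.lower()
        for field, terms in FIELD_TERMS.items():
            if any(term in heading_lower for term in terms):
                fields.setdefault(field, body.strip())
    return fields


def classify_subdomain(text):
    """Phase 4: pick the sub-domain whose key terms occur most often."""
    text = text.lower()
    scores = {
        label: sum(text.count(term) for term in terms)
        for label, terms in SUBDOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def html_to_xml(html):
    """Run the four phases and serialize the result as an XML string."""
    parser = SectionParser()
    parser.feed(preprocess(html))
    parser.close()
    root = ET.Element("course")
    for name, value in extract_fields(parser.sections).items():
        ET.SubElement(root, name).text = value
    all_text = " ".join(body for _, body in parser.sections)
    ET.SubElement(root, "subdomain").text = classify_subdomain(all_text)
    return ET.tostring(root, encoding="unicode")


# Made-up sample outline, for illustration only.
SAMPLE = (
    "<html><body><h1>Sample Course</h1>"
    "<h2>Instructor</h2><p>Example Instructor</p>"
    "<h2>Course Description</h2>"
    "<p>Relational database design and SQL queries.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    print(html_to_xml(SAMPLE))
```

Keeping the key terms in plain dictionaries, separate from the parsing logic, mirrors the idea of supplying the pattern and key-term libraries as inputs, so that new fields or sub-domains could be added without changing the extraction code.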
Evaluation
The success of the CODE application was evaluated on 50 HTML course outlines taken from the University of New Brunswick, the University of Waterloo, and the Massachusetts Institute of Technology in the domains of Computer Science, Electrical Engineering, Computer Engineering, and Software Engineering. It should be noted that the goal of the CODE application is not to extract all the information from a course outline, but to accurately capture the most important content and metadata, as specified
Conclusion and future work
The work presented in this paper described an approach to extracting information from HTML course outlines and storing it in machine-readable XML for use in a course database for the proposed semi-automatic academic e-Advisor. An extensible and expandable application called CODE (course outline data extractor) was implemented and evaluated. The CODE application parses the HTML document into a DOM and applies a series of IE and classification methods that make use of a finite number of key terms
Acknowledgements
We would like to thank UNB students Tim Scribner and Martin Dames for their significant technical contribution and NSERC, NBIF, and UNB for funding the project.
References (19)
- et al. (2007). An adaptive scheduling system with genetic algorithms for arranging employee training programs. Expert Systems with Applications.
- et al. (1992). ADVISOR—An expert system for the selection of courses. Expert Systems with Applications.
- Biletskiy, Y., & Scribner, T. (2005). Conversion of learning objects to meaningful XML. In Proceedings of the 8th...
- et al. (2004). Building ontologies for interoperability among learning objects and learners. Lecture Notes in Computer Science.
- et al. (2005). A match-making system for learners and learning objects. International Journal of Interactive Technology and Smart Education.
- Cohen, W., Hurst, M., & Jensen, L. S. (2002). A flexible learning system for wrapping tables and lists in HTML...
- Dames, M., & Biletskiy, Y. (2006). An extensible text extraction tool for learning objects. In Proceedings of the 8th...
- (2005). Interoperability and learning objects: An overview of e-learning standardization. Interdisciplinary Journal of Knowledge and Learning Objects.
- Gupta, S., Kaiser, G., Neistadt, D., & Grimm, P. (2003). DOM-based content extraction of HTML documents. In Proceedings...