Information extraction from syllabi for academic e-Advising

https://doi.org/10.1016/j.eswa.2008.05.011

Abstract

Creating an academic e-Advisor to automate the process of transferring course credits between institutions and recommend courses for further study requires an extensive database of course information. This paper presents an application for creating such a database by automatically extracting relevant information from HTML course outlines stored on an institution’s website and storing it in machine-readable XML. The developed application, called CODE (course outline data extractor), parses a course outline based on its HTML tags and content to build a document object model, then applies a combination of web mining, natural language processing, and pattern recognition techniques to automatically classify and extract content useful for the semi-automatic e-Advisor and store it as XML. The current implementation is restricted to HTML course outlines, but the concepts can be extended to other formats of learning objects or to entirely different domains. The quality of extraction and classification is evaluated on a corpus of syllabi as a proof of concept.

Introduction

The globalization of society, increasing migration between countries, and the popularization of international study, student exchanges, and adult learning have dramatically increased the workload of academic advisors and academic credential evaluation services, which offer learners advice on furthering their education. Issues constantly dealt with by academic advisors include identification of equivalent courses between institutions and allocation of transfer credits, recommendation of courses for continued study based on a learner’s academic history, and international degree recognition based on program similarity. These tedious tasks involve intense analysis of learning objects (LOs) such as university programs, academic calendars, course outlines (syllabi), transcripts, and other academic credentials. While the advent of e-Learning and the use of the Internet as an information delivery tool have given academic advisors (and students) the facility to access many LOs on the Web, academic advising remains a time-consuming and cumbersome undertaking. However, recent breakthroughs in knowledge engineering and the Semantic Web have uncovered exciting new prospects, making the concept of a semi-automatic academic e-Advisor a reality.

The proposed e-Advising system is a conceptual expert system for continued learning that is intended to automate the process of transferring course credits between institutions and to recommend courses for further study. Such a system would use a learner’s academic history (based on transcripts and other records) and university profiles (based on academic calendars and syllabi) to semi-automatically determine equivalent courses between institutions and suggest the best solution for continuation of study.

The initial idea of academic e-Advising and the drive to automate the process was proposed by Kamarthi, Valbuena, Velou, Kumara, and Enscore (1992), but the described expert system, ADVISOR, assumed the existence of an extensive internal course database containing course descriptions, schedules, prerequisites, corequisites, substitutions, credits, and weights for programs at different institutions. Presently, no such database exists and creating one with an entry for every course in every program offered by every institution would be a long, tedious process; however, the information needed to populate such a database is readily available on most institutions’ websites, in the form of an academic calendar and course syllabi. Therefore, if these existing materials could be used to automatically create a multi-institution course database, the prospect of making a semi-automatic academic e-Advisor could be realized.

This paper presents an approach to extracting information from LOs with the goal of automatically building a course database. More specifically, this paper describes the course outline data extractor (CODE) application, a tool capable of automatically transforming syllabi from semi-structured, human-readable HTML stored on an institution’s website to structured, machine-readable XML (Biletskiy & Scribner, 2005). Many of the information extraction (IE) and classification techniques described in the paper could easily be adapted to automatically extract from academic calendars, transcripts, other LOs, or documents from entirely different domains.

Section 2 presents an overview of the proposed e-Advising system and other fundamental background information needed for an understanding of the rest of the paper. Section 3 describes the work related to this paper, including potential applications of the CODE approach in other domains. Section 4 describes the CODE approach and methodologies. Section 5 presents details of the HTML to XML conversion. Section 6 evaluates the application and Section 7 concludes the paper and discusses potential future work.

Section snippets

e-Advising and background information

This section will describe the e-Advising system and the role played by the CODE application within this system. Additional background information will be presented as well.

The following scenario details one intended use of the semi-automatic e-Advisor:

  • (1) A learner provides a transcript describing his/her educational background. The courses listed in the transcript are used as references to corresponding course outlines (and/or calendar descriptions).

  • (2) An academic advisor provides a desired target

Applications and related work

Although CODE is described in the context of the proposed e-Advising expert system, there are other possible applications for its approach. Given the diversity of institutions and rapid growth of globalization, the recognition of international degrees has become an important issue. Automating this task is similar to e-Advising and requires extraction and analysis of information from learning objects, such as transcripts, university calendars, and syllabi. Using the CODE tool as a core

The CODE approach

This section gives a high-level overview of the approach employed in the CODE application to extract information from semi-structured HTML documents and store the information in a machine-readable XML representation (shown in Fig. 2). The inputs to the application are a semi-structured course outline in the form of an HTML file, a predefined XML template, and several libraries of patterns and key terms. The HTML to XML logic parses the HTML file into a document object model (DOM), then applies a

HTML to XML conversion

The HTML to XML conversion procedure developed for the CODE system consists of four major phases:

  • (1) Preprocessing.

  • (2) HTML parsing and DOM building.

  • (3) Information extraction.

  • (4) Sub-domain classification.

Details of the conversion process are presented in Fig. 3. The remainder of this section will explain the methods and techniques behind each phase in detail.
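
The four phases listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CODE implementation: the `KEY_TERMS` table, the `COURSE_CODE` pattern, the `<course>` element names, and the choice of heading tags are all hypothetical, and a flat list of heading/body sections stands in for a full DOM.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical key-term library: XML field name -> terms that suggest it.
KEY_TERMS = {
    "prerequisites": ("prerequisite", "pre-requisite"),
    "textbook": ("textbook", "required text"),
    "instructor": ("instructor", "professor", "lecturer"),
}
HEADING_TAGS = {"h1", "h2", "h3", "h4"}  # assumed section markers
# Hypothetical pattern for course codes such as "CS 1073".
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\s?\d{3,4}\b")


def preprocess(html):
    """Phase 1: drop comments and script/style blocks."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    return re.sub(r"<(script|style)\b.*?</\1\s*>", "", html, flags=re.S | re.I)


class OutlineParser(HTMLParser):
    """Phase 2: collect [heading, body] sections (a stand-in for a DOM)."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.sections.append(["", ""])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text that precedes the first heading
        index = 0 if self._in_heading else 1
        self.sections[-1][index] += data


def classify(heading):
    """Phase 4: map a section heading to a field name via key terms."""
    lowered = heading.lower()
    for field, terms in KEY_TERMS.items():
        if any(term in lowered for term in terms):
            return field
    return "other"


def html_to_xml(html):
    """Run all four phases and return the result as an XML string."""
    parser = OutlineParser()
    parser.feed(preprocess(html))
    root = ET.Element("course")
    for heading, body in parser.sections:
        text = " ".join(body.split())  # normalize whitespace
        if not text:
            continue
        element = ET.SubElement(root, classify(heading))
        element.text = text
        # Phase 3: pattern-based extraction of course codes from the body.
        for code in COURSE_CODE.findall(text):
            ET.SubElement(element, "courseCode").text = code
    return ET.tostring(root, encoding="unicode")


sample = "<h2>Instructor</h2><p>Dr. A. Example</p><h2>Prerequisites</h2><p>CS 1073 and MATH 1003</p>"
print(html_to_xml(sample))
```

Keeping the key terms and patterns in data tables rather than in the parsing logic mirrors the description of the application's inputs as libraries of patterns and key terms: supporting a new field then means adding an entry, not changing the parser.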

Evaluation

The success of the CODE application was evaluated for 50 HTML course outlines taken from the University of New Brunswick, the University of Waterloo, and the Massachusetts Institute of Technology in the domains of Computer Science, Electrical Engineering, Computer Engineering, and Software Engineering. It should be noted that the goal of the CODE application is not to extract all the information from a course outline, but to accurately capture the most important content and metadata, as specified

Conclusion and future work

The work presented in this paper described an approach to extracting information from HTML course outlines and storing it in machine-readable XML for use in a course database for the proposed semi-automatic academic e-Advisor. An extensible and expandable application called CODE (course outline data extractor) was implemented and evaluated. The CODE application parses the HTML document into a DOM and applies a series of IE and classification methods that make use of a finite number of key terms

Acknowledgements

We would like to thank UNB students Tim Scribner and Martin Dames for their significant technical contribution and NSERC, NBIF, and UNB for funding the project.

References (19)

  • Y-S. Juang et al.

    An adaptive scheduling system with genetic algorithms for arranging employee training programs

    Expert Systems with Applications

    (2007)
  • S.V. Kamarthi et al.

    ADVISOR—An expert system for the selection of courses

    Expert Systems with Applications

    (1992)
  • Biletskiy, Y., & Scribner, T. (2005). Conversion of learning objects to meaningful XML. In Proceedings of the 8th...
  • Y. Biletskiy et al.

    Building ontologies for interoperability among learning objects and learners

    Lecture Notes in Computer Science

    (2004)
  • H. Boley et al.

    A match-making system for learners and learning objects

    International Journal of Interactive Technology and Smart Education

    (2005)
  • Cohen, W., Hurst, M., & Jensen, L. S. (2002). A flexible learning system for wrapping tables and lists in HTML...
  • Dames, M., & Biletskiy, Y. (2006). An extensible text extraction tool for learning objects. In Proceedings of the 8th...
  • N. Friesen

    Interoperability and learning objects: An overview of e-learning standardization

    Interdisciplinary Journal of Knowledge and Learning Objects

    (2005)
  • Gupta, S., Kaiser, G., Neistadt, D., & Grimm, P. (2003). DOM-based content extraction of HTML documents. In Proceedings...
There are more references available in the full text version of this article.
