Toward an Intelligent e-Learning System Using Document Classification Techniques

Yousef Abuzir

doi:10.1515/jisys-2014-0038

Open Access Published by De Gruyter November 4, 2014

From the journal Journal of Intelligent Systems

https://doi.org/10.1515/jisys-2014-0038

Abstract

The purpose of this study is to propose and develop an intelligent e-learning system based on advanced document management techniques at Al-Quds Open University (QOU). In this article, we focus on a case using e-mail contents as supplementary educational materials at QOU. We describe how an interactive classification system based on a concept hierarchy can simplify this task. This system provides functions to index, classify, and retrieve a collection of e-mail messages based on user profiles. By automatically indexing e-mail messages using our E-mail Classification System for e-Learning, instructors and/or students can easily find their messages and the topics they cover. The test results of our system evaluation showed that good classification quality has been achieved, with a precision of 77.4%, recall of 90%, and F-measure of 82.8%.

Keywords: E-mail classification; concept hierarchy; e-learning; classification; WCCAIS2014

1 Introduction

As information and communication technologies grow rapidly, new necessities and expectations for new services and applications arise. These applications will help develop Internet-based applications for training and self-learning skills and teaching and learning methodologies. Such applications provide the education sector with a great number of various services, including e-learning, e-education, computer-based learning/training, virtual laboratory, interactivity, Internet-based telemedicine system, computer-supported collaborative learning, and so on.

This article concentrates on the use of e-mail materials as supplements in traditional education (conventional education), open education, and distance learning. The rapid development of the information and communication technologies and their influence on the whole society also considerably influence the methods, content, and form of teaching. The existence of the Internet has changed the way of teaching too. Modern network technology has extended the students’ possibilities and has given them unprecedented and immediate access to up-to-date information from all over the world.

One reason for classifying is for scientific investigation – that of aiding researchers, students, and trainers in organizing known facts about a particular subject under investigation in order to better understand the structure of the subject and thus pinpoint areas for further research. More generally, classification has been described as “a theory of the structure of knowledge”.

In this article, we focus on a case study involving the classification of e-mail correspondence sent by the students and instructor of 3D Spaces Projects Development, a course at Al-Quds Open University (QOU), so that students can easily locate information within their e-mail messages. In the experiment, five root terms or concepts were constructed from the user manual, and each new term was evaluated by structured criteria.

The article is structured as follows: Section 2 presents an overview of previous work. In Section 3, we give an overview of the E-mail Classification System for e-Learning (ECSL). In Section 4, we describe how concept hierarchy has been created and extended to support the indexing and classification functions of e-mail messages in order to support student learning. In Section 5, E-mail Classification, we discuss the basic functionality of e-mail classification and the different approaches to organize e-mail messages. Section 6 shows the experiment and evaluation using the case study at QOU. We test the indexing and classification functions using a sample collection of e-mail messages. The last section presents the conclusions.

2 Literature Survey

Many studies deal with the problem of classifying e-mail into folders [5, 8, 15, 16, 21, 25], which consists of the classification of incoming mail into the different folders previously created by the user. This task has received less attention in the literature due to the large number of folders and the lack of balance of documents per class.

Recently, a trend has developed in many educational institutions to offer complete e-mail courses or to use e-mail to enhance teaching [6, 13, 14, 20, 22–24].

Koprinska et al. [15] studied supervised and semi-supervised classification for two tasks: e-mail filing into folders and spam e-mail filtering. Their study shows that a random forest is a good choice for these tasks because it runs fast on large and high-dimensional databases, is easy to tune, and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines (SVMs), and naive Bayes.

Bekkerman et al. [5] have proposed the step-incremental time-based split, which provides a realistic evaluation setup and allows the examination of the statistical significance of the results of classification into folders. Four classifiers were applied to the e-mail foldering task: maximum entropy (MaxEnt), naive Bayes, SVM, and a new version of Winnow. Comparing the results of these four classification techniques on e-mail foldering shows that the fast and simple-to-implement Winnow classifier performs no worse and sometimes better, albeit insignificantly, than the more popular and more complex-to-implement SVM and MaxEnt methods. With respect to problem difficulty, Bekkerman et al. show that the obtained foldering accuracies are relatively low: in 9 of 14 cases, they are below 70%. There is much room to improve on the baseline, and more sophisticated methods should be applied to the e-mail foldering task.

The work of Koren et al. [16] provides a method for generating a lighter form of folders, or tags, benefiting even the most passive users. The method automatically associates, whenever possible, an appropriate semantic tag with a given e-mail. Such an approach gives rise to an alternate mechanism for organizing and searching e-mail. Efficiency is achieved by working within a low-dimensional latent space and using a novel hierarchical classifier. Precision level is controlled by separating the task into a two-phase classification process. The results are encouraging and compare favorably with alternative approaches. Their method successfully tags 72% of incoming e-mail traffic.

Wang et al. [25] use the information bottleneck (IB) method to cluster keywords based on their distribution over the class labels. They also add threads and address groups as features alongside the e-mail text and use the MaxEnt model to improve the accuracy of the classifier. Experimental results show that these measures can improve the classifier’s performance because keywords change too rapidly in e-mails, whereas address groups are much steadier.

The intention of ChanMin’s research [6] was to provide practitioners and researchers with guidelines for the design and development of effective e-mail to improve classroom-based learning situations. Thus, based on the review of previous studies, their article investigated the characteristics of e-mail and the advantages of its use as well as design aspects to optimize e-mail use for the support of e-learning.

Rezanur Rahman et al. [20] present a survey carried out to understand the present status of Internet knowledge among learners and their views on the possible introduction of e-mail communication as a supporting tool for learning.

Joshi and Saxena [14] try to analyze the e-mail responses of people who, by surfing either IGNOU’s website or other sources, came to know about the various programs of study offered by the School of Education.

Another study [13] examines the nature of feedback in education, discusses technology implementation issues of e-mail as a feedback and communication tool, and provides a list of suggestions for incorporating e-mail into the classroom to make the most of the medium’s relative advantages.

Vivian and Timothy [24] examined professor–student e-mail communications, interpersonal relationships, and teaching evaluations. Several findings have been gleaned. One of the findings is that academic task was the most frequent e-mail topic, whereas social relationship is a less frequent topic between professors and students. E-mail communication contributed positively to both professor–student relationship and teaching evaluation.

3 Structure of the ECSL

In this section, we present the main components and structure of our tool ECSL. The ECSL architecture consists of four layers, which are the student interface layer, the document processing layer [the advanced document management (ADM) layer], the information retrieval (IR) layer [2], and the persistent databases layer (Figure 1).

Figure 1

ECSL Architecture.

The concept of the ECSL document management system is shown in Figure 1. Electronic documents (e-mails) are imported into the system via the indexing engine in the ADM layer. This process automatically generates various index terms such as keywords, concepts, and relations, which are assigned to documents as a function of the e-mail content.

The indexing and retrieval processes used in ECSL are based in part on the hierarchical structure of the concept hierarchy database. The concept hierarchy is used to create associations between documents and concepts. The use of the concept hierarchy therefore implies that the associations can also be expressed by terms not explicitly present in the analyzed e-mails. Such terms, which we call concepts, refer to broader terms of keywords that are found in the e-mails or that represent the main idea of the e-mail message.
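The idea that a message can be associated with a concept it never mentions can be illustrated with a minimal Python sketch. This is not the ECSL implementation: the `BROADER` mapping, the function names, and the naive substring matching are assumptions made for illustration. The two links in the mapping follow the "scenario" example given later in Section 4.3.

```python
# Hypothetical fragment of a concept hierarchy: each term maps to its broader term.
# The two links below follow the "scenario" example in Section 4.3 of the article.
BROADER = {
    "new scenario object": "scenario",
    "scenario": "animation",
}


def broader_concepts(term, broader=BROADER):
    """Return all broader terms of `term`, nearest first."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


def index_message(text, broader=BROADER):
    """Return (keywords, concepts) for one message.

    Keywords are hierarchy terms that occur in the text (naive substring
    match, an assumption of this sketch). Concepts are broader terms that
    do not occur in the text themselves.
    """
    lowered = text.lower()
    vocabulary = set(broader) | set(broader.values())
    keywords = sorted(t for t in vocabulary if t in lowered)
    concepts = {c for k in keywords for c in broader_concepts(k, broader)}
    return keywords, sorted(concepts - set(keywords))


keywords, concepts = index_message("Please add a new scenario object to the project.")
print(keywords)  # hierarchy terms found in the message
print(concepts)  # broader concepts not present in the message text
```

In this example the message never contains the word "animation", yet it is indexed under that concept because "scenario" is one of its narrower terms.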

One of the most important aspects of the ECSL is concept hierarchy database maintenance. The indexing process usually identifies in documents not only keywords, but also high-frequency terms that are not present in concept hierarchy. Despite their relevancy as index terms, those terms are not indicated as keywords because of their absence in concept hierarchy. To incorporate those terms in the database, an additional iteration is needed, which is an update of concept hierarchy with those high-frequency terms. This iteration needs to be followed by re-indexing of documents to maintain the consistency of the index and concept hierarchy databases.

4 Construction of Concept Hierarchy

A concept hierarchy is a set of concepts and relations between those concepts. Concept hierarchies have been used to organize and access information [1]. A concept hierarchy includes the concepts presented in a certain domain and the relations among those concepts. Concepts represent the topics of interests in the domain, and relations determine how to organize the concepts into a hierarchy. In some situations, task- and user-specific concept hierarchies are necessary to allow an overview and easy access of a large set of documents [4, 7].

We build concept hierarchy to facilitate the organization and navigation of large collections of information objects by creating meta-level data that reflect their underlying concepts and relationships. We used concept hierarchy to promote existing content-focused learning services to semantic-aware and personalized learning services. The underlying structure of concepts and relations may be expressed by domain ontologies or by other modeling formalisms, e.g., classifications or schemas, relational or object-oriented schemas, subject categorizations, indices, thesauri.

In our case, the structure of the knowledge of the three-dimensional (3D) design domain shows the explicit modeling of the concept hierarchy and presents a simple way to navigate the educational resources. It represents the terminology used in the 3D course and allows each student to view the terminology in areas that reflect the interest, needs, and abilities of the student.

For the description of 3D design course environment and its information and knowledge resources, two kinds of concept hierarchies are needed:

domain concept hierarchy, which is a course concept hierarchy that defines the structure of the course;
user profile concept hierarchy, which conceptualizes individual (student) interest and disciplines.

Domain concept hierarchy describes terms and their relations that are valid in a 3D design course. For each field, a different concept hierarchy is demanded. The usefulness of the concept hierarchy lies in the fact that it can help structure course contents logically according to the structures of disciplines. For students, it is highly important to gain an insight into the terminology of the studied area. If the terminology of the concept hierarchy is used for defining keywords or subject descriptors of educational resources while searching a repository of resources, students internalize the discipline terminology in a natural way.

User profiles are themselves a concept hierarchy construct; to support an individual student, the hierarchy needs to adapt to that student’s personal preferences. The concept hierarchy of user profiles serves two goals: the first is to organize information into concept hierarchies and the second is to customize the concept hierarchies based on the student’s preferences.

For a better explanation of our proposed concept hierarchy solution, we developed our system, the ECSL. Our application integrates concept hierarchy and e-mail resources related to the 3D design course. The basic idea has been to use a collection of e-mail messages as a source of update terms for the concept hierarchy and student profiles. We used the ECSL to identify important terms as well as their significant relationships. The following subsections describe the iterative approach that consists of subsequently parsing collection of e-mail messages and inserting new terms and the relationships into the concept hierarchy database.

4.1 Methodology

There are different strategies for constructing a concept hierarchy. Some start by listing a set of concepts, whereas others go directly to placing a root concept and start linking other concepts from it based on pattern discovery. Our method is a semi-automatic methodology that builds the concept hierarchy database from a set of terms extracted from resources. The concept hierarchy-based pattern discovery architecture provides four steps for building the 3D design concept hierarchy: root concept selection, parsing (preprocessing), pattern discovery and extraction, and term and relation (pattern) evaluation. Figure 2 shows the concept hierarchy-based pattern discovery architecture in our approach.

Root concepts selection phase. The first step is to select the main concepts (the roots) manually.
Parsing phase. This step identifies the key terms (concepts) that represent our domain. It includes deep parsing of the e-mail messages and the other resources to select keywords and store them in the database. Next, the system filters these lists by removing noise and inconsistent data based on a specialized or domain-specific stop list and a statistical method.
Pattern discovery and extraction phase. In this step, we extract the tuples such as object₁, relation, object₂ using the natural language processing tool of our ECSL. In our approach, we used the links in the tuples to define the set of the relations in the concept hierarchy. This method employed syntactic analysis to extract knowledge patterns and tried to identify representing knowledge of the patterns, which use lexico-syntactic patterns. It retrieves relevant patterns from the database and transforms the data format for pattern discovery.

In our system, pattern discovery presents an innovative and effective technique that includes the processes of pattern matching and pattern extracting to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information, including terms and relationships between the terms. A sample of the patterns used is: “x also called y”, “x such as y”, “x especially y₁, y₂, and yₙ”, “x consist of y”, “x is made of y”, and “x is member of y”.

Terms and relation (pattern) evaluation phase. This evaluates the target pattern features from source data and their relationship to the terms. The evaluation is based on a machine learning algorithm.
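The pattern discovery and extraction phase can be illustrated with a minimal Python sketch. It is not the natural language processing tool of the ECSL; it assumes that each object is a single word and that a regular expression is enough to spot the cue phrases. The cue phrases come from the sample patterns listed above, while the names `CUES`, `PATTERN`, and `extract_tuples` and the test sentences are hypothetical.

```python
import re

# Cue phrases from the sample patterns in the text. "consists? of" also accepts
# the inflected form "consists of", which is an assumption of this sketch.
CUES = ["also called", "such as", "consists? of", "is made of", "is member of"]

# One word before the cue and one word after it; real noun phrases are longer,
# so this is a deliberate simplification.
PATTERN = re.compile(r"(\w+)\s+(" + "|".join(CUES) + r")\s+(\w+)", re.IGNORECASE)


def extract_tuples(sentence):
    """Return (object1, relation, object2) tuples found in `sentence`."""
    return [
        (m.group(1).lower(), m.group(2).lower(), m.group(3).lower())
        for m in PATTERN.finditer(sentence)
    ]


print(extract_tuples("A scene is made of objects."))
print(extract_tuples("Use media such as video."))
```

The relation in each tuple would then be stored as a candidate link between the two terms, subject to the evaluation phase described above.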

Figure 2

An Overview of the Concept Hierarchy Building.

4.2 Main Concepts Selection

The course used in our experiment relates to the 3D design course. Students use a multi-user platform that supports the development of 3D spaces projects. The course focuses on producing various 3D models. It develops the knowledge, skills, and attitudes required to produce 3D models. The documents (manuals) are meant to be a summarization of the types of actions and tools used by students of the course.

The initial stage of concept hierarchy construction was to select the main concepts (root terms) from the main textbook used by the students, the course outline, and the table of contents.

For our experiment, the following five main concepts (root terms) of 3D design course have been defined (Figure 3):

user interface
modeling
animation
materials and mapping
rendering

Figure 3

Root Terms – The Main Concepts.

In our approach, we are interested in the classification of the content of e-mail messages. In selecting the root terms, we want to show the main classification set we used in the classification of the content of the mail messages. The reason for this is that the root terms should reflect the classification criteria of content of the e-mail messages.

In our case, we decided to use the five concepts as roots (super concepts) of the concept hierarchy. Concept hierarchy and student profiles are constructed and updated by the method previously described in Section 4.1. For each root term, a set of actions and tools that are associated with the root term are structured. This set of actions/tools is referred to as the “classification set” of the user type. A mapping between the new terms and the classification set can be used as the association between the root terms and the new terms. These classification sets are extracted manually from the user manual of 3D Max.

4.3 Concept Hierarchy Derivation

Table 1 shows a sample of the update terms and their frequency. This list is used to update the concept hierarchy database. It is a collection of terms and statistics gathered from our system ECSL when we parsed the 3D Max user manual and other materials related to the course. We used the filtered update list to build or update the concept hierarchy database and student profiles of the students.

Table 1

A Sample List of Update Terms and Their Frequency.

Update term	Frequency	Update term	Frequency
Brushing	9	Project into subtasks	1
Pie charts	2	Remove button	10
Authoring phase	1	Run the Wizard	9
Button	58	Scenario	44
Choose	40	Scenario object	33
Click	80	Scene	49
Click ok	24	Scene object	36
Click start	8	Scene objects	9
3D Plots	3	Select	38
Dialog	45	Selected object	10
Dialog appears	8	Send value	10
Enter	31	Task	51
File	83	Task manager	8
Files	47	Tasks	4
Media geometry graphics	1	Team members	10
Multiple appearance	9	Text box	20
Multiple appearance object	9	Together interaction	1
New	35	Ultima	48
New scenario	10	Ultima database	10
New scenario object	8	Users	3
Object beverages_2	9	Box	1
Objects	3	Value	22
Phase	3	VR	3
Process	3	VR project	1
Process involves	2	Wizard	47
Production process	2	Worldup®	4
Scatterplots	31	Worldup® simulations	2

In our example, terms like “enter”, “value”, “click”, “new”, etc. would be considered as general terms. These terms would not appear in the database. The first step was to eliminate these terms from the list. The list in Table 2 shows the new update list after removing the noisy terms. Later, we used the filtered list to add new terms to the concept hierarchy database and/or student profiles. As an example, the term “scenario” can be added as sub-concept for the root term “animation”. We used the classification set to map this term to its super-concept “animation”: “animation design based on the project scenario according to the project concept”.
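The filtering step can be sketched in a few lines of Python. The term frequencies are copied from Table 1 and the general terms are the ones named in the paragraph above; the variable and function names are hypothetical, and a real stop list would be longer than these four entries.

```python
# A few (term, frequency) rows copied from Table 1.
UPDATE_TERMS = {
    "click": 80,
    "enter": 31,
    "new": 35,
    "value": 22,
    "scenario": 44,
    "scene": 49,
    "task": 51,
}

# General terms named in the text; a real domain-specific stop list would be longer.
GENERAL_TERMS = {"enter", "value", "click", "new"}


def filter_update_list(update_terms, stop_terms):
    """Drop general (noisy) terms and keep the remaining candidates."""
    return {
        term: freq
        for term, freq in update_terms.items()
        if term.lower() not in stop_terms
    }


filtered = filter_update_list(UPDATE_TERMS, GENERAL_TERMS)
print(filtered)  # candidate terms for the concept hierarchy database
```

The surviving terms ("scenario", "scene", "task") all appear in the filtered list of Table 2.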

Table 2

A Filtered List.

Objects	New scenario	Task
New scenario object	Assign tasks	Task manager
Scatterplots	Authoring phase	Team members
Button	Phase	Text box
3D plots	Process	Users
File	Scenario	VR project
File manager	Scenario object	Wizard
Media	Scene	3D Wizard
Multiple appearance object	Scene object	Box

The terms “new scenario object” and “multiple scenario object” can be considered as narrow terms for “scenario”. Developing a scenario adds a “new scenario object” to the project.

The pattern discovery model in our ECSL can be used to find lexico-syntactic patterns. All keywords, concepts, and update terms are directly visible. The following two sentences are examples (see Figure 4):

VR applications are assemblies of various media: geometry, graphics, sound, video, and animation.
3D Max also allows users to manage their tasks in the project.

Figure 4

Some Examples of the Patterns.

From these two sentences, common patterns were identified. The first sentence shows the simple pattern “:”:

noun:{noun|noun,…}

This pattern shows the natural way to discover the hierarchical relation between terms in a normal text. The terms “geometry”, “graphics”, “sound”, “video”, and “animation” can be considered as narrow terms for media.

The second sentence shows another pattern “in the” that can be used to build the hierarchical relation between terms:

noun in the noun

Using this technique, some lexico-syntactic patterns are extracted. However, these patterns are too general and need manual constraints. They do not prevent the extraction of pairs of terms that are not linked by the target relation.
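Both patterns can be tried on the two example sentences with a short Python sketch. The regular expressions and function names are assumptions made for illustration and treat every word as a noun, which is exactly why the patterns over-generate.

```python
import re


def colon_pattern(sentence):
    """Apply 'noun: noun, noun, ...' and return (broader, [narrower, ...]) or None."""
    match = re.search(r"(\w+):\s*(.+?)\.?$", sentence.strip())
    if match is None:
        return None
    broader = match.group(1).lower()
    # Split the enumeration on commas and on a trailing "and".
    items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2))
    narrower = [item.strip().lower() for item in items if item.strip()]
    return broader, narrower


def in_the_pattern(sentence):
    """Apply 'noun in the noun' and return (narrower, broader) pairs."""
    return [
        (m.group(1).lower(), m.group(2).lower())
        for m in re.finditer(r"(\w+)\s+in\s+the\s+(\w+)", sentence)
    ]


s1 = "VR applications are assemblies of various media: geometry, graphics, sound, video, and animation."
s2 = "3D Max also allows users to manage their tasks in the project."
print(colon_pattern(s1))
print(in_the_pattern(s2))

# A hypothetical sentence showing why manual constraints are needed:
# the pattern fires although no hierarchical relation is expressed.
print(in_the_pattern("We met in the morning."))
```

The last call returns a pair for a sentence that expresses no term relation at all, which illustrates the need for the manual constraints mentioned above.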

The previous paragraphs describe the way in which we build the concept hierarchy database. In the following section, we use the new database to parse, index, update, and classify the messages. The hierarchical structure of the concept hierarchy can be used as a tool to classify e-mail messages.

5 E-mail Classification

One reason for classifying is for aiding the student in organizing the known facts about the particular subject under investigation in order to understand better the structure of the subject and thus pinpoint areas for further research. More generally, classification has been described as “a theory of the structure of knowledge”.

“Hierarchies have been used for organization, summarization, and access to information. Hierarchies are an intuitive way to describe information. One can find organizational systems that utilize hierarchies in the Library of Congress, Yahoo, and the personal file cabinet” [9]. Hierarchies have been used in browsing and classifying large collections of electronic documents as well as for education. Finding different techniques for automatically generating hierarchies would be advantageous [17].

Generating hierarchies is not a new goal for IR, and there have been past attempts using automatic techniques. One example is [10], which automatically generates thesauri. Another example is Scatter/Gather [3], in which clustering is used to create document hierarchies.

Since we would like to use hierarchies to classify or find relevant documents, we use the hierarchical structure of the concept hierarchy of the specific domain and the user profiles of the students as a demonstration of how they would be used. The interface of ECSL offers a view of the e-mail messages based on user profiles. The interface is designed to ensure user control over the retrieval of the e-mail messages related to his queries based on the five main concepts and the structure of the controlled vocabulary (which represents the virtual classification folders of the students). However, we realized that a case study is required to show the extent to which these types of hierarchies can be suitable for this task.

E-mail was originally intended as a tool for communication between users. E-mail facilitates fast communication through its high speed, reducing the number of telephone calls and providing possibilities for automatic documentation. It is a rich source of quality and up-to-date information. However, users need advanced facilities to store, organize, and retrieve information [12].

Two strategies on the use of e-mail can be identified: prioritizers and archivers. Prioritizers are users interested in using e-mail to maximize efficiency and limit the time spent with mail. Prioritizers deal with the problem of information overload by reducing the size of their mailbox, the number of folders, and the number of the subscription lists. This kind of user tends to use message filtering and automatic processing facilities to reduce the amount of effort and time spent in the management of his/her mailbox [5, 8, 15, 16, 21, 25].

Archivers, meanwhile, use e-mail as an information source and are willing to spend extra time to avoid the possibility of missing something important. They resist inhibiting the flow of incoming messages by filtering to avoid the risk of losing potentially useful information. Moreover, they tend to save most received messages with the assumption that they may be useful (some day), have a large number of folders, and consequently have difficulty finding filed messages. The archiver’s e-mail-handling strategy suggests that e-mail is much more than an efficient communication technology: it is a rich source of quality and up-to-date information for students and other users. In both cases, the large number of e-mails makes it difficult to identify those that are important, to classify them according to appropriate criteria, and to discard useless messages [11, 18, 19].

For many students who store e-mail messages, organization of these messages is essential to reduce problems with message overview, orientation, and management. What are the strategies for organizing e-mail messages?

Students who use e-mail are rapidly finding themselves in need of some kind of help in structuring and getting a better overview of the information contained in their e-mail messages. Furthermore, they need better ways of information retrieval. The amount of effort required to retrieve relevant information is related to the amount of information stored.

The classification mechanism in our approach is based on classification techniques of concept hierarchy. The ECSL Toolkit parses the e-mail messages and indexes each message using the database of the concept hierarchy. Thereafter, the user can use the document search environment to retrieve the related messages. The main interface for document search of ECSL shows the five main concepts (virtual folders) and the relevant messages for each concept or root term. Students can select any message from the list and browse it.
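The grouping of messages into virtual folders can be sketched as follows. This is a simplified illustration, not the ECSL Toolkit: the five root concepts are those of Section 4.2, but the narrower terms other than "scenario", the sample messages, the naive substring matching, and all names in the code are assumptions.

```python
from collections import defaultdict

# The five root concepts (virtual folders) from Section 4.2.
ROOT_CONCEPTS = ["user interface", "modeling", "animation", "materials and mapping", "rendering"]

# Term -> broader term. Only "scenario" -> "animation" is stated in the article;
# the other links are hypothetical and serve only to make the example run.
BROADER = {
    "scenario": "animation",
    "new scenario object": "scenario",
    "text box": "user interface",
    "button": "user interface",
}


def root_of(term):
    """Walk up the hierarchy and return the root concept, or None if there is none."""
    while term in BROADER:
        term = BROADER[term]
    return term if term in ROOT_CONCEPTS else None


def classify(messages):
    """Map each root concept to the ids of messages that mention it or a narrower term."""
    folders = defaultdict(list)
    vocabulary = set(BROADER) | set(ROOT_CONCEPTS)
    for msg_id, text in messages.items():
        lowered = text.lower()
        roots = {root_of(term) for term in vocabulary if term in lowered}
        for root in sorted(r for r in roots if r is not None):
            folders[root].append(msg_id)
    return dict(folders)


# Hypothetical messages, not taken from the study's collection.
messages = {
    1: "How do I create a new scenario?",
    2: "The button in the text box dialog is greyed out.",
    3: "Rendering takes too long; also the scenario is empty.",
}
print(classify(messages))
```

A message may land in more than one virtual folder, as message 3 does here, which differs from filing each message into a single user-defined folder.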

6 An Experiment and Evaluation: A Case Study at QOU

The rapid development of information and communication technologies and their influence on the whole society also considerably influence the methods, content, and form of teaching. Therefore, information and communication technologies have influenced and changed the way of teaching. The new environment of information and communication technology has offered students a different tool to enhance learning and has given them immediate and easier access to up-to-date information from all over the world.

In this section, we discuss the educational model of ECSL that is based on student profiles and the classification techniques that are based on concept hierarchy. The system builds a flexible self-learning environment for the student and at the same time a collaborative communication environment to support effective and deep understanding among students and trainees.

The ECSL uses e-mail technologies and classification techniques to construct the learning environment. In this environment, students should be able to choose the most convenient learning materials (e-mail messages) in order to learn the contents and subjects they are interested in. This section discusses an online educational model and describes the architecture of the system, function, mechanism, educational communication, and collaboration facilities of this model. Finally, this section shows some preliminary evaluation of the ECSL.

6.1 Educational Model

In this work, our approach uses the structure of concept hierarchy as well as the content of e-mail messages and classification techniques in the process of teaching and learning in open and distance education. The ECSL Toolkit offers an interactive interface for students to view e-mail messages and retrieve other educational materials related to their queries. The system classifies e-mail messages based on students’ interest. The interface is designed to ensure student control over the retrieval of the e-mail messages related to his queries based on the five main concepts, the structure of the controlled vocabulary, and his needs represented by his queries.

We examine how students use e-mail to communicate with other students and lecturers. In this experiment, we use e-mail for teaching and learning. E-mail has been incorporated in teaching pedagogy across a training course for 3D design modeling. Figure 5 shows the architecture of the teaching and learning system using our system.

Figure 5

Educational Model Using E-mail as Communication and ECSL Connection.

Since our lecture is a laboratory course about concepts and programming of 3D design, learning is highly focused on both self-study and working in teams. Hence, we decided to focus on supporting open and distance learning. During the whole lecture, the system provides asynchronous communication facilities (e-mail). Furthermore, the students’ status IR, organizational information, and content information in the download module (manuals, examples, etc.) are supported.

Our approach treats the e-mail body as the most useful part of a message, since it is the source of the features selected for document classification for educational needs.

The ECSL can be used as an archive of course-related materials and resources, facilitating access to it by students and tutors anytime, anywhere. Course-related resources typically include

course information: tutors, groups, reading lists, assessment information;
teaching materials: lecture notes, slides, handouts, short topic tutorials, manuals, and other material used in lectures;
links to related materials on web resources: links to library collections.

The focus of ECSL is the delivery and retrieval of such resources. Interaction with students is generally limited to e-mail messages and other resources indexed by our system.

6.2 Classification System

This article intends to provide a brief introduction on the potential and current use of the e-mail in open and distance education to deliver and support learning and teaching activities over the educational network. The sample consisted of 16 students studying a 3D modeling course. About 80 e-mail messages were indexed by the ECSL indexing engine. This automatically created keywords and concepts.

To retrieve the message, a concept hierarchy database is used. The collection of e-mail messages is used as a source of learning materials for the 3D design course. In general, the stored messages are valuable for a student. The hierarchy of the concepts reflects the interests of the student who is browsing the e-mail messages.

The e-mail classification system consists of the following modules (Figure 6):

ECSL,
e-mail capture system,
e-mail database module.

Figure 6

E-mail Classification System.

The e-mail capture system accesses the e-mail messages and reads these messages using mail reader applications. The e-mail database module manages the e-mail database, which is a collection of meta-data about the e-mail messages.

6.3 Evaluation

Most mail reader applications (e.g., MS Outlook) allow users to file messages into folders; in practice, this task tends to be tedious, and additional cognitive effort is required to decide on an appropriate folder. In our case, the ECSL automatically classifies the e-mail messages. We evaluated this classification method using the ECSL Toolkit and a collection of e-mail messages related to the domain of 3D design. The classification categories in our experiment were the five main concepts in 3D design modeling systems rather than user-defined folders.

To evaluate our approach, we need to assess how well the automatically constructed concept hierarchy reflects a given domain (course and student). One possibility is to compute how many of the e-learning resources in the e-mails are classified correctly by our concept hierarchy.

We evaluated our approach on 3D design course materials that the instructor sent to students by e-mail. The texts and learning materials extracted from these e-mails served as the domain-specific text collection for the 3D design domain.

To evaluate our classification approach using our system ECSL, 80 e-mail messages were automatically indexed using the concept hierarchy database. Precision and recall are the basic measures used in IR systems for the evaluation of their performance and their search strategies. Recall is the ratio of the number of relevant messages retrieved to the total number of the relevant messages indexed. Precision is the ratio of the number of relevant messages retrieved to the total number of messages retrieved. F-measure is the harmonic mean of precision and recall. We can define the following notation:

RR = number of relevant messages retrieved
RNR = number of relevant messages not retrieved
IRR = number of irrelevant messages retrieved

Recall, precision, and F-measure can be calculated using the following equations, respectively:

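\[ \text{Recall} = \frac{RR}{RR + RNR} \times 100\% \tag{1} \]

\[ \text{Precision} = \frac{RR}{RR + IRR} \times 100\% \tag{2} \]

\[ F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3} \]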

Table 3 shows the classification results. The column “Actual relevant messages” gives the true number of messages in our collection related to each main concept (labeled “User type” in the table). The next column gives the number of messages that ECSL retrieved for that concept, and the following column gives how many of those retrieved messages are actually relevant. The next two columns give the number of relevant messages not retrieved and the number of irrelevant messages retrieved. The remaining columns report precision, recall, and F-measure.

Table 3

A Summary of the Result.

User type	Actual relevant messages	Number of retrieved messages	Relevant messages retrieved RR	Relevant messages not retrieved RNR	Irrelevant messages retrieved IRR	Precision = RR/(RR + IRR)	Recall = RR/(RR + RNR)	F-measure = 2(Precision Recall)/(Precision + Recall)
User interface	32	35	29	3	6	29/(29 + 6) = 83%	29/(29 + 3) = 90%	86
Modeling	11	12	10	1	2	10/(10 + 2) = 83%	10/(10 + 1) = 91%	87
Animation	15	18	13	2	5	13/(13 + 5) = 87%	13/(13 + 2) = 86%	86
Materials and mapping	2	3	2	0	1	2/(2 + 1) = 67%	2/(2 + 0) = 100%	80
Rendering	7	9	6	1	3	6/(6 + 3) = 67%	6/(6 + 1) = 86%	75
	68	78	60	6	17	77.4	90	82.8

We used the previous equations to calculate precision, recall, and F-measure. Table 3 shows the percentage of recall, precision, and F-measure for each main concept as well as the percentage for recall and precision gained using concept hierarchy and student profiles in e-mail classification. Figure 7 shows the result as represented by a graph.
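As a minimal sketch of this calculation, the following code applies the recall, precision, and F-measure definitions to the RR, RNR, and IRR counts copied from Table 3. The function names and the dictionary layout are our own illustrative choices. The code returns unrounded percentages, so it can serve as a check on the printed figures, which are rounded and may not agree with it in every cell.

```python
# Counts per main concept, copied from Table 3 as (RR, RNR, IRR).
COUNTS = {
    "User interface": (29, 3, 6),
    "Modeling": (10, 1, 2),
    "Animation": (13, 2, 5),
    "Materials and mapping": (2, 0, 1),
    "Rendering": (6, 1, 3),
}


def recall(rr, rnr):
    """Relevant retrieved over all relevant messages, in percent."""
    return 100.0 * rr / (rr + rnr) if rr + rnr else 0.0


def precision(rr, irr):
    """Relevant retrieved over all retrieved messages, in percent."""
    return 100.0 * rr / (rr + irr) if rr + irr else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if p + r else 0.0


def evaluate(counts=COUNTS):
    """Return {concept: (precision, recall, F-measure)} as unrounded percentages."""
    results = {}
    for concept, (rr, rnr, irr) in counts.items():
        p, r = precision(rr, irr), recall(rr, rnr)
        results[concept] = (p, r, f_measure(p, r))
    return results


if __name__ == "__main__":
    for concept, (p, r, f) in evaluate().items():
        print(f"{concept}: precision={p:.1f} recall={r:.1f} F={f:.1f}")
```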

Figure 7

Precision, Recall, and F-Measure.

The concept hierarchy can be used to classify relevant messages and to suppress the classification of non-relevant messages by means of super- and sub-concepts. The test results showed that a good indexing quality has been achieved. Figure 8 shows a sample message retrieved as relevant to annotation.

Figure 8

A Sample Message Retrieved Relevant to Annotation.
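The use of super- and sub-concepts for classification can be sketched as follows. This is an illustration only, not the ECSL algorithm: it assumes that each sub-concept term maps to one of the five main concepts, that matching is done on single lowercase words, and that a message with too few matching terms is left unclassified. The term list, the `classify` function, and the `min_hits` threshold are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sub-concept terms mapped to their main (super-) concepts.
TERM_TO_MAIN = {
    "toolbar": "user interface",
    "viewport": "user interface",
    "polygon": "modeling",
    "mesh": "modeling",
    "keyframe": "animation",
    "texture": "materials and mapping",
    "lighting": "rendering",
}


def classify(body, min_hits=1):
    """Return the main concepts whose sub-concept terms occur in the body.

    Messages with fewer than `min_hits` matching terms for a concept are not
    assigned to it, which suppresses classification of non-relevant messages.
    """
    tokens = re.findall(r"[a-z]+", body.lower())
    hits = Counter(TERM_TO_MAIN[t] for t in tokens if t in TERM_TO_MAIN)
    return sorted(concept for concept, n in hits.items() if n >= min_hits)
```

A message may receive more than one main concept in this sketch; raising `min_hits` trades recall for precision.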

The retrieval mechanism offers advanced facilities for text information search and enables us to treat e-mail as an information system. It offers facilities to locate and retrieve messages as well as to locate information contained in messages. The IR-based classification mechanism in our system is presented as a flexible alternative for managing e-mail messages. Students can easily find a topic in their e-mails, and the system can help trainers collect and group different e-mails into topics (i.e., the five main concepts).

7 Conclusion

In any case, concept hierarchy represents an educational structure that should be considered not as an alternative but as an integrated and complementary element of traditional (conventional) learning that can respond to the growing demands of continuing education and of learning in general.

In this work, we have presented classification and retrieval mechanisms intended to aid in the management of learning materials sent by e-mail. These mechanisms specifically address the need for an e-mail management strategy among students, who use e-mail as an up-to-date and essential source of information in their learning activity. The retrieval mechanism in ECSL lays the foundations for treating e-mail messages as a real learning management system, in which information can easily be located and retrieved.

Concept hierarchy is useful and effective in knowledge-based indexing and retrieval of e-mail messages in supporting e-learning for students.

This work evaluates the effectiveness of our approach as a communication medium in facilitating meaningful class participation, learning, and communications in open and distance education. The test results showed that a good classification quality has been achieved, with a precision of 77.4%, recall of 90%, and F-measure of 82.8.

The results obtained from this experiment demonstrate the validity of this approach, which is made possible by concept hierarchy and classification techniques (currently leading the field of information technologies). Using such an approach, we can achieve effective, efficient, interactive, and high-quality learning methods. Further results and confirmations may be obtained by extending both the user profiles and the concept hierarchy to which this methodology is applied.

Corresponding author: Yousef Abuzir, Computer Information Systems Department, Al-Quds Open University, P.O. Box 1804, Al-Bireh, Palestine, e-mail: yabuzir@qou.edu

References

[1] Y. Abuzir, Deriving concepts hierarchies, in: Proceedings of the CLUK2004 Conference, University of Birmingham, England, January 2004.Search in Google Scholar

[2] Y. Abuzir and V. Fernando, Managing the flow of messages coming to newspaper editor, in: Proceedings of the 2002 International Conference on Artificial Intelligence (IC-AI’02), part of an international multi conferences in computer science, Monte Carlo Resort, Las Vegas, NV, June 24–27, 2002.Search in Google Scholar

[3] Y. Abuzir and F. Vandamme, E-newspaper classification and distribution based on user profiles and thesaurus, in: International Conference on Advances in Infrastructure for e-Business, e-Education, e-Science, and e-Medicine on the Internet, January 21–27, 2002.Search in Google Scholar

[4] A. L. C. Bazzan and S. Labidi (eds.), Using concept hierarchies in knowledge discovery, in: SBIA 2004, LNAI 3171, pp. 255–265, 2004.10.1007/978-3-540-28645-5_26Search in Google Scholar

[5] R. Bekkerman, A. Mccallum and G. Huang, Automatic categorization of email into folders: benchmark experiments on Enron and SRI corpora, Technical Report IR-418, Computer Science Department, pp. 4–6.Search in Google Scholar

[6] K. ChanMin, Using email to enable e3 (effective, efficient, and engaging) learning, Distance Educ.29 (2008), 187–198.10.1080/01587910802154988Search in Google Scholar

[7] P. Cimiano, A. Hotho and S. Staab, Learning concept hierarchies from text corpora using formal concept analysis, J. Artif. Intell. Res.24 (2005), 305–339.10.1613/jair.1648Search in Google Scholar

[8] E. Crawford, J. Kay and E. McCreath, IEMS – the intelligent email sorter, in: ICML ‘02 Proceedings of the Nineteenth International Conference on Machine Learning, pp. 83–90, 2002.Search in Google Scholar

[9] C. Crouch, A cluster-based approach to thesaurus construction, in: Proceedings on the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 309–320, 1988.10.1145/62437.62467Search in Google Scholar

[10] D. Cutting, D. Karger, J. Pedersen and J. Turkey, Scatter/gather: a cluster-based approach to browsing large document collections, in: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, pp. 318–329, 1992.10.1145/133160.133214Search in Google Scholar

[11] S. N. Ferreira and K. Becker, A query language for retrieving information in electronic mail environments, in: Proceedings of the 11th Brazilian Symposium on Database Systems (SBBD97), Fortaleza, pp. 93–106, October 13–15, 1997.Search in Google Scholar

[12] S. N. Ferreira and K. Becker, Advanced facilities for information classification and retrieval in electronic mail systems (winner of the second prize in the 4th UNESCO-CLEI Thesis on Computer Science Award), in: Proceedings of the XXIII Conferencia Latino Americana de Informática, Valparaiso, Chile, Vol. 2, pp. 1029–1048, November 10–15, 1997.Search in Google Scholar

[13] J. Huett, Email as an educational feedback tool: relative advantages and implementation guidelines, Int. J. Instruct. Technol. Distance Learn.1 (2004), 35–44.Search in Google Scholar

[14] V. Joshi and A. Saxena, Analyzing e-mail communication of prospective learners, Turk. Online J. Distance Educ. TOJDE6 (2005), 32–41.Search in Google Scholar

[15] I. Koprinska, J. Poon, J. Clark and J. Chan, Learning to classify e-mail, Inf. Sci.177 (2007), 2167–2187.10.1016/j.ins.2006.12.005Search in Google Scholar

[16] Y. Koren, E. Liberty, Y. Maarek and R. Sandler, Automatically tagging email by leveraging other users’ folders, in: Proceeding KDD ‘11 Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 913–921, August 2011.10.1145/2020408.2020560Search in Google Scholar

[17] D. Lawrie, W. Bruce Croft and A. Rosenberg, Finding topic words for hierarchical summarization, in: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ‘01), ACM, New York, NY, pp. 349–357, 2001.10.1145/383952.384022Search in Google Scholar

[18] W. E. Mackay, Diversity in the use of electronic mail: a preliminary inquiry, ACM Trans. Inf. Syst.6 (1988), 380–397.10.1145/58566.58567Search in Google Scholar

[19] B. Olle, Keystroke level analysis of email message organization, in: Proceedings of CHI 2000, pp. 105–112, 2000.Search in Google Scholar

[20] K. M. R. Rezanur Rahman, A. Sadat and S. M. Numan, Enhancing distant learning through email communication: a case of BOU, Turk. Online J. Distance Educ. TOJDE9 (2008), 180–185.Search in Google Scholar

[21] B. Richards, J. Kay and A. Quigley, Activity modeling using email and web page classification, Technical Report 573, July 2005.Search in Google Scholar

[22] M. Robinson, Using email and the Internet in science teaching, J. Inf. Technol. Teacher Educ.3 (1994), 229–238.10.1080/0962029940030209Search in Google Scholar

[23] V. Ph. Vir, Can a course be taught entirely via email? Commun. ACM42 (1999), 29–30.Search in Google Scholar

[24] C. Sh. Vivian and K. F. Timothy, Can email communication enhance professor-student relationship and student evaluation of professor?: some empirical evidence, J. Educ. Comput. Res.37 (2007), 289–306.10.2190/EC.37.3.dSearch in Google Scholar

[25] M. Wang, Y. He and M. Jiang, Text categorization of Enron email corpus based on information bottleneck and maximal entropy, in: IEEE 10th International Conference on Signal Processing (ICSP), pp. 2472–2475, 2010.Search in Google Scholar

Received: 2014-2-24

Published Online: 2014-11-4

Published in Print: 2015-12-1

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
