Web metadata extraction and semantic indexing for learning objects extraction

Published: 05 June 2014

Volume 41, pages 649–664 (2014)

Applied Intelligence

John Atkinson¹,
Andrea Gonzalez¹,
Mauricio Munoz¹ &
Hernan Astudillo²

Abstract

Secondary-school teachers are in constant need of finding relevant digital resources to support specific didactic goals. Unfortunately, generic search engines do not allow them to identify learning objects among semi-structured candidate educational resources, much less retrieve them by teaching goals. This article describes a multi-strategy approach for semantically guided extraction, indexing and search of educational metadata; it combines machine learning, concept analysis, and corpus-based natural language processing techniques. The overall model was validated by comparing extracted metadata against standard search methods and heuristic-based techniques for Classification Accuracy and Metadata Quality (as evaluated by actual teachers), yielding promising results and showing that this semantically guided metadata extraction can effectively enhance access and use of educational digital material.

Author information

Authors and Affiliations

Department of Computer Sciences, Universidad de Concepcion, Concepcion, Chile
John Atkinson, Andrea Gonzalez & Mauricio Munoz
Department of Informatics, Universidad Tecnica Federico Santa Maria, Valparaiso, Chile
Hernan Astudillo

Corresponding author

Correspondence to John Atkinson.

Additional information

This research was supported by FONDECYT (Chile) under grant number 1130035, and by project grant Basal FB0821 CCTVal (Chile).

About this article

Cite this article

Atkinson, J., Gonzalez, A., Munoz, M. et al. Web metadata extraction and semantic indexing for learning objects extraction. Appl Intell 41, 649–664 (2014). https://doi.org/10.1007/s10489-014-0557-6

Published: 05 June 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10489-014-0557-6
