Content Independent Metadata Production as a Machine Learning Problem

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Abstract

Metadata provide a high-level description of digital library resources and are the key to discovering and selecting suitable resources. However, the growth in size and diversity of digital collections makes manual metadata extraction an expensive task. This paper proposes a new content-independent method to automatically generate metadata in order to characterize resources in a given learning object repository. The key idea is to rely on a few existing metadata to learn predictive models of metadata values. Because it does not depend on content, the proposed method handles resources in different formats: text, image, video, Java applet, etc.

Two classical machine learning approaches are studied in this paper. In the first approach, a supervised machine learning technique classifies each value of a metadata field to be predicted according to the other, a priori filled metadata fields. The second approach uses the FP-Growth algorithm to discover relationships between the different metadata fields as association rules. Experiments on two well-known educational data repositories show that both approaches can enhance metadata extraction and can even fill subjective metadata fields that are difficult to extract from the content of a resource, such as the difficulty of a resource.
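The two approaches can be illustrated on toy data. The sketch below is not the authors' implementation: the records, field names, and thresholds are invented for illustration; the classifier is a plain categorical naive Bayes chosen for brevity; and the rule miner enumerates itemsets by brute force instead of using FP-Growth, which finds the same frequent itemsets more efficiently on large repositories.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy repository: each dict is the metadata of one resource.
RECORDS = [
    {"format": "text", "audience": "student", "difficulty": "easy"},
    {"format": "text", "audience": "student", "difficulty": "easy"},
    {"format": "video", "audience": "student", "difficulty": "easy"},
    {"format": "applet", "audience": "researcher", "difficulty": "hard"},
    {"format": "text", "audience": "researcher", "difficulty": "hard"},
    {"format": "applet", "audience": "researcher", "difficulty": "hard"},
]


def train_naive_bayes(records, target):
    """Approach 1 (sketch): learn to predict `target` from the other fields."""
    priors = Counter(r[target] for r in records)
    counts = defaultdict(Counter)  # (field, class) -> value frequencies
    values = defaultdict(set)      # field -> distinct values seen
    for r in records:
        for field, value in r.items():
            if field != target:
                counts[(field, r[target])][value] += 1
                values[field].add(value)
    return priors, counts, values


def predict(model, known):
    """Return the most probable target value given the already-filled fields."""
    priors, counts, values = model
    total = sum(priors.values())
    best, best_score = None, -1.0
    for cls, n in priors.items():
        score = n / total
        for field, value in known.items():
            # Laplace smoothing keeps unseen values from zeroing the score.
            score *= (counts[(field, cls)][value] + 1) / (n + len(values[field]))
        if score > best_score:
            best, best_score = cls, score
    return best


def mine_rules(records, target, min_support=0.5, min_confidence=0.8):
    """Approach 2 (sketch): rules 'other fields -> target' by brute force.

    Stands in for FP-Growth; only practical for a handful of fields.
    """
    n = len(records)
    support = Counter()
    for r in records:
        items = sorted(r.items())
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                support[frozenset(combo)] += 1
    rules = []
    for itemset, count in support.items():
        heads = [item for item in itemset if item[0] == target]
        if len(itemset) < 2 or not heads or count / n < min_support:
            continue
        body = itemset - {heads[0]}
        confidence = count / support[body]
        if confidence >= min_confidence:
            rules.append((tuple(sorted(body)), heads[0], count / n, confidence))
    return rules


if __name__ == "__main__":
    model = train_naive_bayes(RECORDS, "difficulty")
    print(predict(model, {"format": "text", "audience": "researcher"}))
    for body, head, sup, conf in mine_rules(RECORDS, "difficulty"):
        print(body, "->", head, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

In both cases the prediction uses only metadata fields that are already filled, never the content of the resource, which is why the same code applies unchanged to text, images, video, or applets.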

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Changuel, S., Labroche, N. (2012). Content Independent Metadata Production as a Machine Learning Problem. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_24

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science, Computer Science (R0)
