Ontology-Based Data Mining in Digital Libraries

Part of the book series: Annals of Information Systems (AOIS, volume 6)

Abstract

The paper proposes a method for matching short forms (abbreviated titles from the citation report) with their corresponding long forms (journal titles in the digital library). The main problem is that the citation report often contains several syntactically different abbreviated forms of a single journal title. We use character- and token-based similarity metrics to identify duplicate records. We also improve the identification of syntactically different data by automatically discovering ontological knowledge representations, such as thesauri, from correctly matched data.
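
To make the two families of metrics concrete, the following is a minimal Python sketch, not the chapter's actual implementation. It assumes Levenshtein edit distance as the character-based metric and Jaccard overlap of word tokens as the token-based metric. The function names, the sample abbreviations, and the 0.8 threshold are hypothetical choices made only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-based score in [0, 1]: 1 minus the normalized edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokens(title: str) -> set:
    """Lowercase word tokens, with abbreviation periods treated as spaces."""
    return set(title.lower().replace(".", " ").split())


def token_similarity(a: str, b: str) -> float:
    """Token-based score in [0, 1]: Jaccard overlap of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag a pair if either metric reaches the threshold."""
    return max(char_similarity(a, b), token_similarity(a, b)) >= threshold


if __name__ == "__main__":
    # Illustrative variants of one abbreviated title, plus an unrelated one.
    print(is_duplicate("J. Am. Stat. Assoc.", "J AM STAT ASSOC"))
    print(is_duplicate("J AM STAT ASSOC", "J AMER STAT ASSOC"))
    print(is_duplicate("J AM STAT ASSOC", "PHYS REV LETT"))
```

In this sketch the token metric absorbs differences in punctuation and case, while the character metric tolerates small spelling differences inside a token (for example "AM" versus "AMER"). Taking the maximum of the two scores is one simple way to combine them; a weighted combination or a learned classifier would be equally plausible alternatives.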

Notes

  1. Febrl, Freely Extensible Biomedical Record Linkage, http://sourceforge.net/projects/febrl

  2. http://www.thomsonscientific.com

  3. The author is the first author.

  4. http://liinwww.ira.uka.de/bibliography/index

  5. http://citeseer.ist.psu.edu

  6. Note: titles are grouped in clusters.

Author information

Corresponding author

Correspondence to Ana Kovačević.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Kovačević, A. (2010). Ontology-Based Data Mining in Digital Libraries. In: Devedžić, V., Gašević, D. (eds) Web 2.0 & Semantic Web. Annals of Information Systems, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1219-0_7

  • DOI: https://doi.org/10.1007/978-1-4419-1219-0_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-1218-3

  • Online ISBN: 978-1-4419-1219-0

  • eBook Packages: Computer Science, Computer Science (R0)
