Ontology-Based Data Mining in Digital Libraries

Kovačević, Ana

doi:10.1007/978-1-4419-1219-0_7

Ana Kovačević³

Chapter
First Online: 01 January 2009

Part of the book series: Annals of Information Systems (AOIS, volume 6)

Abstract

The paper proposes matching short forms (abbreviated titles from the citation report) with their corresponding long forms (journal titles in the digital library). The main problem is that the citation report often contains several syntactically different abbreviated forms of the same title. We use character- and token-based similarity metrics to identify duplicate records. We also improve the identification of syntactically different data by automatically discovering ontological knowledge representations, such as thesauri, from correctly matched data.
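To make the idea of character- and token-based similarity concrete, here is a minimal Python sketch. It is not the chapter's implementation: the choice of Levenshtein distance as the character-based metric, Jaccard overlap as the token-based metric, the way the two scores are combined (taking the maximum), the 0.8 threshold, the function names, and the sample titles are all assumptions made for illustration.

```python
import re


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-based score in [0, 1]: 1 minus the length-normalized edit distance."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_similarity(a: str, b: str) -> float:
    """Token-based score in [0, 1]: Jaccard overlap of the word sets (order-insensitive)."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similarity(a: str, b: str) -> float:
    """Illustrative combination (an assumption): take the better of the two scores."""
    return max(char_similarity(a, b), token_similarity(a, b))


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the highest-scoring candidate, or None below threshold."""
    scored = [(c, similarity(query, c)) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


if __name__ == "__main__":
    # Hypothetical abbreviated titles, used only to exercise the functions.
    known = ["J APPL STAT", "J AM STAT ASSOC", "AM J MATH"]
    print(best_match("Am. Stat. Assoc. J.", known))
    print(char_similarity("J AM STAT ASSOC", "J AM STAT ASOC"))
```

The character-based score tolerates typos and punctuation differences, while the token-based score tolerates reordered words; a variant such as "Am. Stat. Assoc. J." is therefore still recognized as a duplicate of "J AM STAT ASSOC". Neither metric on its own links an abbreviation to its full journal title, which is where knowledge such as a thesaurus of abbreviations comes in.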

Notes

1. Febrl, Freely Extensible Biomedical Record Linkage, http://sourceforge.net/projects/febrl
2. http://www.thomsonscientific.com
3. The author is the first author.
4. http://liinwww.ira.uka.de/bibliography/index
5. http://citeseer.ist.psu.edu
6. Note: titles are grouped in clusters.

Author information

Authors and Affiliations

Faculty of Security Studies, University of Belgrade, Gospodara Vučića 50, 11040, Beograd, Serbia
Ana Kovačević

Corresponding author

Correspondence to Ana Kovačević.

Editor information

Editors and Affiliations

School of Business Administration, University of Belgrade, Jove Ilica 154, Belgrade, 11000, Serbia
Vladan Devedžić
School of Computing &, Athabasca University, University Drive 1, Athabasca, T9S 3A3, Canada
Dragan Gašević

Cite this chapter

Kovačević, A. (2010). Ontology-Based Data Mining in Digital Libraries. In: Devedžić, V., Gašević, D. (eds) Web 2.0 & Semantic Web. Annals of Information Systems, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1219-0_7

DOI: https://doi.org/10.1007/978-1-4419-1219-0_7
Published: 03 November 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-1218-3
Online ISBN: 978-1-4419-1219-0
eBook Packages: Computer Science, Computer Science (R0)
