Discovering mappings in hierarchical data from multiple sources using the inherent structure

Candan, K. Selçuk; Kim, Jong Wook; Liu, Huan; Suvarna, Reshma

doi:10.1007/s10115-005-0230-9

Regular Paper
Published: 30 January 2006

Knowledge and Information Systems, Volume 10, pages 185–210 (2006)

Abstract

Unprecedented amounts of media data are publicly accessible. However, it is increasingly difficult to integrate relevant media from multiple and diverse sources for effective applications. The functioning of a multimodal integration system requires metadata, such as ontologies, that describe media resources and media components. Such metadata are generally application-dependent, and this can cause difficulties when media need to be shared across application domains. There is a need for a mechanism that can relate the common and uncommon terms and media components. In this paper, we develop an algorithm to mine and automatically discover mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in these types of data. We evaluate the performance of this algorithm for various parameters using both synthetic and real-world data collections and show that the structure-based mining of relationships provides high degrees of precision.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, 85287, USA
K. Selçuk Candan, Jong Wook Kim, Huan Liu & Reshma Suvarna

Corresponding author

Correspondence to K. Selçuk Candan.

Additional information

K. Selçuk Candan is an Associate Professor in the Department of Computer Science and Engineering at Arizona State University. He joined the department in August 1997, after receiving his Ph.D. from the Computer Science Department at the University of Maryland at College Park. He received the 1997 ACM DC Chapter Samuel N. Alexander Fellowship award for his Ph.D. work. His research interests include the development of indexing and retrieval schemes for multimedia and Web information and the management of dynamic, heterogeneous, and distributed data. He has published various articles in respected journals and conferences in these areas. He has also served as a program committee member, chairperson, and guest editor for various workshops, conferences, and journals. He received his B.S. degree in computer science, ranked first in the department, from Bilkent University in Turkey in 1993. http://www.public.asu.edu/~candan.

Jong Wook Kim received his B.S. from Korea University, Seoul, Korea, in 1998 and his M.S. from KAIST, Daejeon, Korea, in 2000. He is currently a Ph.D. student in the Department of Computer Science and Engineering, Arizona State University, AZ, USA. His primary research interests are web data mining, information retrieval, and database systems. His current research concentrates on mining in web communities such as discussion boards.

Huan Liu earned his Ph.D. in Computer Science in 1989 at the University of Southern California, and his Bachelor of Engineering in the Electrical Engineering and Computer Science Department at Shanghai Jiao Tong University in 1983. He conducted research at Telecom (Telstra) Australia Research Laboratories in Melbourne, Australia. In January 1994, he joined the School of Computing at the National University of Singapore, where he became an Associate Professor. Since January 2000, he has been with the Department of Computer Science and Engineering at Arizona State University as an Associate Professor. He is a senior member of IEEE and a member of ACM and AAAI. His principal research interests include machine learning, feature and subset selection, data preprocessing, bioinformatics, and data (including text and web) mining. He has worked on real-world data mining applications and has published extensively in journals, conference proceedings, book chapters, and books. He serves on the editorial boards of journals, a handbook of data mining, and an encyclopedia of data mining and warehousing.

Reshma Suvarna is currently employed at Honeywell as a Senior Software Engineer. Her work at Honeywell is in the area of Aerospace Electronic Systems. She received her Master's degree in Computer Science and Engineering from Arizona State University in 2003. In addition to her current work in Aerospace Electronic Systems, she is interested in data mining, web mining, and software engineering research.

About this article

Cite this article

Candan, K.S., Kim, J.W., Liu, H. et al. Discovering mappings in hierarchical data from multiple sources using the inherent structure. Knowl Inf Syst 10, 185–210 (2006). https://doi.org/10.1007/s10115-005-0230-9

Received: 22 August 2004
Revised: 24 January 2005
Accepted: 26 March 2005
Published: 30 January 2006
Issue Date: August 2006
DOI: https://doi.org/10.1007/s10115-005-0230-9
