CatRelate: A New Hierarchical Document Category Integration Algorithm by Learning Category Relationships

  • Conference paper
Digital Libraries: International Collaboration and Cross-Fertilization (ICADL 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3334)

Abstract

We address the problem of integrating documents from a source catalog into a master catalog. Current approaches either treat it as a flat category integration problem, ignoring the useful hierarchy information in the catalogs, or handle it hierarchically but without a rigorous model. In contrast, our method is based on correctly identifying relationships among categories, such as Match, Disjoint, SubConcept, SuperConcept, and Overlap, which are derived from the relations between sets in set theory. Compared with the traditional Match/NotMatch relationship in the literature, our approach is more expressive in defining category relationships. The relationships among categories are first learned probabilistically and then refined by considering the hierarchy context. Our preliminary experiments show that the method can help to correctly identify category relationships and thus increase the accuracy of document integration.
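
The five relationships named in the abstract mirror the possible relations between two sets. The following is a minimal sketch, assuming each category is represented simply by the set of document identifiers it contains. It uses exact set comparison for illustration only; it does not implement the probabilistic learning or the hierarchy-based refinement that the abstract describes. The names `Relation` and `relate`, and the convention that SubConcept means the source category is strictly contained in the master category, are assumptions of this sketch rather than definitions taken from the paper.

```python
from enum import Enum


class Relation(Enum):
    # Relationship labels as named in the abstract.
    MATCH = "Match"
    DISJOINT = "Disjoint"
    SUB_CONCEPT = "SubConcept"
    SUPER_CONCEPT = "SuperConcept"
    OVERLAP = "Overlap"


def relate(source: set, master: set) -> Relation:
    """Classify a source category against a master category.

    Both arguments are sets of document identifiers (an assumption
    of this sketch). Direction is read as "source is a ... of master".
    """
    if source == master:
        return Relation.MATCH
    if source.isdisjoint(master):
        return Relation.DISJOINT
    if source < master:  # strict subset
        return Relation.SUB_CONCEPT
    if source > master:  # strict superset
        return Relation.SUPER_CONCEPT
    # The sets share some documents, but neither contains the other.
    return Relation.OVERLAP


if __name__ == "__main__":
    # Toy categories with made-up document identifiers.
    print(relate({"d1", "d2"}, {"d1", "d2", "d3"}).value)
    print(relate({"d1", "d2"}, {"d2", "d3"}).value)
```

With real catalogs the exact set membership is rarely known in advance, which is presumably why the relationships are learned probabilistically rather than computed directly as above.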

The work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No: CUHK 4179/03E) and a CUHK Strategic Grant (No: 4410001).

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, S., Yang, C.C., Lam, W. (2004). CatRelate: A New Hierarchical Document Category Integration Algorithm by Learning Category Relationships. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-P. (eds) Digital Libraries: International Collaboration and Cross-Fertilization. ICADL 2004. Lecture Notes in Computer Science, vol 3334. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30544-6_30

  • DOI: https://doi.org/10.1007/978-3-540-30544-6_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24030-3

  • Online ISBN: 978-3-540-30544-6

  • eBook Packages: Computer Science, Computer Science (R0)
