Learning to Integrate Web Catalogs with Conceptual Relationships in Hierarchical Thesaurus

  • Conference paper
Information Retrieval Technology (AIRS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Abstract

Web catalog integration is an important issue in current digital content management. Past studies have shown that exploiting a flattened structure with auxiliary information extracted from the source catalog can improve integration results. Although earlier studies have also shown that exploiting a hierarchical structure in classification may bring further advantages, its effectiveness has not been verified for catalog integration. In this paper, we propose an enhanced catalog integration (ECI) approach that extracts conceptual relationships from the hierarchical Web thesaurus to further improve the accuracy of Web catalog integration. We conducted real-world catalog integration experiments with both a flattened structure and a hierarchical structure in the destination catalog. The results show that, with Support Vector Machine (SVM) classifiers, our ECI scheme effectively improves the integration accuracy of both the flattened scheme and the hierarchical scheme.
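The abstract does not spell out how ECI turns thesaurus relationships into classifier input, so the following is only a hypothetical sketch of one common way to do it: enrich each source-catalog document's bag of words with the broader concepts its terms map to in a hierarchical thesaurus, plus its source category as an auxiliary feature. The toy `THESAURUS`, the helpers `broader_terms` and `enrich_features`, the `decay` weighting, and the `concept:`/`src:` feature prefixes are all assumptions for illustration, not the authors' method.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption): each term maps to its broader concept.
THESAURUS = {
    "laptop": "computer",
    "desktop": "computer",
    "computer": "electronics",
    "camera": "electronics",
    "electronics": "products",
}


def broader_terms(term, thesaurus):
    """Return the chain of broader concepts above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in thesaurus:
        term = thesaurus[term]
        if term in seen:  # stop if a malformed thesaurus contains a cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


def enrich_features(tokens, source_category, thesaurus, decay=0.5):
    """Build a weighted feature vector for one source-catalog document.

    Assumed weighting scheme (hypothetical):
    - each original token contributes 1.0;
    - each broader concept at depth d contributes decay**d, so more
      general concepts count less;
    - the document's source-catalog category is added as an auxiliary
      feature, in the spirit of the flattened schemes in the abstract.
    """
    features = Counter()
    for tok in tokens:
        features[tok] += 1.0
        for depth, concept in enumerate(broader_terms(tok, thesaurus), start=1):
            features["concept:" + concept] += decay ** depth
    features["src:" + source_category] += 1.0
    return features
```

For example, `enrich_features(["laptop", "review"], "Notebooks", THESAURUS)` yields weight 1.0 for `laptop`, `review`, and `src:Notebooks`, and 0.5, 0.25, and 0.125 for `concept:computer`, `concept:electronics`, and `concept:products`. Vectors built this way could then be passed to any linear SVM trained one-vs-rest per destination category; the decaying weights are one simple choice to keep very general concepts from dominating.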

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ho, JC., Chen, IX., Yang, CZ. (2006). Learning to Integrate Web Catalogs with Conceptual Relationships in Hierarchical Thesaurus. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_17

  • DOI: https://doi.org/10.1007/11880592_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45780-0

  • Online ISBN: 978-3-540-46237-8

  • eBook Packages: Computer Science, Computer Science (R0)