A Feature-Reinforcement–Based Approach for Supporting Poly-Lingual Category Integration

  • Conference paper
Designing E-Business Systems. Markets, Services, and Networks (WEB 2008)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 22)

Abstract

Document-category integration (category integration for short) is fundamental to many e-commerce applications, including information integration along supply chains and information aggregation by intermediaries. With globalization, the need for category integration has extended from monolingual to poly-lingual settings. Poly-lingual category integration (PLCI) aims to integrate two document catalogs, each of which consists of documents written in a mix of languages. Several category integration techniques have been proposed in the literature, but they address only monolingual category integration, not PLCI. In this study, we propose a feature-reinforcement-based PLCI technique (FR-PLCI) that takes into account the master documents of all languages when integrating source documents (in the source catalog) written in a specific language into the master catalog. Using the monolingual category integration (MnCI) technique as a performance benchmark, our empirical evaluation shows that the proposed FR-PLCI technique achieves better integration accuracy than MnCI in both English and Chinese category integration tasks.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wei, CP., Chen, CC., Cheng, TH., Yang, C.C. (2009). A Feature-Reinforcement–Based Approach for Supporting Poly-Lingual Category Integration. In: Weinhardt, C., Luckner, S., Stößer, J. (eds) Designing E-Business Systems. Markets, Services, and Networks. WEB 2008. Lecture Notes in Business Information Processing, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01256-3_14

  • DOI: https://doi.org/10.1007/978-3-642-01256-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01255-6

  • Online ISBN: 978-3-642-01256-3

  • eBook Packages: Computer Science, Computer Science (R0)
