Abstract
Document-category integration (or category integration for short) is fundamental to many e-commerce applications, including information integration along supply chains and information aggregation by intermediaries. With globalization, the need for category integration has extended from monolingual to poly-lingual settings. Poly-lingual category integration (PLCI) aims to integrate two document catalogs, each of which consists of documents written in a mix of languages. Several category integration techniques have been proposed in the literature, but they address only monolingual category integration, not PLCI. In this study, we propose a feature-reinforcement-based PLCI technique (FR-PLCI) that takes into account the master documents of all languages when integrating source documents (in the source catalog) written in a specific language into the master catalog. Using the monolingual category integration (MnCI) technique as a performance benchmark, our empirical evaluation shows that the proposed FR-PLCI technique achieves better integration accuracy than MnCI in both English and Chinese category integration tasks.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, CP., Chen, CC., Cheng, TH., Yang, C.C. (2009). A Feature-Reinforcement–Based Approach for Supporting Poly-Lingual Category Integration. In: Weinhardt, C., Luckner, S., Stößer, J. (eds) Designing E-Business Systems. Markets, Services, and Networks. WEB 2008. Lecture Notes in Business Information Processing, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01256-3_14
DOI: https://doi.org/10.1007/978-3-642-01256-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01255-6
Online ISBN: 978-3-642-01256-3
eBook Packages: Computer Science, Computer Science (R0)