Exploiting Attribute-Wise Distribution of Keywords and Category Dependent Attributes for E-Catalog Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5226)

Abstract

E-catalogs are semi-structured documents that consist of multiple attributes and their values. Conventional text classification techniques can be applied to e-catalog classification, but they cannot use the attribute information effectively to improve classification accuracy. In this paper, we propose an e-catalog classification algorithm that extends the Naïve Bayesian classifier to use attribute information. Specifically, we exploit two e-catalog-specific characteristics: the attribute-wise keyword distribution and category-dependent attributes. Experiments on real data validate the proposed method.
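
The abstract does not give the paper's formulation, so the following is only a minimal sketch of one way an attribute-wise keyword distribution could be added to a Naïve Bayes classifier: keyword counts are kept per (category, attribute) pair rather than per category. The class name `AttributeWiseNB`, the smoothing constant `alpha`, the whitespace tokenizer, and the toy catalogs are all assumptions made for illustration and are not taken from the paper. Category-dependent attributes are not modeled explicitly here.

```python
import math
from collections import Counter, defaultdict


class AttributeWiseNB:
    """Hypothetical Naive Bayes variant with one keyword distribution
    per (category, attribute) pair instead of one bag of words per category."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.cat_counts = Counter()              # category -> number of training catalogs
        self.word_counts = defaultdict(Counter)  # (category, attribute) -> keyword counts
        self.vocab = defaultdict(set)            # attribute -> keywords seen in that attribute

    def fit(self, catalogs, labels):
        for catalog, label in zip(catalogs, labels):
            self.cat_counts[label] += 1
            for attr, value in catalog.items():
                tokens = value.lower().split()   # naive whitespace tokenizer (assumed)
                self.word_counts[(label, attr)].update(tokens)
                self.vocab[attr].update(tokens)
        return self

    def log_scores(self, catalog):
        total = sum(self.cat_counts.values())
        scores = {}
        for cat, n in self.cat_counts.items():
            score = math.log(n / total)          # log prior of the category
            for attr, value in catalog.items():
                counts = self.word_counts.get((cat, attr), Counter())
                # The extra +1 reserves probability mass for unseen keywords.
                denom = sum(counts.values()) + self.alpha * (len(self.vocab[attr]) + 1)
                for tok in value.lower().split():
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[cat] = score
        return scores

    def predict(self, catalog):
        scores = self.log_scores(catalog)
        return max(scores, key=scores.get)


# Toy data (invented): the keyword "apple" appears in both categories,
# but only the "computer" category has it in the "maker" attribute.
train = [
    {"name": "apple laptop", "maker": "apple"},
    {"name": "apple juice", "maker": "farm fresh"},
]
labels = ["computer", "beverage"]
model = AttributeWiseNB().fit(train, labels)

print(model.predict({"name": "apple", "maker": "farm"}))
print(model.predict({"name": "apple", "maker": "apple"}))
```

In this toy setup the two queries share the same product name and differ only in the `maker` attribute, so any difference in the predicted category comes from the attribute in which a keyword occurs, which a single per-category bag of words would not capture.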

This work was supported by the Postal Technology R&D program of MKE/IITA [2006-X-001-02, Development of Real-time Postal Logistics System].

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, Yg., Lee, T., Lee, Sg., Park, JH. (2008). Exploiting Attribute-Wise Distribution of Keywords and Category Dependent Attributes for E-Catalog Classification. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_121

  • DOI: https://doi.org/10.1007/978-3-540-87442-3_121

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87440-9

  • Online ISBN: 978-3-540-87442-3

  • eBook Packages: Computer Science, Computer Science (R0)
