Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web

  • Conference paper
From Web to Social Web: Discovering and Deploying User and Content Profiles (WebMine 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4737)

Abstract

We describe an approach to extract attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs. Such a representation suits a variety of tasks where treating a product as a set of attribute-value pairs is more useful than treating it as an atomic entity. We formulate the extraction task as a classification problem and use Naïve Bayes combined with a multi-view semi-supervised algorithm (co-EM). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the semi-supervised classification algorithm. The extracted attributes and values are then linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods products. The extracted attribute-value pairs can be useful in a variety of applications, including product recommendations, product comparisons, and demand forecasting. In this paper, we describe one practical application of the extracted attribute-value pairs: a prototype of an Assortment Comparison Tool that allows retailers to compare their product assortments to those of their competitors. As the comparison is based on attributes and values, we can draw meaningful conclusions at a very fine-grained level. We present the details and research issues of such a tool, as well as the current state of our prototype.
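
The classification step named in the abstract (Naïve Bayes combined with co-EM) can be illustrated with a minimal sketch. It is not the authors' implementation: the two views (the candidate word itself, and hypothetical context features such as `prev:steel` or `pos:head`), the toy data, the hand-picked seeds, and the final averaging of the two views are all assumptions made for illustration. Each view's Naïve Bayes classifier is trained on the probabilistic labels produced by the other view, while seed labels stay fixed.

```python
import math

CLASSES = ("attribute", "value")

def train_nb(features, soft_labels, vocab):
    """Fit multinomial Naive Bayes from fractional (soft) class labels."""
    prior = {c: 1.0 for c in CLASSES}                       # add-one smoothing
    counts = {c: {w: 1.0 for w in vocab} for c in CLASSES}  # add-one smoothing
    for words, dist in zip(features, soft_labels):
        for c in CLASSES:
            prior[c] += dist[c]
            for w in words:
                counts[c][w] += dist[c]
    prior_total = sum(prior.values())
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        log_likelihood = {w: math.log(n / total) for w, n in counts[c].items()}
        model[c] = (math.log(prior[c] / prior_total), log_likelihood)
    return model

def predict_nb(model, words):
    """Return the posterior distribution over CLASSES for one example."""
    scores = {c: model[c][0] + sum(model[c][1][w] for w in words) for c in CLASSES}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def co_em(view1, view2, seeds, iterations=5):
    """view1, view2: one feature list per example; seeds: {example index: class}."""
    vocab1 = {w for ws in view1 for w in ws}
    vocab2 = {w for ws in view2 for w in ws}

    def clamp(labels):
        # Seed examples always keep their known label.
        for i, c in seeds.items():
            labels[i] = {k: 1.0 if k == c else 0.0 for k in CLASSES}
        return labels

    # Start with label mass on the seeds only; unlabeled examples carry zero weight.
    labels = clamp([{c: 0.0 for c in CLASSES} for _ in view1])
    for _ in range(iterations):
        m1 = train_nb(view1, labels, vocab1)                   # view 1 learns from view 2's labels
        labels = clamp([predict_nb(m1, ws) for ws in view1])
        m2 = train_nb(view2, labels, vocab2)                   # view 2 learns from view 1's labels
        labels = clamp([predict_nb(m2, ws) for ws in view2])

    # Assumption: combine the two views by averaging their posteriors.
    combined = []
    for ws1, ws2 in zip(view1, view2):
        p1, p2 = predict_nb(m1, ws1), predict_nb(m2, ws2)
        combined.append({c: (p1[c] + p2[c]) / 2 for c in CLASSES})
    return clamp(combined)

# Toy candidates from phrases such as "steel frame", "graphite frame", "steel grip".
view1 = [["frame"], ["steel"], ["frame"], ["graphite"], ["grip"], ["steel"]]
view2 = [["prev:steel", "pos:head"], ["next:frame", "pos:mod"],
         ["prev:graphite", "pos:head"], ["next:frame", "pos:mod"],
         ["prev:steel", "pos:head"], ["next:grip", "pos:mod"]]
seeds = {0: "attribute", 1: "value"}   # hand-picked here, not automatically extracted

result = co_em(view1, view2, seeds)
for words, dist in zip(view1, result):
    print(words[0], max(dist, key=dist.get))
```

In this toy setup the unlabeled words "grip" and "graphite" never appear among the seeds; they receive labels only because their context features overlap with those of labeled examples, which is the effect the multi-view setting relies on.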

Editor information

Bettina Berendt, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Probst, K., Ghani, R., Krema, M., Fano, A., Liu, Y. (2007). Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web. In: Berendt, B., Hotho, A., Mladenic, D., Semeraro, G. (eds) From Web to Social Web: Discovering and Deploying User and Content Profiles. WebMine 2006. Lecture Notes in Computer Science, vol 4737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74951-6_3

  • DOI: https://doi.org/10.1007/978-3-540-74951-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74950-9

  • Online ISBN: 978-3-540-74951-6

  • eBook Packages: Computer Science, Computer Science (R0)
