Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web

Probst, Katharina; Ghani, Rayid; Krema, Marko; Fano, Andy; Liu, Yan

doi:10.1007/978-3-540-74951-6_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4737)

Included in the following conference series:

Workshop on Web Mining

Abstract

We describe an approach to extracting attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs. Such a representation is useful for the many tasks in which a product is better treated as a set of attribute-value pairs than as an atomic entity. We formulate the extraction task as a classification problem and use Naïve Bayes combined with a multi-view semi-supervised algorithm (co-EM). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the semi-supervised classification algorithm. The extracted attributes and values are then linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods products. The extracted attribute-value pairs can be used in a variety of applications, including product recommendations, product comparisons, and demand forecasting. In this paper, we describe one practical application of the extracted pairs: a prototype of an Assortment Comparison Tool that allows retailers to compare their product assortments to those of their competitors. Because the comparison is based on attributes and values, it supports meaningful conclusions at a very fine-grained level. We present the details and research issues of such a tool, as well as the current state of our prototype.
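
As a rough illustration of the attribute-level comparison that the abstract attributes to the Assortment Comparison Tool, the following minimal Python sketch represents each product as a set of attribute-value pairs and compares how two assortments distribute over the values of one attribute. The toy data, the function names (`value_shares`, `compare`), and the share-difference measure are assumptions made for illustration only; the abstract does not say how the prototype computes its comparisons.

```python
from collections import Counter

# Hypothetical toy data: each product is a set of (attribute, value) pairs.
retailer_a = [
    {("sport", "tennis"), ("material", "graphite")},
    {("sport", "tennis"), ("material", "aluminum")},
    {("sport", "golf"), ("material", "graphite")},
]
retailer_b = [
    {("sport", "golf"), ("material", "steel")},
    {("sport", "golf"), ("material", "graphite")},
]


def value_shares(assortment, attribute):
    """Return each value's share among all occurrences of `attribute`."""
    counts = Counter(
        value
        for product in assortment
        for attr, value in product
        if attr == attribute
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def compare(assortment_a, assortment_b, attribute):
    """Return {value: share_a - share_b} for every value seen in either assortment."""
    a = value_shares(assortment_a, attribute)
    b = value_shares(assortment_b, attribute)
    return {v: a.get(v, 0.0) - b.get(v, 0.0) for v in sorted(set(a) | set(b))}


# Positive numbers mean retailer A carries relatively more of that value.
sport_gap = compare(retailer_a, retailer_b, "sport")
material_gap = compare(retailer_a, retailer_b, "material")
```

In this sketch a positive entry in `sport_gap` means the first retailer's assortment leans more heavily toward that value than the second's. A real tool would work from extracted pairs rather than hand-written ones and could use a different comparison measure.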

Author information

Authors and Affiliations

Accenture Technology Labs, Chicago, IL, USA
Katharina Probst, Rayid Ghani, Marko Krema & Andy Fano
Carnegie Mellon University, Pittsburgh, PA, USA
Yan Liu

Editor information

Bettina Berendt, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro

Cite this paper

Probst, K., Ghani, R., Krema, M., Fano, A., Liu, Y. (2007). Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web. In: Berendt, B., Hotho, A., Mladenic, D., Semeraro, G. (eds) From Web to Social Web: Discovering and Deploying User and Content Profiles. WebMine 2006. Lecture Notes in Computer Science, vol 4737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74951-6_3

DOI: https://doi.org/10.1007/978-3-540-74951-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74950-9
Online ISBN: 978-3-540-74951-6
eBook Packages: Computer Science, Computer Science (R0)
