Learning to extract and summarize hot item features from multiple auction web sites

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Auction Web sites change quickly and contain vast amounts of poorly organized information, which makes them difficult to digest. We develop a unified framework that automatically extracts product features and summarizes hot item features from multiple auction sites. To deal with the irregular layout of Web pages and harness the uncertainty involved, we formulate product feature extraction and hot item feature summarization as a single graph labeling problem using conditional random fields. One characteristic of this graphical model is that it can capture the inter-dependence between neighbouring tokens in a Web page and between tokens in different Web pages, as well as other information such as hot item features across different auction sites. Extensive experiments on several real-world auction Web sites demonstrate the effectiveness of our framework.
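
To make the idea of labeling with dependent neighbouring tokens concrete, the following is a minimal sketch of decoding in a linear-chain model of the kind conditional random fields generalize. It is not the framework described in this article: the label set, the scoring functions `node_score` and `edge_score`, their hand-set weights, and the example tokens are all hypothetical and chosen only for illustration. A real conditional random field would learn its weights from labelled pages and, in the setting above, would also link tokens across pages and sites.

```python
LABELS = ["O", "FEATURE"]


def node_score(token, label):
    """Hypothetical per-token score: digits or all-caps hint at a product feature."""
    looks_like_feature = token[:1].isdigit() or token.isupper()
    if label == "FEATURE":
        return 1.0 if looks_like_feature else -0.5
    return 0.0


def edge_score(prev_label, label):
    """Hypothetical pairwise score: neighbouring tokens tend to share a label."""
    return 0.8 if prev_label == label else 0.0


def sequence_score(tokens, labels):
    """Total score of one complete labeling (sum of node and edge scores)."""
    total = sum(node_score(t, y) for t, y in zip(tokens, labels))
    total += sum(edge_score(a, b) for a, b in zip(labels, labels[1:]))
    return total


def viterbi(tokens):
    """Return the highest-scoring label sequence by dynamic programming."""
    # best[y] = (score of the best partial labeling ending in y, that labeling)
    best = {y: (node_score(tokens[0], y), [y]) for y in LABELS}
    for token in tokens[1:]:
        step = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[p][0] + edge_score(p, y))
            score = best[prev][0] + edge_score(prev, y) + node_score(token, y)
            step[y] = (score, best[prev][1] + [y])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    tokens = ["8GB", "flash", "USB"]
    print(list(zip(tokens, viterbi(tokens))))
```

With these illustrative weights, the token "flash" scores below zero as a feature on its own, yet the pairwise term pulls it into the same label as its two neighbours. This is the kind of inter-dependence between neighbouring tokens that a graph labeling formulation can exploit.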

Author information

Correspondence to Tak-Lam Wong.

Additional information

The work described in this paper is substantially supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Nos: CUHK 4179/03E and CUHK4193/04E) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050363 and 2050391). This work is also affiliated with the Microsoft-CUHK Joint Laboratory for Human-centric Computing and Interface Technologies.

About this article

Cite this article

Wong, TL., Lam, W. Learning to extract and summarize hot item features from multiple auction web sites. Knowl Inf Syst 14, 143–160 (2008). https://doi.org/10.1007/s10115-007-0078-2

  • DOI: https://doi.org/10.1007/s10115-007-0078-2
