DOI: 10.1145/1631272.1631529
short-paper

Extracting informative images from web news pages via imbalanced classification

Published: 19 October 2009

Abstract

In this paper we propose an imbalanced classification algorithm to extract informative images from web news pages. Our algorithm addresses this difficult problem with two approaches. First, we limit our dataset to a specific application area so that the patterns of informative images can be captured by existing classification algorithms. Second, we propose an automatic negative-sample filtering algorithm that eliminates most negative samples, so that the classification training data is rebalanced. Because most classification algorithms perform worse on imbalanced training data, this rebalancing improves overall performance significantly. In addition, our approach is inherently robust to new web sites and to style or layout changes of existing web sites.
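The abstract does not say how the negative-sample filter works, so the following is only a minimal sketch of the general idea of rebalancing by discarding obvious negatives before training. The `ImageSample` type, the `looks_like_decoration` rule, and its size and aspect-ratio thresholds are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ImageSample:
    """Hypothetical record for one <img> found on a news page."""
    width: int
    height: int
    is_informative: bool  # ground-truth label used for training


def looks_like_decoration(img: ImageSample,
                          min_side: int = 100,
                          max_aspect: float = 4.0) -> bool:
    """Assumed filter rule: tiny or very elongated images (icons,
    spacers, banners) are treated as obvious negatives."""
    short, long_ = sorted((img.width, img.height))
    if short < min_side:
        return True
    return long_ / short > max_aspect


def filter_negatives(samples: List[ImageSample],
                     is_obvious_negative: Callable[[ImageSample], bool]
                     ) -> List[ImageSample]:
    """Drop samples the rule flags, without looking at their labels."""
    return [s for s in samples if not is_obvious_negative(s)]


def positive_ratio(samples: List[ImageSample]) -> float:
    """Fraction of samples labeled informative (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return sum(s.is_informative for s in samples) / len(samples)


# Toy data: 2 informative images among 10, as on a cluttered news page.
samples = [
    ImageSample(400, 300, True),
    ImageSample(640, 480, True),
    ImageSample(16, 16, False),    # icon
    ImageSample(1, 1, False),      # tracking pixel
    ImageSample(728, 90, False),   # banner
    ImageSample(88, 31, False),    # button
    ImageSample(120, 600, False),  # skyscraper ad
    ImageSample(20, 20, False),
    ImageSample(50, 50, False),
    ImageSample(300, 250, False),  # ad that survives the filter
]

kept = filter_negatives(samples, looks_like_decoration)
print(f"before: {positive_ratio(samples):.2f}, after: {positive_ratio(kept):.2f}")
```

On this toy data the filter removes seven of the eight negatives, so the classifier trained afterwards sees a far less skewed set. Because the rule ignores labels, it can also discard genuine positives, so any real filter would need to be checked for that cost.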

Cited By

  • (2018) Multimedia news exploration and retrieval by integrating keywords, relations and visual features. Multimedia Tools and Applications, 51:2 (625-648). DOI: 10.1007/s11042-010-0639-3. Online publication date: 31-Dec-2018

      Published In

      MM '09: Proceedings of the 17th ACM international conference on Multimedia
      October 2009
      1202 pages
      ISBN:9781605586083
      DOI:10.1145/1631272

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. imbalanced classification
      2. informative image

      Qualifiers

      • Short-paper

      Conference

      MM09
      MM09: ACM Multimedia Conference
      October 19 - 24, 2009
      Beijing, China

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
