DOI: 10.1145/1458082.1458364
poster

Semi-supervised text categorization by active search

Published: 26 October 2008

Abstract

In automated text categorization, building a reliable classifier that achieves high classification accuracy from only a small number of labeled documents is very challenging, if not impossible. To address this problem, this paper proposes a novel web-assisted text categorization framework. Important keywords are first identified automatically from the available labeled documents to form search queries. Search engines are then used to retrieve a large number of relevant documents from the Web, which are then exploited by a semi-supervised learning framework. To the best of our knowledge, this work is the first study of this kind. An extensive experimental study shows encouraging results for the proposed framework: using Google as the web search engine, it reduces the classification error by 30% compared with the state-of-the-art supervised text categorization method.
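The abstract describes three steps: extract class keywords from the labeled documents, send them as queries to a search engine, and feed the retrieved pages to a semi-supervised learner. The poster's actual keyword scoring and semi-supervised algorithm are not specified on this page, so the following is only a minimal sketch under stated assumptions. Keywords are ranked by a smoothed log-ratio of in-class versus out-of-class frequency. The hypothetical `fake_search` function, which scans a tiny in-memory corpus, stands in for Google. Self-training with multinomial naive Bayes is swapped in as the semi-supervised learner; it is not necessarily the authors' method. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace tokens, alphabetic only (no stemming or stopword list).
    return [t for t in text.lower().split() if t.isalpha()]


def class_keywords(docs, labels, k=3):
    """Step 1 (sketch): pick the k terms most over-represented in each class,
    scored by a smoothed log-ratio of in-class vs. out-of-class frequency."""
    counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set().union(*counts.values())
    queries = {}
    for y, c in counts.items():
        rest = Counter()
        for other, oc in counts.items():
            if other != y:
                rest.update(oc)
        n_in, n_out, v = sum(c.values()), sum(rest.values()), len(vocab)
        score = {w: math.log((c[w] + 1) / (n_in + v)) - math.log((rest[w] + 1) / (n_out + v))
                 for w in vocab}
        # Highest score first; ties broken alphabetically so output is deterministic.
        queries[y] = sorted(vocab, key=lambda w: (-score[w], w))[:k]
    return queries


def fake_search(query_terms, corpus):
    # Hypothetical stand-in for a web search engine: return every corpus
    # document sharing at least one term with the query.
    terms = set(query_terms)
    return [d for d in corpus if terms & set(tokenize(d))]


def train_nb(docs, labels):
    # Multinomial naive Bayes statistics: class priors, per-class term counts, vocabulary.
    counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    return Counter(labels), counts, set().union(*counts.values())


def predict_proba(model, doc):
    prior, counts, vocab = model
    total, v = sum(prior.values()), len(vocab)
    logp = {}
    for y in prior:
        n = sum(counts[y].values())
        lp = math.log(prior[y] / total)
        for w in tokenize(doc):
            if w in vocab:  # out-of-vocabulary terms are ignored
                lp += math.log((counts[y][w] + 1) / (n + v))
        logp[y] = lp
    m = max(logp.values())  # subtract the max before exp() for numerical stability
    z = sum(math.exp(s - m) for s in logp.values())
    return {y: math.exp(s - m) / z for y, s in logp.items()}


def classify(model, doc):
    p = predict_proba(model, doc)
    return max(p, key=p.get)


def self_train(docs, labels, unlabeled, threshold=0.9, rounds=5):
    """Step 3 (sketch): each round, pseudo-label the retrieved documents the
    current model is confident about, add them to the training set, retrain."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    for _ in range(rounds):
        model = train_nb(docs, labels)
        remaining = []
        for d in pool:
            p = predict_proba(model, d)
            y = max(p, key=p.get)
            if p[y] >= threshold:
                docs.append(d)
                labels.append(y)
            else:
                remaining.append(d)
        if len(remaining) == len(pool):  # nothing new was added: stop early
            break
        pool = remaining
    return train_nb(docs, labels)


# Toy usage with made-up data.
labeled = ["the team won the football match", "stock prices fell on the market"]
y = ["sports", "finance"]
web = ["football match ends in a draw", "the match was a great football game",
       "market rally as stock prices rise", "investors fell back as the market closed",
       "recipe for apple pie"]

queries = class_keywords(labeled, y, k=2)
print(queries)  # {'sports': ['football', 'match'], 'finance': ['fell', 'market']}
# Step 2: retrieve pages for every class query, dropping duplicates.
retrieved = list(dict.fromkeys(d for terms in queries.values() for d in fake_search(terms, web)))
model = self_train(labeled, y, retrieved, threshold=0.7)
print(classify(model, "football game"))  # sports
```

Self-training is used here only because it fits in a few lines without external libraries. The `threshold` parameter controls how readily retrieved pages are admitted as pseudo-labeled training data. This matters because search results are not guaranteed to belong to the class whose query produced them.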

Cited By

  • (2022) Improved Inference for Imputation-Based Semisupervised Learning Under Misspecified Setting. IEEE Transactions on Neural Networks and Learning Systems 33:11 (6346-6359). DOI: 10.1109/TNNLS.2021.3077312. Online publication date: Nov-2022
  • (2020) Modeling Latent Relation to Boost Things Categorization Service. IEEE Transactions on Services Computing 13:5 (915-929). DOI: 10.1109/TSC.2017.2715159. Online publication date: 1-Sep-2020
  • (2015) BODH: A model for information retrieval from research articles. 2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) (1-6). DOI: 10.1109/ICECCT.2015.7226102. Online publication date: Mar-2015

Information

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN:9781595939913
DOI:10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. semi-supervised learning
  2. text categorization

Qualifiers

  • Poster

Conference

CIKM08: Conference on Information and Knowledge Management
October 26-30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Feb 2025

Citations

  • (2015) Investigation of BPNN & RBFN in text classification by Active search. 2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) (1-6). DOI: 10.1109/ICECCT.2015.7226042. Online publication date: Mar-2015
  • (2013) A text mining method for research project selection using KNN. 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE) (900-904). DOI: 10.1109/ICGCE.2013.6823562. Online publication date: Dec-2013
  • (2010) Predicting user evaluations of spoken dialog systems using semi-supervised learning. 2010 IEEE Spoken Language Technology Workshop (283-288). DOI: 10.1109/SLT.2010.5700865. Online publication date: Dec-2010
  • (2010) Learning Document Labels from Enriched Click Graphs. Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (57-64). DOI: 10.1109/ICDMW.2010.190. Online publication date: 13-Dec-2010
