Research article
DOI: 10.1145/1316874.1316889

Mutually beneficial learning with application to on-line news classification

Published: 09 November 2007

Abstract

Real-world classification applications face three common challenges: how to exploit domain knowledge, how to resist noisy samples, and how to use unlabeled data. To address these problems, this paper proposes a novel classification framework called Mutually Beneficial Learning (MBL). MBL integrates two learning steps. In the first step, the underlying local structures of the feature space are discovered through a learning process; the result provides the capability to resist noisy samples and supplies better input for the second step, in which a classification process is applied to the result. The two steps are performed iteratively until a stopping condition is met. Unlike traditional classifiers, the output of MBL consists of two components: a common classifier and a set of rules corresponding to the local structures. At prediction time, a test sample is first matched against the discovered rules. If a matching rule is found, the rule's label is assigned to the sample; otherwise, the common classifier is used to classify the sample. We applied MBL to online news classification, and our experimental results show that MBL significantly outperforms Naïve Bayes and SVM, even when the data is noisy or only partially labeled.
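The abstract's prediction scheme (match a test sample against the discovered rules first, fall back to the common classifier otherwise) can be sketched as follows. This is a minimal illustration only: the rule representation, the `mbl_predict` function, and the `MajorityClassifier` fallback are hypothetical assumptions, not the paper's actual implementation.

```python
# Sketch of MBL's two-component prediction step, per the abstract:
# try the discovered local-structure rules first, then fall back to
# a common classifier. Rule and classifier representations are assumed.

def mbl_predict(sample, rules, classifier):
    """Label `sample` via rule matching, falling back to `classifier`.

    `rules` is a list of (predicate, label) pairs standing in for the
    discovered local structures; `classifier` is any object exposing
    a `predict` method (the "common classifier").
    """
    for predicate, label in rules:
        if predicate(sample):          # first matching rule wins
            return label
    return classifier.predict(sample)  # no rule matched: use the classifier


class MajorityClassifier:
    """Toy stand-in for the common classifier: predicts a fixed label."""
    def __init__(self, label):
        self.label = label

    def predict(self, sample):
        return self.label


# Usage: a rule that fires when the word "election" appears in a news item.
rules = [(lambda doc: "election" in doc, "politics")]
fallback = MajorityClassifier("sports")
print(mbl_predict("election results tonight", rules, fallback))  # politics
print(mbl_predict("match highlights", rules, fallback))          # sports
```

The point of the split is that rule hits bypass the classifier entirely, so samples covered by a reliable local structure are labeled directly, while everything else falls through to the learned model.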



      Published In

      PIKM '07: Proceedings of the ACM first Ph.D. workshop in CIKM
      November 2007
      184 pages
      ISBN:9781595938329
      DOI:10.1145/1316874

Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. implicit domain knowledge
      2. local structure
      3. mutually beneficial learning
      4. news classification


      Conference

      CIKM07

      Acceptance Rates

      Overall Acceptance Rate 25 of 62 submissions, 40%


      Cited By

• (2020) A BERT-based Ensemble Model for Chinese News Topic Prediction. Proceedings of the 2020 2nd International Conference on Big Data Engineering, pp. 18-23. DOI: 10.1145/3404512.3404524. Online publication date: 29-May-2020.
• (2019) An Effective Approach of Extracting Local Documents from the Distributed Representation of Text using Document Embedding and Latent Semantic Analysis. 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 152-156. DOI: 10.1109/ICSSIT46314.2019.8987859. Online publication date: Nov-2019.
• (2015) An Efficient Method for Document Categorization Based on Word2vec and Latent Semantic Analysis. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 2276-2283. DOI: 10.1109/CIT/IUCC/DASC/PICOM.2015.336. Online publication date: Oct-2015.
• (2012) Machine Learning Algorithms with Co-occurrence Based Term Association for Text Mining. Proceedings of the 2012 Fourth International Conference on Computational Intelligence and Communication Networks, pp. 958-962. DOI: 10.1109/CICN.2012.141. Online publication date: 3-Nov-2012.
• (2011) Web classification of conceptual entities using co-training. Expert Systems with Applications 38(12), pp. 14367-14375. DOI: 10.1016/j.eswa.2011.03.010. Online publication date: 1-Nov-2011.
• (2009) An automatically constructed thesaurus for neural network based document categorization. Expert Systems with Applications 36(8), pp. 10969-10975. DOI: 10.1016/j.eswa.2009.02.006. Online publication date: 1-Oct-2009.
