Research article
DOI: 10.1145/1316874.1316889

Mutually beneficial learning with application to on-line news classification

Published: 09 November 2007

Abstract

Real-world classification applications face three common challenges: how to exploit domain knowledge, how to resist noisy samples, and how to use unlabeled data. To address these problems, this paper proposes a novel classification framework called Mutually Beneficial Learning (MBL). MBL integrates two learning steps. In the first step, the underlying local structures of the feature space are discovered through a learning process; the result provides the capability to resist noisy samples and supplies better input for the second step, in which a classification process is applied to the result. The two steps are performed iteratively until a stopping condition is met. Unlike traditional classifiers, the output of MBL consists of two components: a common classifier and a set of rules corresponding to the local structures. At prediction time, a test sample is first matched against the discovered rules. If a matching rule is found, the rule's label is assigned to the sample; otherwise, the common classifier is used to classify the sample. We applied MBL to online news classification, and our experimental results show that MBL significantly outperforms Naïve Bayes and SVM, even when the data is noisy or only partially labeled.
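The abstract's prediction scheme (match a test sample against the discovered rules first, fall back to the common classifier otherwise) can be sketched as follows. This is a minimal illustration only: the rule representation, the `mbl_predict` function, and the `MajorityClassifier` fallback are hypothetical assumptions, not the paper's actual implementation.

```python
# Sketch of MBL's two-component prediction step, per the abstract:
# try the discovered local-structure rules first, then fall back to
# a common classifier. Rule and classifier representations are assumed.

def mbl_predict(sample, rules, classifier):
    """Label `sample` via rule matching, falling back to `classifier`.

    `rules` is a list of (predicate, label) pairs standing in for the
    discovered local structures; `classifier` is any object exposing
    a `predict` method (the "common classifier").
    """
    for predicate, label in rules:
        if predicate(sample):          # first matching rule wins
            return label
    return classifier.predict(sample)  # no rule matched: use the classifier


class MajorityClassifier:
    """Toy stand-in for the common classifier: predicts a fixed label."""
    def __init__(self, label):
        self.label = label

    def predict(self, sample):
        return self.label


# Usage: a rule that fires when the word "election" appears in a news item.
rules = [(lambda doc: "election" in doc, "politics")]
fallback = MajorityClassifier("sports")
print(mbl_predict("election results tonight", rules, fallback))  # politics
print(mbl_predict("match highlights", rules, fallback))          # sports
```

The point of the split is that rule hits bypass the classifier entirely, so samples covered by a reliable local structure are labeled directly, while everything else falls through to the learned model.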



      Published In

      PIKM '07: Proceedings of the ACM first Ph.D. workshop in CIKM
      November 2007
      184 pages
      ISBN:9781595938329
      DOI:10.1145/1316874

Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. implicit domain knowledge
      2. local structure
      3. mutually beneficial learning
      4. news classification


      Conference

      CIKM07

      Acceptance Rates

      Overall Acceptance Rate 25 of 62 submissions, 40%


      Cited By

• (2020) A BERT-based Ensemble Model for Chinese News Topic Prediction. Proceedings of the 2020 2nd International Conference on Big Data Engineering, pp. 18-23. DOI: 10.1145/3404512.3404524. Online publication date: 29-May-2020.
• (2019) An Effective Approach of Extracting Local Documents from the Distributed Representation of Text using Document Embedding and Latent Semantic Analysis. 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 152-156. DOI: 10.1109/ICSSIT46314.2019.8987859. Online publication date: Nov-2019.
• (2015) An Efficient Method for Document Categorization Based on Word2vec and Latent Semantic Analysis. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 2276-2283. DOI: 10.1109/CIT/IUCC/DASC/PICOM.2015.336. Online publication date: Oct-2015.
• (2012) Machine Learning Algorithms with Co-occurrence Based Term Association for Text Mining. Proceedings of the 2012 Fourth International Conference on Computational Intelligence and Communication Networks, pp. 958-962. DOI: 10.1109/CICN.2012.141. Online publication date: 3-Nov-2012.
• (2011) Web classification of conceptual entities using co-training. Expert Systems with Applications 38(12), pp. 14367-14375. DOI: 10.1016/j.eswa.2011.03.010. Online publication date: 1-Nov-2011.
• (2009) An automatically constructed thesaurus for neural network based document categorization. Expert Systems with Applications 36(8), pp. 10969-10975. DOI: 10.1016/j.eswa.2009.02.006. Online publication date: 1-Oct-2009.
