Semisupervised learning from different information sources

Li, Tao; Ogihara, Mitsunori

doi:10.1007/s10115-004-0155-8

Semisupervised learning from different information sources

Published: 01 March 2005

Volume 7, pages 289–309, (2005)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Tao Li¹ &
Mitsunori Ogihara¹

116 Accesses
21 Citations
3 Altmetric
Explore all metrics

Abstract

This paper studies the use of a semisupervised learning algorithm from different information sources. We first offer a theoretical explanation as to why minimising the disagreement between individual models could lead to the performance improvement. Based on the observation, this paper proposes a semisupervised learning approach that attempts to minimise this disagreement by employing a co-updating method and making use of both labeled and unlabeled data. Three experiments to test the effectiveness of the approach are presented in this paper: (i) webpage classification from both content and hyperlinks; (ii) functional classification of gene using gene expression data and phylogenetic data and (iii) machine self-maintaining from both sensory and image data. The results show the effectiveness and efficiency of our approach and suggest its application potentials.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Abney S (2002) Bootstrapping. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL 2002). Morgan Kaufmann, San Francisco, CA, pp 360–367
Allwein EL, Schapire RE, Singer Y (2000) Reducing multiclass to binary: a unifying approach for margin classifiers. In: Langley P (ed) Proceedings of the 17th international conference on machine learning (ICML 2000). Morgan Kaufmann, San Francisco, CA, pp 9–16
Becker S (1996) Mutual information maximization: models of cortical self-organization. Netw Comput Neural Syst 7:7–31
Article Google Scholar
Bennett KP, Demiriz A, Maclin R (2002) Exploiting unlabeled data in ensemble methods. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD 2002). ACM Press, New York, NY, pp 289–296
Blum A, Chawla S (2001) Learning from labeled and unlabeled data using graph mincuts. In: Brodley CE, Danyluk AP (eds) Proceedings of the eighteenth international conference on machine learning (ICML 2001). Morgan Kaufmann, San Francisco, CA, pp 19–26
Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on computational learning theory (COLT 1998). ACM Press, New York, NY, pp 92–100
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet C, Furey TS, Manuel Ares J, Haussler D (2000) Knowledge-based analysis of microarray gene expression data using support vector machines. Proceedings of the National Academy of Sciences 97(1):262–267
Article Google Scholar
Buc F, Grandvalet Y, Ambroise C (2002) Semi-supervised marginboost. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems 14 (NIPS 2001). The MIT Press, Cambridge, MA, pp 553–560
Castelli V, Cover T (1996) The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans Inf Theory 42:2102–2117
Article MathSciNet Google Scholar
Collins M, Singer Y (1999) Unsupervised models for named entity classification. In: Proceedings of the 1999 joint SIGDAT conference on empirical methods in natural language processing and very large corpora. ACM Press, New York, NY, pp 100–110
Cozman FG, Cohen I (2002) Unlabeled data can degrade classification performance of generative classifiers. In: Proceedings of the fifteenth international Florida artificial intelligence society conference (FLAIRS 2002), pp 327–331
Dasgupta S, Littman ML, McAllester D (2001) PAC generalization bounds for co-training. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems 14 (NIPS 2001). The MIT Press, Cambridge, MA, pp 375–382
De Sa VR, Ballard D (1998) Category learning through multimodality sensing. Neural Comput 10:1097–1117
Article Google Scholar
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 39(1):1–38
MathSciNet Google Scholar
Fung G, Mangasarian O (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Methods Softw 15:29–44
Article Google Scholar
Ghahramani Z, Jordan MI (1994) Supervised learning from incomplete data via an EM approach. In: Cowan JD, Tesauro G, Alspector J (eds) Advances in neural information processing systems 6 (NIPS 1993). Morgan Kaufmann, San Francisco, CA, pp 120–127
Goldman S, Zhou Y (2000) Enhancing supervised learning with unlabeled data. In: Langley P (ed) Proceedings of the 17th international conference on machine learning (ICML 2000). Morgan Kaufmann, San Francisco, CA, pp 327–334
Li T, Zhu S, Li Q, Ogihara M (2003) Gene functional classification by semi-supervised learning from heterogeneous data. In: Proceedings of the 18th annual ACM symposium on applied computing (SAC’03). ACM Press, New York, NY, pp 78–82
McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748
Article Google Scholar
Mitchell TM (1997) Machine learning. McGraw-Hill
Muslea I, Minton S, Knoblock CA (2000) Selective sampling with redundant views. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on innovative applications of artificial intelligence (AAAI/IAAI 2000). The MIT Press, Cambridge, MA, pp 621–626
Neal R, Hinton G (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI (ed) Learning in graphical models. The MIT Press, Cambridge, MA, pp 355–368
Nigam K (2001) Using unlabeled data to improve text classification. Doctoral Dissertation, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
Nigam K, Ghani R (2000) Analyzing the effectiveness and applicability of co-training. In: Proceedings of the ninth international conference on information and knowledge management (CIKM 2000). ACM Press, New York, NY, pp 86–93
Nigam K, McCallum AK, Thrun S, Mitchell TM (2000) Text classification from labeled and unlabeled documents using EM. Mach Learn 39:103–134
Article Google Scholar
Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. In: Proceedings of fifth annual international conference on computational molecular biology (RECOMB 2001). ACM Press, New York, NY, pp 249–255
Quinlan J (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco, CA
Roth D, Zelenko D (2000) Toward a theory of learning coherent concepts. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on innovative applications of artificial intelligence (AAAI/IAAI 2000). The MIT Press, Cambridge, MA, pp 639–644
Seeger M (2000) Learning with labeled and unlabeled data. Technical report, Institute for Adaptive and Neural Computation, University of Edinburgh, Edinburgh, UK
Vapnik VN (1998) Statistical learning theory. Wiley, New York, NY
Weiss G, Provost F (2001) The effect of class distribution on classifier learning: an empirical study. Technical report ML-TR 44, Department of Computer Science, Rutgers University, New Brunswick, NJ
Google Scholar
Wu L, Oviatt SL, Cohen PR (1999) Multimodal integration—a statistical view. IEEE Trans Multimedia 1:334–341
Article Google Scholar
Zhang T, Oles F (2000) A probability analysis on the value of unlabeled data for classification problems. In: Langley P (ed) Proceedings of the 17th international conference on machine learning (ICML 2000). Morgan Kaufmann, San Francisco, CA, pp 1191–1198

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Rochester, Rochester, NY, 14627-0226, USA
Tao Li & Mitsunori Ogihara

Authors

Tao Li
View author publications
You can also search for this author in PubMed Google Scholar
Mitsunori Ogihara
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tao Li.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, T., Ogihara, M. Semisupervised learning from different information sources. Knowl Inf Syst 7, 289–309 (2005). https://doi.org/10.1007/s10115-004-0155-8

Download citation

Received: 18 August 2003
Revised: 15 November 2003
Accepted: 19 December 2003
Published: 01 March 2005
Issue Date: March 2005
DOI: https://doi.org/10.1007/s10115-004-0155-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Semisupervised learning from different information sources

Abstract

Access this article

Similar content being viewed by others

LIA: A Label-Independent Algorithm for Feature Selection for Supervised Learning

Semi-supervised Hierarchical Classification Based on Local Information

Semi-supervised multi-label classification using an extended graph-based manifold regularization

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Semisupervised learning from different information sources

Abstract

Access this article

Similar content being viewed by others

LIA: A Label-Independent Algorithm for Feature Selection for Supervised Learning

Semi-supervised Hierarchical Classification Based on Local Information

Semi-supervised multi-label classification using an extended graph-based manifold regularization

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation