Abstract
Many applications must learn from multiple information sources, where sources may be labeled or unlabeled, and where the information in the different sources may be beneficial but cannot be integrated into a single source for learning. In this paper, we propose an ensemble learning method for such labeled and unlabeled sources. We first present two label propagation methods that infer the labels of training objects in unlabeled sources by making full use of class label information from labeled sources and internal structure information from unlabeled sources; we refer to these processes as global consensus and local consensus, respectively. We then predict the labels of test objects using an ensemble learning model built over the multiple information sources. Experimental results show that our method outperforms two baseline methods. Our method is also more scalable to large information sources and more robust to noisy data in labeled sources.
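To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of generic graph-based label propagation followed by majority voting across sources. It is an illustration under stated assumptions, not the paper's global or local consensus procedure, which the abstract does not specify. The function names `propagate_labels` and `majority_vote`, the RBF affinity, and the parameters `alpha` and `sigma` are all hypothetical choices made for this example.

```python
import numpy as np


def propagate_labels(X, y, n_classes, alpha=0.9, sigma=1.0):
    """Generic closed-form label propagation on an RBF affinity graph.

    Assumption: y holds a class index for labeled objects and -1 for
    unlabeled ones. This is a textbook-style sketch, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)

    # Pairwise squared distances and RBF affinities, with no self-loops.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot initial labels; rows of unlabeled objects stay zero.
    Y = np.zeros((n, n_classes))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0

    # Solve (I - alpha * S) F = Y and take the highest-scoring class.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)


def majority_vote(predictions):
    """Combine per-source predictions of shape (n_sources, n_objects).

    Ties are broken in favour of the lowest class index.
    """
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    counts = np.stack(
        [(predictions == c).sum(axis=0) for c in range(n_classes)]
    )
    return counts.argmax(axis=0)


if __name__ == "__main__":
    # Toy source: two well-separated clusters, one labeled object in each.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [0, -1, -1, 1, -1, -1]
    print(propagate_labels(X, y, n_classes=2))

    # Three hypothetical sources voting on three test objects.
    print(majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]]))
```

In this sketch each source would be propagated independently and the resulting per-source predictions combined by voting; how the paper weights or reconciles sources is not stated in the abstract and is left open here.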
Acknowledgements
This work is supported in part by the National 863 Program of China under grant 2012AA011005, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education of China under grant IRT13059, the National 973 Program of China under grant 2013CB329604, the Natural Science Foundation of China (under grants 61379021, 61273292, 61229301), the US National Science Foundation (NSF) under grant CCF-0905337, and the Industrial Science and Technology Pillar Program of Changzhou, Jiangsu, China, under grant CE20120026.
Cite this article
Lin, Y., Hu, X. & Wu, X. Ensemble learning from multiple information sources via label propagation and consensus. Appl Intell 41, 30–41 (2014). https://doi.org/10.1007/s10489-013-0508-7
DOI: https://doi.org/10.1007/s10489-013-0508-7