Abstract
A view validation algorithm has been shown to predict whether the views are sufficiently compatible for solving a particular learning task. However, it only works when a natural split of the features exists; when no such split exists, it cannot construct one to build the best views. In this paper, we present CCFP (Correlation and Compatibility based Feature Partitioner), a general algorithm that automates multi-view detection. CCFP first labels a large number of unlabeled examples using a single-view algorithm. Then, using the examples that the single-view algorithm labeled with high-confidence predictions, it calculates the conditional SU (Symmetric Uncertainty) between every pair of features and the IG (Information Gain) of each feature. Based on the estimated SU and IG values, the features are partitioned into two views that are weakly correlated, compatible, and sufficient. Experimental results show that a multi-view learner using views generated by CCFP clearly outperforms learners using views generated by other means.
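The two quantities named in the abstract can be illustrated with a minimal sketch for discrete features. It uses the standard definitions IG(X; Y) = H(X) + H(Y) - H(X, Y) and SU(X, Y) = 2 IG(X; Y) / (H(X) + H(Y)). The abstract does not define the conditional SU, so the class-weighted average in `conditional_su` is an assumption made for illustration, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y), i.e. the mutual information."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    denom = entropy(x) + entropy(y)
    if denom == 0:  # both variables are constant
        return 0.0
    return 2.0 * information_gain(x, y) / denom


def conditional_su(x, y, labels):
    """Assumed form of conditional SU: SU computed within each class,
    averaged with weights equal to the class frequencies."""
    n = len(labels)
    total = 0.0
    for c in set(labels):
        idx = [i for i, label in enumerate(labels) if label == c]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        total += (len(idx) / n) * symmetric_uncertainty(xs, ys)
    return total


# Toy data: labels stand in for high-confidence single-view predictions.
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]  # copies the label
f2 = [0, 0, 1, 1]  # also copies the label
f3 = [0, 1, 0, 1]  # unrelated to the label

print(information_gain(f1, labels))    # IG of a feature w.r.t. the labels
print(symmetric_uncertainty(f1, f2))   # unconditional correlation
print(conditional_su(f1, f2, labels))  # correlation once the class is fixed
```

In the toy data, `f1` and `f2` are perfectly correlated overall but carry no information about each other once the class is fixed, which is the kind of pair a partitioner seeking weakly correlated views could place in different views.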
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, K., Tang, J., Li, J., Wang, K. (2005). Feature-Correlation Based Multi-view Detection. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424925_127
DOI: https://doi.org/10.1007/11424925_127
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25863-6
Online ISBN: 978-3-540-32309-9
eBook Packages: Computer Science, Computer Science (R0)