Abstract
In this paper, we design a local classification algorithm based on implicit link analysis for the setting in which the labeled and unlabeled data are drawn from two different, albeit related, domains. In contrast to many global classifiers, e.g. Support Vector Machines, our local classifier takes into account only the neighborhood information around unlabeled data points and depends little on the global distribution of the data set. The local classifier is therefore well suited to the non-i.i.d. classification problem, because its generalization is not degraded by the bias with respect to each unlabeled data point. We build a local neighborhood by connecting similar data points, and then apply the Relaxation Labeling technique to these implicit links. We analyze our algorithm both theoretically and empirically, and show how it improves on traditional classifiers. Our algorithm greatly outperforms state-of-the-art supervised and semi-supervised algorithms when classifying documents across different domains.
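The two steps named in the abstract, linking similar points and then running relaxation labeling over those links, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the neighborhood size, or the update rule, so the cosine similarity, the parameter `k`, the weighted-average update, and the function names `knn_links` and `relaxation_labeling` are all assumptions made for illustration.

```python
import numpy as np

def knn_links(X, k):
    """Link each point to its k most cosine-similar points (assumed similarity).

    Returns a weight matrix whose rows sum to 1 (or to 0 if a point has
    no positively similar neighbor).
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)               # exclude self-links
    idx = np.argsort(-S, axis=1)[:, :k]        # k most similar points per row
    rows = np.arange(X.shape[0])[:, None]
    W = np.zeros_like(S)
    W[rows, idx] = np.maximum(S[rows, idx], 0.0)
    return W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

def relaxation_labeling(X_lab, y_lab, X_unl, n_classes, k=3, n_iter=20):
    """Iteratively re-estimate class distributions of unlabeled points
    from their linked neighbors; labeled points stay fixed."""
    X = np.vstack([X_lab, X_unl]).astype(float)
    n_lab = len(X_lab)
    W = knn_links(X, k)
    P = np.full((len(X), n_classes), 1.0 / n_classes)  # uniform start
    P[:n_lab] = np.eye(n_classes)[y_lab]               # one-hot for labeled data
    for _ in range(n_iter):
        P_new = W @ P                                  # weighted neighbor average
        isolated = P_new.sum(axis=1) == 0
        P_new[isolated] = P[isolated]                  # keep estimate if no links
        P_new /= P_new.sum(axis=1, keepdims=True)
        P_new[:n_lab] = P[:n_lab]                      # clamp labeled points
        P = P_new
    return P[n_lab:].argmax(axis=1)

# Toy data: the unlabeled points are shifted relative to the labeled ones.
X_lab = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[0.8, 0.3], [0.7, 0.2], [0.2, 0.7], [0.3, 0.8]])
pred = relaxation_labeling(X_lab, y_lab, X_unl, n_classes=2, k=3)
print(pred)
```

Each unlabeled point's prediction here depends only on the points it is linked to, which mirrors the local character the abstract describes; a global decision boundary is never fitted.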
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ling, X., Dai, W., Xue, GR., Yu, Y. (2008). Knowledge Transferring Via Implicit Link Analysis. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_42
DOI: https://doi.org/10.1007/978-3-540-78568-2_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science, Computer Science (R0)