A Bootstrapping Method for Learning from Heterogeneous Data

Nhung, Ngo Phuong; Phuong, Tu Minh

doi:10.1007/978-3-642-27142-7_49


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7105)

Abstract

In machine learning applications where multiple data sources are present, it is desirable to exploit the sources simultaneously to make better inferences. When each data source is represented as a graph, a common strategy is to combine the graphs, e.g. by taking the sum of their adjacency matrices, and then apply a standard graph-based learning algorithm. In this paper, we take an alternative approach to this problem. Instead of performing the combination step, a graph-based learner is created on each graph and makes predictions independently. The method works iteratively: in each round, labels predicted by some of the learners are added to the labeled set and the models are retrained. The method thus builds on two popular semi-supervised learning approaches, bootstrapping and graph-based methods, and aims to combine their advantages. We evaluated the method on the gene function prediction problem with real biological datasets. Experiments show that our method significantly outperforms a standard graph-based algorithm and compares favorably with a state-of-the-art gene function prediction method.
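
The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the per-graph learner, the confidence measure, or how many labels are added per round. The sketch assumes a simple label-spreading learner with symmetric normalization, binary labels encoded as +1 / -1 (0 for unlabeled), and a hypothetical selection rule that adds the single most confident prediction per round. The names `propagate`, `bootstrap`, `alpha`, `rounds`, and `per_round` are illustrative only.

```python
import numpy as np


def propagate(W, y, alpha=0.9, iters=50):
    """Assumed per-graph learner: iterate f <- alpha * S f + (1 - alpha) * y."""
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    d = W.sum(axis=1)
    d[d == 0] = 1.0                      # guard against isolated nodes
    S = W / np.sqrt(np.outer(d, d))      # D^{-1/2} W D^{-1/2}
    f = y.copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f


def bootstrap(graphs, y, rounds=3, per_round=1):
    """Grow the labeled set using one independent learner per graph.

    y holds +1 / -1 for labeled nodes and 0 for unlabeled nodes.
    """
    y = np.asarray(y, dtype=float).copy()
    for _ in range(rounds):
        unlabeled = np.flatnonzero(y == 0)
        if unlabeled.size == 0:
            break
        # Each learner scores all nodes using only its own graph.
        scores = np.array([propagate(W, y) for W in graphs])
        # Hypothetical rule: for each unlabeled node, trust the learner
        # with the largest absolute score.
        best = np.abs(scores[:, unlabeled]).argmax(axis=0)
        conf = scores[best, unlabeled]
        # Add the most confident predictions to the labeled set; the next
        # round retrains every learner on the enlarged set.
        top = np.argsort(-np.abs(conf))[:per_round]
        y[unlabeled[top]] = np.sign(conf[top])  # a zero score leaves the node unlabeled
    return y
```

For comparison, the combination baseline mentioned in the abstract would correspond to a single call such as `propagate(sum(graphs), y)` under the same assumptions. In the sketch above the graphs are never merged, so a node that is isolated in one graph can still be labeled by a learner whose graph connects it to labeled nodes.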



Author information

Authors and Affiliations

Department of Computer Science, Posts & Telecommunications Institute of Technology, Hanoi, Vietnam
Ngo Phuong Nhung & Tu Minh Phuong


Editor information

Editors and Affiliations

Hannam University, 133 Ojeong-dong, daeduk-gu, 306-791, Daejeon, South Korea
Tai-hoon Kim
Department of Neuroscience, The Ohio State University, 470 Hitchcock Hall, 2070 Neil Avenue, 43110, Columbus, Ohio, U.S.A.
Hojjat Adeli
Infobright, M5E 1P8, Toronto, ON, Canada
Dominik Slezak
Department of Computer Science, Oslo University College, P.O. Box 4, N-0130, St. Olavs plass, Oslo, Norway
Frode Eika Sandnes
Nanjing University of Aeronautics and Astronautics, 210016, Nanjing, China
Xiaofeng Song
Electronics and Telecommunications Research Institute (ETRI), 305-700, Daejeon, Korea
Kyo-il Chung
Mississippi State University, 39762, Oktibbeha, MS, USA
Kirk P. Arnett


About this paper

Cite this paper

Nhung, N.P., Phuong, T.M. (2011). A Bootstrapping Method for Learning from Heterogeneous Data. In: Kim, Th., et al. Future Generation Information Technology. FGIT 2011. Lecture Notes in Computer Science, vol 7105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27142-7_49


DOI: https://doi.org/10.1007/978-3-642-27142-7_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27141-0
Online ISBN: 978-3-642-27142-7
eBook Packages: Computer Science, Computer Science (R0)
