DOI: 10.1145/1871437.1871747
poster

Robust prediction from multiple heterogeneous data sources with partial information

Published: 26 October 2010

ABSTRACT

Significant research efforts toward robust integration of information from multiple sources are being pursued at a rapid pace. However, the information in heterogeneous sources is often incomplete, so making maximal use of all the available information is a challenging problem. Most recent research on data integration has focused primarily on cases where information is available across all the different sources, and such methods cannot effectively integrate sources in the presence of partial information. We develop an ensemble method that boosts the decisions made by different models on individual sources and obtains robust results for the task of class prediction. We propose a heterogeneous boosting framework that uses all the available information even if some of the sources provide no information about some objects. We demonstrate the effectiveness of the proposed framework on the problem of gene function prediction and compare it to state-of-the-art methods on several real-world biological datasets. We also show that the proposed method outperforms the imputation schemes widely used when integrating data with partial information.
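The abstract does not spell out the boosting algorithm, so the following is only a hedged illustration, not the authors' method. It shows one standard way a boosted ensemble can use sources that each cover a different subset of objects without imputing anything: a weak learner built on one source abstains (outputs 0) on objects that source does not describe, as in confidence-rated boosting with abstention (Schapire and Singer). The data layout (one list per source, `None` for an unobserved object), the decision-stump weak learner, the smoothing constant `eps`, and the names `fit`, `best_stump`, `stump_predict`, and `predict` are all assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: each source is a list indexed by object, holding a feature
# vector, or None when that source has no information about the object.
Source = Sequence[Optional[Sequence[float]]]
# A decision stump on one source: (source index, feature index, threshold, polarity).
Stump = Tuple[int, int, float, int]


def stump_predict(stump: Stump, sources: Sequence[Source], i: int) -> int:
    """Predict +1/-1 for object i, or 0 (abstain) if the stump's source lacks it."""
    s, f, thr, pol = stump
    x = sources[s][i]
    if x is None:
        return 0
    return pol if x[f] > thr else -pol


def best_stump(sources: Sequence[Source], y: Sequence[int], w: Sequence[float]):
    """Search stumps on every source, minimising Z = W0 + 2*sqrt(W+ * W-)."""
    best, best_z, best_pos, best_neg = None, math.inf, 0.0, 0.0
    for s, src in enumerate(sources):
        covered = [i for i, x in enumerate(src) if x is not None]
        if not covered:
            continue
        # W0: weight of objects this source cannot see (the stump abstains on them).
        w_abstain = sum(w[i] for i, x in enumerate(src) if x is None)
        for f in range(len(src[covered[0]])):
            for thr in sorted({src[i][f] for i in covered}):
                for pol in (1, -1):
                    stump = (s, f, thr, pol)
                    w_pos = w_neg = 0.0  # W+ (correct) and W- (wrong) on covered objects
                    for i in covered:
                        if stump_predict(stump, sources, i) == y[i]:
                            w_pos += w[i]
                        else:
                            w_neg += w[i]
                    z = w_abstain + 2.0 * math.sqrt(w_pos * w_neg)
                    if z < best_z:
                        best, best_z, best_pos, best_neg = stump, z, w_pos, w_neg
    return best, best_pos, best_neg


def fit(sources: Sequence[Source], y: Sequence[int], rounds: int = 10,
        eps: float = 1e-6) -> List[Tuple[float, Stump]]:
    """Boosting with abstaining stumps; labels must be +1 or -1."""
    n = len(y)
    w = [1.0 / n] * n
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, w_pos, w_neg = best_stump(sources, y, w)
        if stump is None:
            break
        # eps keeps alpha finite when a stump makes no errors (w_neg == 0).
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        model.append((alpha, stump))
        # Abstentions (h = 0) leave a weight unchanged, so objects one source
        # cannot see keep pressure on the sources that do observe them.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, sources, i))
             for i, (wi, yi) in enumerate(zip(w, y))]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: Sequence[Tuple[float, Stump]], sources: Sequence[Source],
            i: int) -> int:
    """Sign of the weighted vote; ties (e.g. no source sees i) default to +1."""
    score = sum(alpha * stump_predict(st, sources, i) for alpha, st in model)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy data: source 0 observes objects 0, 1, 3, 4; source 1 observes 2 and 5.
    sources = [
        [[0.9], [0.8], None, [0.1], [0.2], None],
        [None, None, [5.0], None, None, [1.0]],
    ]
    y = [1, 1, 1, -1, -1, -1]
    model = fit(sources, y, rounds=2)
    print([predict(model, sources, i) for i in range(len(y))])
    # prints [1, 1, 1, -1, -1, -1]
```

In this toy run, the first round picks a stump on source 0, which abstains on objects 2 and 5; their weight is then left untouched, so the second round turns to source 1, the only source that observes them. An imputation baseline would instead fill the `None` entries (for example with per-feature means) before training a single model on the completed data.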


      Published in
        CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
        October 2010
        2036 pages
        ISBN: 9781450300995
        DOI: 10.1145/1871437

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall acceptance rate: 1,861 of 8,427 submissions, 22%
