ABSTRACT
We present a theoretical framework that shows how ensembles of collective classifiers improve predictions on graph data. We show that collective ensemble classification reduces errors due to variance in learning and, more interestingly, in inference. We also present an empirical framework comprising several ensemble techniques for classifying relational data with collective inference. The methods span single- and multiple-graph network approaches and are evaluated on both synthetic and real-world classification tasks. Our experimental results, supported by our theoretical analysis, confirm that ensemble algorithms that explicitly target both the learning and inference processes, and aim to reduce the errors associated with each, are the best performers.
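The idea above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's actual algorithm): a deliberately simple relational classifier is trained on bootstrap resamples of the labeled nodes of a synthetic homophilous graph, each model runs its own round of iterative collective inference, and the per-model predictions are averaged. The resampling targets learning variance; averaging independent collective-inference runs targets inference variance. All data, thresholds, and function names here are invented for illustration.

```python
import random

random.seed(0)

# --- Hypothetical synthetic graph: two classes, homophilous edges ---
n = 60
labels = {i: i % 2 for i in range(n)}          # true class of each node
edges = set()
for i in range(n):
    for _ in range(4):                         # ~4 edges initiated per node
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        other = [j for j in range(n) if labels[j] != labels[i]]
        j = random.choice(same if random.random() < 0.8 else other)
        edges.add((min(i, j), max(i, j)))
adj = {i: [] for i in range(n)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

known = set(random.sample(range(n), 20))       # nodes with observed labels
unknown = [i for i in range(n) if i not in known]

def neighbor_frac(i, est):
    """Fraction of i's neighbors currently estimated to be class 1."""
    ns = adj[i]
    return sum(est[j] for j in ns) / len(ns) if ns else 0.5

def collective_predict(train, rounds=10):
    """One relational classifier: a threshold on the neighbor-class-1
    fraction, fit on `train`, then applied via iterative collective
    inference over the unlabeled nodes."""
    est = {i: (float(labels[i]) if i in known else 0.5) for i in range(n)}
    # 'Learning': midpoint of the class-conditional mean neighbor fractions.
    f0 = [neighbor_frac(i, est) for i in train if labels[i] == 0]
    f1 = [neighbor_frac(i, est) for i in train if labels[i] == 1]
    thr = (sum(f0) / len(f0) + sum(f1) / len(f1)) / 2 if f0 and f1 else 0.5
    for _ in range(rounds):                    # collective inference sweeps
        for i in unknown:
            est[i] = 1.0 if neighbor_frac(i, est) > thr else 0.0
    return est

def ensemble_predict(k=10):
    """Bag k resampled training sets; average the k collectively
    inferred predictions, reducing learning and inference variance."""
    votes = {i: 0.0 for i in unknown}
    for _ in range(k):
        train = random.choices(list(known), k=len(known))  # bootstrap
        est = collective_predict(train)
        for i in unknown:
            votes[i] += est[i] / k
    return {i: int(votes[i] > 0.5) for i in unknown}

pred = ensemble_predict()
acc = sum(pred[i] == labels[i] for i in unknown) / len(unknown)
print(f"ensemble accuracy on unlabeled nodes: {acc:.2f}")
```

Because each ensemble member both learns from a different resample and runs collective inference independently, averaging addresses the two variance sources the analysis distinguishes; a single collectively inferred model, by contrast, can propagate one unlucky inference error through the whole graph.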