Abstract
Improved iterative scaling (IIS) is an algorithm for learning maximum entropy (ME) joint and conditional probability models, consistent with specified constraints, that has found great utility in natural language processing and related applications. Most IIS work on classification considers discrete-valued “feature functions” of the data observations and class label, with constraints measured via frequency counts taken over hard (0–1) training set instances. Here, we consider the case where the training (and test) sets consist of instances of probability mass functions over the features, rather than hard feature values. IIS extends naturally to this case. This has applications (1) to ME classification on mixed discrete-continuous feature spaces and (2) to ME aggregation of soft classifier decisions in ensemble classification. Moreover, we combine these methods, yielding a method with proven learning convergence that jointly performs (soft) decision-level and feature-level fusion in making ensemble decisions. We demonstrate favorable comparisons against standard AdaBoost.M1, input-dependent boosting, and other supervised combining methods on data sets from the UC Irvine Machine Learning Repository.
Notes
Smoothed estimates are also used [6]. However, these are still based on “hard” frequency counts.
We still assume there are hard instances for the class label. However, it is also possible to consider probabilistic (soft) class labels.
Discrete and mixed discrete-continuous feature spaces can also be handled. Restriction here to purely continuous features is simply for clarity, without loss of generality.
Here, as one example, we are considering the case of a (single) mixture model for each vector \(\underline{A}_i\).
Higher-order constraints, which encode dependencies between base classifiers, are also possible. However, they would entail greater complexity and a larger training set to accurately measure the constraints.
This search was done with \(N_e\) fixed at 10. The selected number of hidden units was then used for all ensemble sizes.
References
Y. H. Abdel-Haleem, S. Renals, and N. D. Lawrence. “Acoustic Space Dimensionality Selection and Combination Using the Maximum Entropy Principle,” IEEE ICASSP, 2004.
E. Alpaydin, “Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms,” Neural Comput., vol. 11, no. 8, 1999, pp. 1885–1892.
A. Berger, The Improved Iterative Scaling Algorithm: A Gentle Introduction, tutorial. (Available from http://www.cs.cmu.edu/~aberger/maxent.html).
A. L. Berger, S. Della Pietra, and V. J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Comput. Linguist., vol. 22, no. 1, 1996, pp. 39–71.
S. Boyd and L. Vandenberghe, “Convex Optimization,” Cambridge University Press, 2004.
S. F. Chen and R. Rosenfeld, “A Survey of Smoothing Techniques for ME Models,” IEEE Trans. Speech Audio Process., vol. 8, 2000, pp. 37–50.
M. Collins, R. Schapire, and Y. Singer, “Logistic Regression, AdaBoost and Bregman Distances,” Proc. of the 13th Annual Conf. on Comput. Learn. Theory, 2000, pp. 158–169.
J. N. Darroch and D. Ratcliff, “Generalized Iterative Scaling for Log-linear Models,” Ann. Math. Stat., vol. 43, 1972, pp. 1470–1480.
S. Della Pietra, V. Della Pietra, and J. Lafferty, “Inducing Features of Random Fields,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, 1996, pp. 380–393.
Y. Freund and R. Schapire, “Experiments with a New Boosting Algorithm,” Proc. ICML, 1996, pp. 148–156.
N. Friedman, M. Goldszmidt, and T. J. Lee, “Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting,” Proc. ICML, 1998, pp. 179–187.
E. T. Jaynes, “Papers on Probability, Statistics and Statistical Physics,” Reidel, Dordrecht, 1982.
R. Jin, Y. Liu, L. Si, J. Carbonell, and A. Hauptmann, “A New Boosting Algorithm Using Input-dependent Regularizer,” Proc. ICML, 2003.
H. Kang, K. Kim, and J. Kim, “Optimal Approximation of Discrete Probability Distribution with kth-order Dependency and its Application to Combining Multiple Classifiers,” Pattern Recogn. Lett., vol. 18, no. 6, 1997, pp. 515–523.
J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern. Anal. Mach. Intell., vol. 20, no. 3, 1998, pp. 226–239.
R. Kohavi and M. Sahami, “Error-based and Entropy-based Discretization of Continuous Features,” in Proc. of the 2nd International Conference KDD, 1996, pp. 114–119.
R. Lau and M. Sahami, “Adaptive Language Modelling Using the Maximum Entropy Approach,” in Proc. of the ARPA Human Lang. Tech. Workshop, 1993, pp. 110–113.
G. Lebanon and J. Lafferty. “Boosting and Maximum Likelihood for Exponential Models,” NIPS, vol. 15, 2001.
R. Malouf, “A Comparison of Algorithms for Maximum Entropy Parameter Estimation,” in Proc. of the Sixth Conf. on Natural Language Learning, 2002, pp. 49–55.
J. Jeon and R. Manmatha, “Using Maximum Entropy for Automatic Image Annotation,” Image and Video Retrieval: Third Intl. Conf., CIVR, 2004.
R. Meir, R. El-Yaniv, and S. Ben-David, “Localized Boosting,” in Proc. Conf. on Comput. Learning Theory, 2000, pp. 190–199.
D. J. Miller and L. Yan, “Approximate Maximum Entropy Joint Feature Inference Consistent with Arbitrary Lower-order Probability Constraints: Application to Statistical Classification,” Neural Comput., vol. 12, no. 9, 2000, pp. 2175–2207.
D. J. Miller and S. Pal, “Transductive Methods for the Distributed Ensemble Classification Problem,” Neural Comput (in press).
D. J. Miller and L. Yan, “An Approximate Maximum Entropy Method for Classification and more General Inference: Relation to other Maxent Methods and to Naive Bayes,” CISS, 2000.
S. J. Phillips, M. Dudik, and R. E. Schapire, “A Maximum Entropy Approach to Species Distribution Modeling,” ICML, 2004.
A. Schwaighofer, “SVM Toolbox for Matlab,” Available from http://ida.first.fraunhofer.de/~anton/software.html.
G. Schwarz, “Estimating the Dimension of a Model,” Ann. Statist., vol. 6, no. 2, 1978, pp. 461–464.
P. Smyth, “Clustering Using Monte Carlo Cross-validation,” KDD, 1996, pp. 126–133.
T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, “A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms,” Mach. Learn., vol. 40, 2000, pp. 203–229.
N. Ueda and R. Nakano, “Combining Discriminant-based Classifiers Using the Minimum Classification Error Discriminant,” IEEE Workshop on Neural Networks for Signal Processing (NNSP), 1997, pp. 365–374.
S. Wang, D. Schuurmans, and Y. Zhao, “The Latent Maximum Entropy Principle,” IEEE Trans. on Inf. Theory, 2002. (Submitted).
L. Yan and D. J. Miller, “Critic-driven Ensemble Classification via a Learning Method Akin to Boosting,” in Intell. Eng. Sys. Through ANN 1, 2001, pp. 27–32.
L. Yan and D. J. Miller, “General Statistical Inference for Discrete and Mixed Spaces by an Approximate Application of the Maximum Entropy Principle,” IEEE Trans. on NN, vol. 11, no. 3, 2000, pp. 558–573.
Appendices
Appendix A
For an arbitrary parameter vector \(\underline{\gamma}\), the conditional log-likelihood is
For a change in the parameter vector \(\underline{\Delta\gamma}\), the change in log-likelihood is \(L(\underline{\gamma}+\underline{\Delta\gamma})-L(\underline{\gamma})\). Using the identity \(-\ln(\alpha)\geq 1-\alpha\), we obtain the lower bound
With \(\underline{\Delta\gamma}\) chosen so that \(B(\underline{\Delta\gamma}|\underline{\gamma}) > 0\), there is improvement in the log-likelihood. The obvious approach, then, is to maximize \(B(\underline{\Delta\gamma}|\underline{\gamma})\). However, \(B(\underline{\Delta\gamma}|\underline{\gamma})\) has a coupled dependence on the individual components \(\left\{\Delta\gamma(C=c,{F}_{i}=j)\right\}\), which would necessitate a complicated joint optimization. Thus, we seek an auxiliary function that decouples this dependence. We first rewrite
Then, we note that \(\sum\nolimits_{i=1}^{N_{d}}\sum\nolimits_{j\in{\cal A}_{i}}\frac{P[{F}_{i}=j|t]}{N_{d}}=1\) i.e., \(\left\{\frac{1}{N_d}P[{F}_{i}=j|t], \hspace{0.05in} i=1,...,N_d,\hspace{0.05in} j\in{\cal A}_{i}\right\} \) is an instance of the joint pmf \(P[I,F_I]\), associated with first selecting a feature \(i \in \left\{1,...,N_d\right\}\) and then a feature value \(f_i \in {\cal A}_{i}\). Applying Jensen’s inequality \(e^{\sum_{x} p(x)q(x)} \leq \sum_{x} p(x) e^{q(x)}\) to the right hand side of Eq. (23), we have
Thus \(L(\underline{\gamma}+\underline{\Delta\gamma})-L(\underline{\gamma}) \geq B(\underline{\Delta\gamma}|\underline{\gamma}) \geq A(\underline{\Delta\gamma}|\underline{\gamma})\), i.e., we have a new, not as tight, lower bound. Let \(\underline{\Delta \gamma^{\ast}} = \arg\max_{\underline{\Delta \gamma}} A(\cdot)\). Since it is easy to verify that \(A(\underline{0}|\underline{\gamma})=0\), it must be true that \(A(\underline{\Delta \gamma^{\ast}}|\underline{\gamma}) \geq 0\), i.e., property A1 in Section 2.3 is satisfied by this function. Moreover, \(A(\underline{\Delta\gamma}|\underline{\gamma})\) can be additively decoupled into individual terms each depending on a single \(\Delta\gamma(C=k,{F}_{n}=q)\). Differentiating \(A(\underline{\Delta\gamma}|\underline{\gamma})\) with respect to \(\Delta\gamma(C=k,{F}_{n}=q)\) and equating to 0 we get the choice of \(\underline{\Delta\gamma}\) to maximize \(A(\underline{\Delta\gamma}|\underline{\gamma})\), i.e.,
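Because each soft instance contributes total feature mass \(N_d\) (each of the \(N_d\) feature pmfs sums to 1, as used in the construction of \(P[I,F_I]\) above), the decoupled maximization admits a closed-form, generalized-iterative-scaling-style step. The following toy sketch illustrates this kind of update and its monotone likelihood improvement; the data, sizes, and all variable names are hypothetical, not taken from the paper, and the model assumed is the conditional ME form \(P[C=c|\underline{x}] \propto \exp(\sum_{i,j}\gamma(c,i,j)\,x_{ij})\):

```python
import numpy as np

# Toy setup (hypothetical): T soft instances, Nd features, each feature a
# pmf over A values, C classes with hard labels.
rng = np.random.default_rng(0)
T, Nd, A, C = 40, 3, 4, 2

x = rng.dirichlet(np.ones(A), size=(T, Nd))   # x[t, i, :] is a pmf (sums to 1)
y = np.arange(T) % C                          # hard class labels

gamma = np.zeros((C, Nd, A))                  # one parameter per (c, F_i = j)

def posteriors(gamma):
    # P[C=c | x_t] proportional to exp( sum_{i,j} gamma[c,i,j] * x[t,i,j] )
    s = np.einsum('cij,tij->tc', gamma, x)
    s -= s.max(axis=1, keepdims=True)         # numerical stability
    p = np.exp(s)
    return p / p.sum(axis=1, keepdims=True)

def loglik(gamma):
    return float(np.log(posteriors(gamma)[np.arange(T), y]).sum())

# Observed (soft) constraint values for each (c, i, j): class-conditional
# averages of the soft feature masses.
obs = np.stack([x[y == c].sum(axis=0) for c in range(C)]) / T

ll = [loglik(gamma)]
for _ in range(50):
    p = posteriors(gamma)
    exp_ = np.einsum('tc,tij->cij', p, x) / T    # model expectations
    gamma = gamma + np.log(obs / exp_) / Nd      # closed-form scaling step
    ll.append(loglik(gamma))
# ll is non-decreasing: each scaling step cannot decrease the log-likelihood
```

Because the total soft feature mass per instance is the constant \(N_d\), the \(1/N_d\) step size makes this the classical Darroch–Ratcliff update, to which IIS reduces in this constant-mass case.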
Appendix B
Theorem 1
Consider the function \(A(\underline{\Delta\gamma}|\underline{\gamma})\) defined in Eq. (24). Let \(\underline{\Delta \gamma^{\ast}} = \arg\max_{\underline{\Delta \gamma}} A(\underline{\Delta \gamma}|\underline{\gamma})\). Then, \(A(\underline{\Delta\gamma^{\ast}}|\underline{\gamma})=0\) iff \(\underline{\Delta \gamma^{\ast}} = \underline{0}\). When this occurs, the constraints (4) are all met.
Proof
Setting \( \dfrac{\partial A(\underline{\Delta\gamma}|\underline{\gamma})}{\partial\Delta\gamma(F_{i}=j,C=c)}=0\) gives the solution in Eq. (25). Moreover, this is a maximum, since \(\dfrac{\partial^{2} A(\underline{\Delta\gamma}|\underline{\gamma})}{\partial\Delta\gamma^{2}(F_{i}=j,C=c)} < 0 \;\;\forall\, i,j,c\). Thus, \(\underline{\Delta \gamma^{\ast}} = \left\{ \Delta\gamma^{\ast}(F_{i}=j,C=c) \hspace{0.05in} \forall i,\hspace{0.05in} \forall j \in {\cal A}_i, \hspace{0.05in} \forall c \in C \right\}.\)
Plugging the solution for \(\underline{\Delta \gamma^{\ast}}\) back into \(A(\underline{\Delta\gamma}|\underline{\gamma})\) in Eq. (24) and simplifying gives
Noting that the third term equals T, we have
Now, since \(E(-\ln(\cdot)) \ge -\ln(E(\cdot))\) by Jensen’s inequality, we have
i.e., \(A(\underline{\Delta \gamma^{\ast}}|\underline{\gamma}) \ge 0\). Finally, by strict convexity of \(-\ln\left(\cdot\right)\), equality is achieved iff the argument of the \(\ln\left(\cdot\right)\) is 1. But this occurs iff
i.e., by Eq. (25), iff \(\underline{\Delta \gamma^{\ast}} = \underline{0}\). Clearly, when this occurs, the constraints are met.▪
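The Jensen step used in the proof, \(E(-\ln X)\ge -\ln(E(X))\), holds exactly for any empirical distribution over a positive sample, with equality only when the sample is constant; a throwaway numerical spot-check (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 5.0, size=1000)   # a positive random sample of X
lhs = np.mean(-np.log(x))              # E[-ln X] under the empirical pmf
rhs = -np.log(np.mean(x))              # -ln E[X]
assert lhs >= rhs                      # Jensen: -ln is convex

# Strict convexity: equality requires X constant.
c = np.full(10, 2.0)
assert abs(np.mean(-np.log(c)) - (-np.log(np.mean(c)))) < 1e-12
```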
Pal, S., Miller, D.J. An Extension of Iterative Scaling for Decision and Data Aggregation in Ensemble Classification. J VLSI Sign Process Syst Sign Im 48, 21–37 (2007). https://doi.org/10.1007/s11265-006-0009-6