
Mixture of experts: a literature survey

Artificial Intelligence Review

Abstract

Mixture of experts (ME) is one of the most popular and interesting combining methods, with great potential to improve performance in machine learning. ME is built on the divide-and-conquer principle, in which the problem space is divided among a few neural network experts, supervised by a gating network. Earlier works on ME developed different strategies for dividing the problem space between the experts. To survey and analyse these methods more clearly, we present a categorisation of the ME literature based on this difference. Various ME implementations are classified into two groups, according to the partitioning strategy used and both how and when the gating network is involved in the partitioning and combining procedures. In the first group, the conventional ME and its extensions stochastically partition the problem space into a number of subspaces using a specially designed error function, and the experts become specialised in these subspaces. In the second group, the problem space is explicitly partitioned by a clustering method before the experts’ training process starts, and each expert is then assigned to one of the resulting subspaces. Because the first group partitions the problem space implicitly, through a tacit competitive process between the experts, we call it the mixture of implicitly localised experts (MILE); the second group, which uses pre-specified clusters, is called the mixture of explicitly localised experts (MELE). The properties of the two groups are investigated and compared, and a discussion of their respective advantages and disadvantages shows that the two approaches have complementary features. Moreover, the ME method is compared with other popular combining methods, including boosting and negative correlation learning. As the investigated methods have complementary strengths and limitations, previous studies that attempted to combine their features in integrated approaches are reviewed, and some directions for future research are suggested.
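
The gating-and-combining mechanism summarised above can be made concrete with a short sketch. Below is a minimal, illustrative example of a conventional (MILE-style) mixture of experts with linear experts and a softmax gating network; the names (`W_experts`, `W_gate`) and the single simplified update step are assumptions for illustration, not code from the paper, and the original formulation weights each expert's update by posterior responsibilities rather than the raw gate outputs.

```python
# Minimal MILE-style mixture-of-experts sketch (assumed, illustrative implementation).
import numpy as np

rng = np.random.default_rng(0)
n_features, n_experts = 2, 3

W_experts = rng.normal(size=(n_experts, n_features))  # one linear expert per row
W_gate = rng.normal(size=(n_experts, n_features))     # gating network weights

def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    """Combine the experts' outputs, weighted by the gating network."""
    expert_outputs = W_experts @ x          # y_i = w_i^T x, one output per expert
    gate = softmax(W_gate @ x)              # g_i(x) >= 0, summing to 1
    return gate @ expert_outputs, expert_outputs, gate

x, target = np.array([0.5, -1.0]), 0.7
y, expert_outputs, gate = forward(x)

# Simplified competitive update: each expert is pulled toward the target in
# proportion to its gate value, so experts specialise on the regions of the
# input space where the gate selects them. (The conventional ME additionally
# updates the gating network jointly, using posterior responsibilities.)
lr = 0.1
errors = target - expert_outputs
W_experts += lr * (gate * errors)[:, None] * x[None, :]
```

In a MELE-style variant, by contrast, a clustering step (for example k-means over the training inputs) would partition the problem space before training, each expert would be fitted only on its assigned cluster, and the gating network would only be involved when combining the trained experts.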

Author information

Correspondence to Reza Ebrahimpour.

About this article

Cite this article

Masoudnia, S., Ebrahimpour, R. Mixture of experts: a literature survey. Artif Intell Rev 42, 275–293 (2014). https://doi.org/10.1007/s10462-012-9338-y
