
Mixture of experts: a literature survey

Artificial Intelligence Review

Abstract

Mixture of experts (ME) is one of the most popular and interesting combining methods, with great potential to improve performance in machine learning. ME is built on the divide-and-conquer principle, in which the problem space is divided among a few neural network experts, supervised by a gating network. Earlier works on ME developed different strategies for dividing the problem space between the experts. To survey and analyse these methods more clearly, we present a categorisation of the ME literature based on this difference. Various ME implementations are classified into two groups, according to the partitioning strategy used and both how and when the gating network is involved in the partitioning and combining procedures. In the first group, the conventional ME and its extensions stochastically partition the problem space into a number of subspaces using a specially designed error function, and the experts become specialised in these subspaces. In the second group, the problem space is explicitly partitioned by a clustering method before the experts’ training process starts, and each expert is then assigned to one of the resulting subspaces. Because the first group partitions the problem space implicitly, through a tacit competitive process between the experts, we call it the mixture of implicitly localised experts (MILE); the second group, which uses pre-specified clusters, is called the mixture of explicitly localised experts (MELE). The properties of the two groups are investigated and compared, and a discussion of their respective advantages and disadvantages shows that the two approaches have complementary features. Moreover, the ME method is compared with other popular combining methods, including boosting and negative correlation learning. As the investigated methods have complementary strengths and limitations, previous studies that attempted to combine their features in integrated approaches are reviewed, and some directions for future research are suggested.
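
The gating-and-combining mechanism summarised above can be made concrete with a short sketch. Below is a minimal, illustrative example of a conventional (MILE-style) mixture of experts with linear experts and a softmax gating network; the names (`W_experts`, `W_gate`) and the single simplified update step are assumptions for illustration, not code from the paper, and the original formulation weights each expert's update by posterior responsibilities rather than the raw gate outputs.

```python
# Minimal MILE-style mixture-of-experts sketch (assumed, illustrative implementation).
import numpy as np

rng = np.random.default_rng(0)
n_features, n_experts = 2, 3

W_experts = rng.normal(size=(n_experts, n_features))  # one linear expert per row
W_gate = rng.normal(size=(n_experts, n_features))     # gating network weights

def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    """Combine the experts' outputs, weighted by the gating network."""
    expert_outputs = W_experts @ x          # y_i = w_i^T x, one output per expert
    gate = softmax(W_gate @ x)              # g_i(x) >= 0, summing to 1
    return gate @ expert_outputs, expert_outputs, gate

x, target = np.array([0.5, -1.0]), 0.7
y, expert_outputs, gate = forward(x)

# Simplified competitive update: each expert is pulled toward the target in
# proportion to its gate value, so experts specialise on the regions of the
# input space where the gate selects them. (The conventional ME additionally
# updates the gating network jointly, using posterior responsibilities.)
lr = 0.1
errors = target - expert_outputs
W_experts += lr * (gate * errors)[:, None] * x[None, :]
```

In a MELE-style variant, by contrast, a clustering step (for example k-means over the training inputs) would partition the problem space before training, each expert would be fitted only on its assigned cluster, and the gating network would only be involved when combining the trained experts.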

Author information

Correspondence to Reza Ebrahimpour.

About this article

Cite this article

Masoudnia, S., Ebrahimpour, R. Mixture of experts: a literature survey. Artif Intell Rev 42, 275–293 (2014). https://doi.org/10.1007/s10462-012-9338-y
