Abstract
I have researched in the field of discriminant analysis for over 40 years, and for nearly as long in the field of cluster analysis. Thus, I think it is fair to say that I have had an enduring interest in discriminant and cluster analyses, that is, in classification, both supervised and unsupervised. The latter terminology is used outside of statistics in fields such as artificial intelligence, machine learning, and pattern recognition. However, the gap between these fields and statistics has narrowed appreciably over the years, and discriminant analysis and cluster analysis are now also often referred to in statistics as supervised classification and unsupervised classification, respectively.
Acknowledgements
The work for this chapter was supported by a grant from the Australian Research Council.
I would like to thank my collaborators on various aspects of my research relevant to data mining over the years, including Peter Adams, Christophe Ambroise, Jangsun Baek, Kaye Basford, Richard Bean, Liat Ben-Tovim Jones, Karen Byth, Igor Cadez, Kim-Anh Le Cao, Soong Chang, Jonathan Chevelu, Kim-Anh Do, Lloyd Flack, S. Ganesalingam, Doug Hawkins, Tian-Hsiang Huang, Peter Jones, Murray Jorgensen, Nazim Khan, Thriyambakam Krishnan, Charles Lawoko, Andy Lee, Jess Mar, Camille Maumet, Christine McLaren, Emmanuelle Meugnier, Katrina Monico, Angus Ng, Vladimir Nikulin, David Peel, Saumyadipta Pyne, Barry Quinn, Suren Rathnayake, Mohamed Shoukri, Padhraic Smyth, Erick Suarez, Deming Wang, Kui (Sam) Wang, Bill Whiten, Leesa Wockner, Ian Wood, Kelvin Yau, and Justin Zhu.
© 2012 Springer-Verlag Berlin Heidelberg
McLachlan, G.J. (2012). An Enduring Interest in Classification: Supervised and Unsupervised. In: Gaber, M. (ed.) Journeys to Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28047-4_12
Print ISBN: 978-3-642-28046-7
Online ISBN: 978-3-642-28047-4