Abstract
I have researched in the field of discriminant analysis for over 40 years, and for nearly as long in the field of cluster analysis. Thus, I think it is fair to say that I have had an enduring interest in discriminant and cluster analyses, that is, in classification, both supervised and unsupervised. The latter terminology is used outside of statistics in fields such as artificial intelligence, machine learning, and pattern recognition. However, the gap between these fields and statistics has narrowed appreciably over the years, and discriminant analysis and cluster analysis are now also often referred to in statistics as supervised classification and unsupervised classification, respectively.
Acknowledgements
The work for this chapter was supported by a grant from the Australian Research Council.
I would like to thank my collaborators on various aspects of my research relevant to data mining over the years, including Peter Adams, Christophe Ambroise, Jangsun Baek, Kaye Basford, Richard Bean, Liat Ben-Tovim Jones, Karen Byth, Igor Cadez, Kim-Anh Le Cao, Soong Chang, Jonathan Chevelu, Kim-Anh Do, Lloyd Flack, S. Ganesalingam, Doug Hawkins, Tian-Hsiang Huang, Peter Jones, Murray Jorgensen, Nazim Khan, Thriyambakam Krishnan, Charles Lawoko, Andy Lee, Jess Mar, Camille Maumet, Christine McLaren, Emmanuelle Meugnier, Katrina Monico, Angus Ng, Vladimir Nikulin, David Peel, Saumyadipta Pyne, Barry Quinn, Suren Rathnayake, Mohamed Shoukri, Padhraic Smyth, Erick Suarez, Deming Wang, Kui (Sam) Wang, Bill Whiten, Leesa Wockner, Ian Wood, Kelvin Yau, and Justin Zhu.
© 2012 Springer-Verlag Berlin Heidelberg
McLachlan, G.J. (2012). An Enduring Interest in Classification: Supervised and Unsupervised. In: Gaber, M. (ed.) Journeys to Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28047-4_12
Print ISBN: 978-3-642-28046-7
Online ISBN: 978-3-642-28047-4