Abstract
Sometimes a large dataset needs to be reduced to just a few points, and it is desirable that these points be representative of the whole dataset. If the future uses of these points are not fully specified in advance, standard decision-theoretic approaches will not work. We present here a methodology for choosing a small representative sample based on a mixture modeling approach.
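As a rough illustration of the general idea, and not the authors' actual procedure, the sketch below fits a one-dimensional Gaussian mixture by EM and then keeps, for each fitted component, the observed point closest to that component's mean. The function names `fit_gmm_1d` and `representative_sample`, the quantile-based initialisation, and the "nearest point to each mean" selection rule are all assumptions made for this example; the paper's own selection criterion may differ.

```python
import math
import random


def fit_gmm_1d(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative sketch)."""
    s = sorted(xs)
    n = len(s)
    # Assumed initialisation: means at evenly spaced quantiles,
    # a shared variance equal to the overall variance, equal weights.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(s) / n
    var_all = sum((x - mean_all) ** 2 for x in s) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def representative_sample(xs, k):
    """Pick one observed point per mixture component (assumed rule:
    the not-yet-chosen point nearest to each fitted component mean)."""
    _, means, _ = fit_gmm_1d(xs, k)
    chosen = []
    for m in sorted(means):
        candidates = [x for x in xs if x not in chosen]
        chosen.append(min(candidates, key=lambda x: abs(x - m)))
    return chosen


# Synthetic data: two well-separated groups, reduced to two points.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(10.0, 1.0) for _ in range(100)]
reps = representative_sample(data, k=2)
print(reps)
```

Because the selected points are actual observations rather than fitted means, the reduced sample stays within the original data, which matters when each retained point must correspond to a real case.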
Additional information
This work was partially supported by Sandia grant 673400. The authors would like to thank the editor and two reviewers for their helpful suggestions that have improved this paper.
About this article
Cite this article
Lee, H.K.H., Taddy, M. & Gray, G.A. Selection of a Representative Sample. J Classif 27, 41–53 (2010). https://doi.org/10.1007/s00357-010-9044-x
DOI: https://doi.org/10.1007/s00357-010-9044-x