Selection of a Representative Sample

Journal of Classification

Abstract

Sometimes a large dataset needs to be reduced to just a few points, and it is desirable that these points be representative of the whole dataset. If the future uses of these points are not fully specified in advance, standard decision-theoretic approaches will not work. We present a methodology for choosing a small representative sample based on a mixture modeling approach.
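As a rough illustration of the general idea only, and not necessarily the procedure developed in the article, one way to pick representatives from a mixture model is to fit a Gaussian mixture and keep the observed point closest to each component mean. The sketch below assumes one-dimensional data, a fixed number of components `k`, and plain EM with a quantile-based start; the function names `fit_gmm_1d` and `representative_sample` are hypothetical.

```python
import math

def fit_gmm_1d(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by plain EM (illustrative sketch)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumption: start the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    var0 = max(1e-6, sum((x - grand_mean) ** 2 for x in xs) / n)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                1e-6, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            )
            weights[j] = nj / n
    return weights, means, variances

def representative_sample(xs, k):
    """Return k observed points, one nearest to each fitted component mean."""
    _, means, _ = fit_gmm_1d(xs, k)
    chosen = []
    for m in means:
        # Skip indices already chosen so the k representatives are distinct points.
        candidates = [i for i in range(len(xs)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: abs(xs[i] - m)))
    return sorted(xs[i] for i in chosen)
```

Calling `representative_sample(data, 2)` on data drawn from two well-separated groups should return one observed point near the center of each group. Restricting the output to observed points, rather than the fitted means themselves, is a design choice of this sketch.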

Author information

Corresponding author

Correspondence to Herbert K. H. Lee.

Additional information

This work was partially supported by Sandia grant 673400. The authors would like to thank the editor and two reviewers for their helpful suggestions that have improved this paper.

About this article

Cite this article

Lee, H.K.H., Taddy, M. & Gray, G.A. Selection of a Representative Sample. J Classif 27, 41–53 (2010). https://doi.org/10.1007/s00357-010-9044-x

  • DOI: https://doi.org/10.1007/s00357-010-9044-x
