ABSTRACT
Computer scientists and software engineers seldom use experimental methods despite frequent calls to do so. The problem may lie with the shortcomings of traditional experimental methods. We introduce synthetic designs, a new form of experimental design that addresses these shortcomings. Compared with classical experimental designs (between-subjects, within-subjects, and matched-subjects), synthetic designs can offer substantially smaller sample sizes, lower cost, less time and effort, greater statistical power, and fewer threats to validity (internal, external, and statistical conclusion). The new design is a variation of the within-subjects design in which each system user serves in only a single treatment condition; system performance scores for all other treatment conditions are derived synthetically, without repeated testing of each subject. The design, though not applicable in all situations, can be used in the development and testing of some computer systems, provided that user behavior is unaffected by the version of the computer system being used. We justify synthetic designs on three grounds: they have been used successfully in the development of computerized mug shot systems, showing marked advantages over traditional designs; a detailed comparison shows they outperform traditional designs on 17 of the 18 criteria considered; and an assessment shows they satisfy all the requirements of true experiments (albeit in a novel way).
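To make the mechanism concrete, the following is a minimal sketch assuming a toy mug shot retrieval task; the data, names, and feature-weighting rule are our own illustrative inventions, not the systems evaluated in the paper. Each participant is tested once, and every system version is then scored by replaying that single recorded session, yielding a complete within-subjects data matrix without retesting anyone. The design's precondition is visible in the code: the recorded description must be independent of which version later consumes it.

```python
"""Hypothetical sketch of a synthetic design: one recorded session per
participant, with every system version scored against the same sessions."""

from statistics import mean

# Toy database of suspect feature vectors, keyed by suspect id.
DATABASE = {
    1: {"hair": "dark", "face": "round"},
    2: {"hair": "fair", "face": "long"},
    3: {"hair": "dark", "face": "long"},
}

def match_score(weights, description, features):
    # Weighted count of features the witness description shares with a record.
    return sum(w for f, w in weights.items()
               if description.get(f) == features.get(f))

def rank_of_suspect(weights, description, suspect_id):
    """Rank the whole database against one recorded description and
    return the position of the true suspect (1 = retrieved first)."""
    ranked = sorted(DATABASE,
                    key=lambda sid: -match_score(weights, description,
                                                 DATABASE[sid]))
    return ranked.index(suspect_id) + 1

# System "versions" (treatment conditions) differ only in feature weights.
versions = {"v1": {"hair": 1.0, "face": 1.0},
            "v2": {"hair": 2.0, "face": 0.5}}

# One recorded session per participant: the description given once,
# plus the identity of the true suspect. No participant is retested.
sessions = [({"hair": "dark", "face": "round"}, 1),
            ({"hair": "fair", "face": "long"}, 2)]

# Synthetic scoring: every version is evaluated on every session,
# producing the full within-subjects matrix from single exposures.
for name, weights in versions.items():
    mean_rank = mean(rank_of_suspect(weights, desc, sid)
                     for desc, sid in sessions)
    print(f"{name}: mean suspect rank = {mean_rank:.2f}")
```

Because every version is scored on identical sessions, between-subject variability cancels out of version comparisons exactly as in a repeated-measures analysis, which is the source of the power gains the abstract claims.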