Experimental Research in HCI

A chapter in Ways of Knowing in HCI

Abstract

In experiments, researchers set up comparable situations in which they carefully manipulate variables and measure people’s behavior in each condition. Experiments are very effective for determining causation in controlled situations and complement techniques that investigate ongoing behavior in more natural settings. For example, experiments are excellent for determining whether increased audio quality reduces the blood pressure of participants in a video conference, and they can add important insights to the larger question of when people choose video conferences over audio-only ones.


Notes

  1.

    Much of what makes for good experimental design centers on minimizing what are known as threats to internal validity. Throughout this chapter we address many of these, including construct validity, confounds, experimenter biases, selection and dropout biases, and statistical threats.

  2.

    G*Power 3 is a specialized software tool for power analysis that has a wide range of features and is free for noncommercial use. It is available at http://www.gpower.hhu.de
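    As an illustration of what such a power analysis computes, here is a minimal sketch in Python using statsmodels (an assumption; the chapter itself only mentions G*Power). The effect size, alpha, and power values are illustrative placeholders.

```python
# Minimal power-analysis sketch for a two-group (independent-samples t test)
# comparison, analogous to a basic G*Power calculation. All numbers below
# (Cohen's d = 0.5, alpha = .05, power = .80) are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.1f}")  # roughly 64
```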

  3.

    Here we present the Neyman–Pearson approach to hypothesis testing as opposed to Fisher’s significance testing approach. Lehmann (1993) details the history and distinctions between these two common approaches.

  4.

    We return to effect sizes and confidence intervals in the section “What constitutes good work,” where we describe how they can be used to better express the magnitude of an effect and its real world implications.

  5.

    When using measures such as education level or test performance, you have to be cautious of regression to the mean and be sure that you are not assigning participants to levels of your independent variable based on their scores on the dependent variable or something strongly correlated with the DV (also known as sampling on the dependent variable) (Galton, 1886).

  6.

    http://www.openstreetmap.org

  7.

    When developing new measures it is important to assess and report their reliability. This can be done using a variety of test–retest assessments.
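    For example, a simple test–retest check can be computed as the correlation between two administrations of the same measure; the sketch below (Python with SciPy, using made-up scores) is one minimal way to do this.

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same measure. Scores are invented for illustration.
from scipy.stats import pearsonr

time1 = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0, 4.0, 5.0]  # first administration
time2 = [4.5, 3.0, 5.0, 3.0, 4.0, 3.5, 4.5, 4.5]  # second administration

r, p = pearsonr(time1, time2)
print(f"Test-retest reliability: r = {r:.2f} (p = {p:.3f})")
```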

  8.

    Sara Kiesler and Jonathon Cummings provided this structured way to think about dependent variables and assessing forms of reliability and validity.

  9.

    It should be noted that numerous surveys and questionnaires published in the HCI literature were not validated or did not make use of validated measures. While there is still some benefit to consistency in measurement, it is less clear in these cases that the measures validly capture the stated construct.

  10.

    Lazar and colleagues (Lazar, Feng, & Hochheiser, 2010, pp. 28–30) provide a step-by-step discussion of how to use a random number table to assign participants to conditions in various experimental designs. In addition, numerous online resources exist to generate tables for random assignment to experimental conditions (e.g., http://www.graphpad.com/quickcalcs/randomize1.cfm).
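    As an alternative to a random number table or the online generators mentioned above, random assignment can also be scripted directly; the following is a small illustrative sketch in Python (participant IDs and condition names are placeholders).

```python
# Randomly assign participants to conditions with balanced group sizes.
import random

participants = [f"P{i:02d}" for i in range(1, 13)]   # 12 hypothetical participants
conditions = ["control", "treatment"]

random.seed(42)                 # fix the seed so the assignment is reproducible
random.shuffle(participants)

per_group = len(participants) // len(conditions)
assignment = {cond: participants[i * per_group:(i + 1) * per_group]
              for i, cond in enumerate(conditions)}
print(assignment)
```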

  11.

    There are numerous online resources for obtaining Latin square tables (e.g., http://statpages.org/latinsq.html).

  12.

    This approach only balances for what are known as first-order sequential effects. There are still a number of ways in which repeated measurement can be systematically affected, such as nonlinear or asymmetric transfer effects. See (Kirk, 2013, Chap. 14) or other literature on Latin square or combinatorial designs for more details.

  13.

    If your experiment has an odd number of conditions, then two balanced Latin squares are needed. The first square is generated using the same method described in the text, and the second square is a reversal of the first square.
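    To illustrate, the sketch below generates a balanced Latin square using one common construction (an alternating low/high ordering of conditions, shifted by one for each subsequent row) and, for an odd number of conditions, appends the reversal of each row as the second square. This is an assumed, generic implementation rather than the exact procedure from the chapter text; the condition labels 0 to n−1 are placeholders.

```python
# Generate a balanced Latin square for n conditions (labeled 0..n-1).
def balanced_latin_square(n):
    # First row alternates low and high condition indices: 0, n-1, 1, n-2, ...
    first_row, low, high = [], 0, n - 1
    for i in range(n):
        if i % 2 == 0:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
    # Each subsequent row shifts every entry by one (mod n).
    square = [[(c + shift) % n for c in first_row] for shift in range(n)]
    if n % 2 == 1:
        # Odd n: a second square (the reversal of the first) is also needed.
        square += [list(reversed(row)) for row in square[:n]]
    return square

print(balanced_latin_square(4))
# [[0, 3, 1, 2], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 0, 1]]
```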

  14.

    As a side note, Latin square designs are a within-subject version of a general class of designs known as fractional factorial designs. Fractional factorial designs are useful when you want to explore numerous factors at once but do not have the capacity to run hundreds or thousands of participants to cover the complete factorial (see Collins, Dziak, & Li, 2009).

  15.

    In practice, mixed factorial designs are often used when examining different groups of participants (e.g., demographics, skills). For example, if you are interested in differences in user experience across three different age groups, a between-subjects factor may be age group (teen, adult, elderly), while a within-subjects factor may be three different interaction styles.

  16.

    Note that common transformations of the data (e.g., logarithmic or reciprocal transformations) can affect the detection and interpretation of interactions. Such transformations are performed when the data deviate from the distributional requirements of statistical tests, and researchers need to be cautious when interpreting the results of transformed data.
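    A tiny made-up numeric example of the concern: if the cell means of a 2 × 2 design are multiplicative, an apparent interaction on the raw scale disappears after a log transform (the numbers below are invented purely for illustration).

```python
# Illustrative cell means (task time in seconds) for a 2x2 design in which
# factor B doubles task time at both levels of A (a multiplicative effect).
import math

means = {("A1", "B1"): 10.0, ("A1", "B2"): 20.0,
         ("A2", "B1"): 30.0, ("A2", "B2"): 60.0}

# Interaction contrast: (effect of B at A2) minus (effect of B at A1).
raw = (means[("A2", "B2")] - means[("A2", "B1")]) - \
      (means[("A1", "B2")] - means[("A1", "B1")])
log = (math.log(means[("A2", "B2")]) - math.log(means[("A2", "B1")])) - \
      (math.log(means[("A1", "B2")]) - math.log(means[("A1", "B1")]))

print(f"Raw scale: {raw:.1f} s difference-of-differences")  # 20.0 -> interaction
print(f"Log scale: {log:.3f}")                               # 0.000 -> none
```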

  17.

    For factorial designs with more factors, higher-order interactions can mask lower-order effects.

  18.

    For more detailed coverage of quasi-experimental designs see (Cook & Campbell, 1979; Shadish et al., 2002).

  19.

    Time-series approaches have particular statistical concerns that must be addressed when analyzing the data. In particular, they often produce data points that exhibit various forms of autocorrelation, whereas many statistical analyses require that the data points are independent. There are numerous books and manuscripts on the proper treatment of time-series data, many of which reside in the domain of econometrics (Gujarati, 1995, pp. 707–754; Kennedy, 1998, pp. 263–287).
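    For instance, one common first check (assuming Python with NumPy and statsmodels) is the Durbin–Watson statistic, where values near 2 suggest little first-order autocorrelation; the sketch below applies it to a simulated AR(1) series that is autocorrelated by construction.

```python
# Simulate an autocorrelated (AR(1)) series and compute the Durbin-Watson
# statistic; values well below 2 indicate positive autocorrelation.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
noise = rng.normal(size=200)

series = np.empty(200)
series[0] = noise[0]
for t in range(1, 200):
    series[t] = 0.7 * series[t - 1] + noise[t]   # each point depends on the last

print(f"Durbin-Watson statistic: {durbin_watson(series):.2f}")  # well below 2
```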

  20.

    For a detailed discussion of interrupted time-series designs see (Shadish et al., 2002, pp. 171–206).

  21.

    These are also known as A-B-A or withdrawal designs, and are similar to many approaches used for small-N or single-subject studies with multiple baselines. For further details see (Shadish et al., 2002, pp. 188–190).

  22.

    We use a two-condition example for ease of exposition.

  23.

    While we separate these three areas in order to discuss the relative contributions made in each, this is not to suggest that they are mutually exclusive categories. In fact, some of the most influential work has all three dimensions. For a more nuanced discussion of the integration of theoretical (basic) and practical (applied) research in an innovation context, see Pasteur’s Quadrant (Stokes, 1997).

  24.

    Not all of these studies are strict randomized experiments. For example, the SHARK evaluation does not make use of a control or comparison group. However, many use experimental research techniques to effectively demonstrate the feasibility of their approach.

  25.

    The framing questions in this section are drawn from Judy Olson’s “10 questions that every graduate student should be able to answer.” The list of questions and related commentary can be found here: http://beki70.wordpress.com/2010/09/30/judy-olsons-10-questions-and-some-commentary/

References

  • Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: L. Erlbaum Associates.

  • Accot, J., & Zhai, S. (1997). Beyond Fitts’ law: Models for trajectory-based HCI tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 295–302). New York, NY: ACM.

  • American Psychological Association. (2010). APA manual (publication manual of the American Psychological Association). Washington, DC: American Psychological Association.

  • Bao, P., & Gergle, D. (2009). What’s “this” you say?: The use of local references on distant displays. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1029–1032). New York, NY: ACM.

  • Bausell, R. B., & Li, Y.-F. (2002). Power analysis for experimental research: A practical guide for the biological, medical, and social sciences. Cambridge, NY: Cambridge University Press.

  • Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A practical guide for the beginning social scientist (2nd ed.). Washington, DC: American Psychological Association.

  • Borenstein, D. M., Hedges, L. V., & Higgins, J. (2009). Introduction to meta-analysis. Chichester: Wiley.

  • Bradley, J. V. (1958). Complete counterbalancing of immediate sequential effects in a Latin square design. Journal of the American Statistical Association, 53(282), 525–528.

  • Campbell, D. T., Stanley, J. C., & Gage, N. L. (1963). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.

  • Carter, S., Mankoff, J., Klemmer, S., & Matthews, T. (2008). Exiting the cleanroom: On ecological validity and ubiquitous computing. Human–Computer Interaction, 23(1), 47–99.

  • Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287–292.

  • Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New York, NY: Wiley.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: L. Erlbaum Associates.

  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.

  • Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202–224.

  • Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally.

  • Cosley, D., Lam, S. K., Albert, I., Konstan, J. A., & Riedl, J. (2003). Is seeing believing?: How recommender system interfaces affect users’ opinions. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 585–592). New York, NY: ACM.

  • Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.

  • Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532–574.

  • Czerwinski, M., Tan, D. S., & Robertson, G. G. (2002). Women take a wider view. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 195–202). New York, NY: ACM.

  • Dabbish, L., Kraut, R., & Patton, J. (2012). Communication and commitment in an online game team. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 879–888). New York, NY: ACM.

  • Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, NY: Cambridge University Press.

  • Evans, A., & Wobbrock, J. O. (2012). Taming wild behavior: The input observer for text entry and mouse pointing measures from everyday computer use. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1947–1956). New York, NY: ACM.

  • Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

  • Fisher, R. A., & Yates, F. (1953). Statistical tables for biological, agricultural and medical research. Edinburgh: Oliver & Boyd.

  • Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.

  • Gergle, D., Kraut, R. E., & Fussell, S. R. (2013). Using visual information for grounding and awareness in collaborative tasks. Human–Computer Interaction, 28(1), 1–39.

  • Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Gujarati, D. N. (1995). Basic econometrics. New York, NY: McGraw-Hill.

  • Gutwin, C., & Penner, R. (2002). Improving interpretation of remote gestures with telepointer traces. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 49–57). New York, NY: ACM.

  • Hancock, J. T., Landrigan, C., & Silver, C. (2007). Expressing emotion in text-based communication. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 929–932). New York, NY: ACM.

  • Hancock, G. R., & Mueller, R. O. (2010). The reviewer’s guide to quantitative methods in the social sciences. New York, NY: Routledge.

  • Harrison, C., Tan, D., & Morris, D. (2010). Skinput: Appropriating the body as an input surface. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 453–462). New York, NY: ACM.

  • Hornbæk, K. (2011). Some whys and hows of experiments in Human–Computer Interaction. Foundations and Trends in Human–Computer Interaction, 5(4), 299–373.

  • Johnson, D. H. (1999). The insignificance of statistical significance testing. The Journal of Wildlife Management, 63, 763–772.

  • Keegan, B., & Gergle, D. (2010). Egalitarians at the gate: One-sided gatekeeping practices in social media. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 131–134). New York, NY: ACM.

  • Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152.

  • Kenny, D. A. (1987). Statistics for the social and behavioral sciences. Canada: Little, Brown and Company.

  • Kennedy, P. (1998). A guide to econometrics. Cambridge, MA: The MIT Press.

  • Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.

  • Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences. Thousand Oaks, CA: Sage.

  • Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.

  • Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences. Washington, DC: American Psychological Association.

  • Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8, 30–43.

  • Kohavi, R., Henne, R. M., & Sommerfield, D. (2007). Practical guide to controlled experiments on the web: Listen to your customers not to the hippo. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (pp. 959–967). New York, NY: ACM.

  • Kohavi, R., & Longbotham, R. (2007). Online experiments: Lessons learned. Computer, 40(9), 103–105.

  • Kohavi, R., Longbotham, R., & Walker, T. (2010). Online experiments: Practical lessons. Computer, 43(9), 82–85.

  • Kristensson, P.-O., & Zhai, S. (2004). SHARK^2: A large vocabulary shorthand writing system for pen-based computers. In Proceedings of the ACM symposium on user interface software and technology (pp. 43–52). New York, NY: ACM.

  • Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300.

  • Lazar, J., Feng, J. H., & Hochheiser, H. (2010). Research methods in human-computer interaction. Chichester: Wiley.

  • Lehmann, E. L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.

  • Lieberman, H. (2003). The tyranny of evaluation. Retrieved August 15, 2012, from http://web.media.mit.edu/~lieber/Misc/Tyranny-Evaluation.html

  • MacKenzie, I. S., & Zhang, S. X. (1999). The design and evaluation of a high-performance soft keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 25–31). New York, NY: ACM.

  • Martin, D. W. (2004). Doing psychology experiments. Belmont, CA: Thomson/Wadsworth.

  • McLeod, P. L. (1992). An assessment of the experimental literature on electronic support of group work: Results of a meta-analysis. Human–Computer Interaction, 7(3), 257–280.

  • Nguyen, D., & Canny, J. (2005). MultiView: Spatially faithful group video conferencing. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 799–808). New York, NY: ACM.

  • Olson, J. S., Olson, G. M., Storrøsten, M., & Carter, M. (1993). Groupwork close up: A comparison of the group design process with and without a simple group editor. ACM Transactions on Information Systems, 11(4), 321–348.

  • Oulasvirta, A. (2009). Field experiments in HCI: Promises and challenges. In P. Saariluoma & H. Isomaki (Eds.), Future interaction design II. New York, NY: Springer.

  • Oulasvirta, A., Tamminen, S., Roto, V., & Kuorelahti, J. (2005). Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 919–928). New York, NY: ACM.

  • Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553.

  • Rosenthal, R., & Rosnow, R. L. (2008). Essentials of behavioral research: Methods and data analysis (3rd ed.). New York, NY: McGraw-Hill.

  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

  • Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.

  • Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.

  • Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2003). With similar visual angles, larger displays improve spatial performance. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 217–224). New York, NY: ACM.

  • Tan, D. S., Gergle, D., Scupelli, P. G., & Pausch, R. (2004). Physically large displays improve path integration in 3D virtual navigation tasks. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 439–446). New York, NY: ACM.

  • Tan, D. S., Gergle, D., Scupelli, P., & Pausch, R. (2006). Physically large displays improve performance on spatial tasks. ACM Transactions on Computer Human Interaction, 13(1), 71–99.

  • Veinott, E. S., Olson, J., Olson, G. M., & Fu, X. (1999). Video helps remote work: Speakers who need to negotiate common ground benefit from seeing each other. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 302–309). New York, NY: ACM.

  • Weir, P. (Director). (1998). The Truman show [Film].

  • Weisband, S., & Kiesler, S. (1996). Self disclosure on computer forms: Meta-analysis and implications. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 3–10). New York, NY: ACM.

  • Weiss, N. A. (2008). Introductory statistics. San Francisco, CA: Pearson Addison-Wesley.

  • Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., & Shen, C. (2007). Lucid touch: A see-through mobile device. In Proceedings of the ACM symposium on user interface software and technology (pp. 269–278). New York, NY: ACM.

  • Williams, E. J. (1949). Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Chemistry, 2(2), 149–168.

  • Wilson, M. L., Mackay, W., Chi, E., Bernstein, M., & Nichols, J. (2012). RepliCHI SIG: From a panel to a new submission venue for replication. In Proceedings of the ACM conference extended abstracts on human factors in computing systems (pp. 1185–1188). New York, NY: ACM.

  • Wobbrock, J. O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Presented at the Annual workshop of the Human-Computer Interaction Consortium (HCIC ’11). Pacific Grove, CA.

  • Wobbrock, J. O., Cutrell, E., Harada, S., & MacKenzie, I. S. (2008). An error model for pointing based on Fitts’ law. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1613–1622). New York, NY: ACM.

  • Yee, N., Bailenson, J. N., & Rickertsen, K. (2007). A meta-analysis of the impact of the inclusion and realism of human-like faces on user experiences in interfaces. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1–10). New York, NY: ACM.

  • Zhai, S. (2003). Evaluation is the worst form of HCI research except all those other forms that have been tried. Retrieved February 18, 2014, from http://shuminzhai.com/papers/EvaluationDemocracy.htm

  • Zhai, S., & Kristensson, P.-O. (2003). Shorthand writing on stylus keyboard. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 97–104). New York, NY: ACM.

  • Zhu, H., Kraut, R., & Kittur, A. (2012). Effectiveness of shared leadership in online communities. In Proceedings of the ACM SIGCHI conference on computer supported cooperative work (pp. 407–416). New York, NY: ACM.


Acknowledgements

We would like to thank Wendy Kellogg, Robert Kraut, Anne Oeldorf-Hirsch, Gary Olson, Judy Olson, and Lauren Scissors for their thoughtful reviews and comments on the chapter.

Author information

Correspondence to Darren Gergle.


Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Gergle, D., Tan, D.S. (2014). Experimental Research in HCI. In: Olson, J., Kellogg, W. (eds) Ways of Knowing in HCI. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0378-8_9

  • DOI: https://doi.org/10.1007/978-1-4939-0378-8_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-0377-1

  • Online ISBN: 978-1-4939-0378-8
