Abstract
HCI researchers are increasingly collecting rich behavioral traces of user interactions with online systems in situ at a scale not previously possible. These logs can be used to characterize user interactions with existing systems and compare different designs. Large-scale log studies give rise to new challenges in experimental design, data collection and interpretation, and ethics. The chapter discusses how to address these challenges using search engine logs, but the methods are applicable to other types of log data.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Dumais, S., Jeffries, R., Russell, D.M., Tang, D., Teevan, J. (2014). Understanding User Behavior Through Log Data and Analysis. In: Olson, J., Kellogg, W. (eds) Ways of Knowing in HCI. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0378-8_14
DOI: https://doi.org/10.1007/978-1-4939-0378-8_14
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0377-1
Online ISBN: 978-1-4939-0378-8
eBook Packages: Computer Science (R0)