The Price of Privacy

An Evaluation of the Economic Value of Collecting Clickstream Data

  • Research Paper
Business & Information Systems Engineering

Abstract

The analysis of clickstream data facilitates the understanding and prediction of customer behavior in e-commerce. Companies can leverage such data to increase revenue. For customers and website users, on the other hand, the collection of behavioral data entails an invasion of privacy. The objective of the paper is to shed light on the trade-off between privacy and the business value of customer information. To that end, the authors review approaches to convert clickstream data into behavioral traits, which they call clickstream features, and propose a categorization of these features according to the potential threat they pose to user privacy. The authors then examine the extent to which different categories of clickstream features facilitate predictions of online user shopping patterns and approximate the marginal utility of using more privacy-invasive information in behavioral prediction models. Thus, the paper links the literature on user privacy to that on e-commerce analytics and takes a step toward an economic analysis of privacy costs and benefits. In particular, the results of empirical experimentation with large real-world e-commerce data suggest that the inclusion of short-term customer behavior based on session-related information leads to large gains in predictive accuracy and business performance, while storing and aggregating usage behavior over longer horizons has comparatively less value.
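
The notion of marginal utility across feature categories can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper name `marginal_gains`, the feature-set names, their ordering, and all scores are hypothetical placeholders introduced here, and the sketch assumes that each feature set contains all features of the previous, less privacy-invasive one.

```python
from typing import Dict, List, Tuple


def marginal_gains(scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Return the score gain contributed by each successive feature set.

    `scores` lists (feature-set name, model score) pairs ordered from the
    least to the most privacy-invasive feature set (assumed to be nested).
    """
    gains: Dict[str, float] = {}
    previous = None
    for name, score in scores:
        # The first set has no predecessor, so no gain is recorded for it.
        if previous is not None:
            gains[name] = score - previous
        previous = score
    return gains


# Hypothetical placeholder scores, NOT results from the paper.
example_scores = [
    ("baseline", 0.60),
    ("plus_session_features", 0.75),
    ("plus_long_term_history", 0.77),
]

example_gains = marginal_gains(example_scores)
for name, gain in example_gains.items():
    print(f"{name}: {gain:+.2f}")
```

Comparing the gain of each step against how much more privacy-invasive the added features are gives a rough picture of the trade-off the abstract describes.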

Notes

  1. For example, see the Health Insurance Portability and Accountability Act of 1996 or the California Online Privacy Protection Act of 2003 in the US, or the General Data Protection Regulation in the EU.

  2. The calculations are based on the actual number of correctly and incorrectly classified customers across the 50 (2 shops × 5 feature sets × 5 conversion rates) settings. Interested readers can find results at this level of detail in the Appendix.
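
The count of 50 settings in note 2 follows from crossing the three factors. As a minimal sketch, the grid can be enumerated as below; the labels are placeholders, since the note states only how many shops, feature sets, and conversion rates there are, not their names or values.

```python
from itertools import product

# Placeholder labels: the note gives only the counts (2, 5, 5), not the names.
shops = [f"shop_{i}" for i in range(1, 3)]
feature_sets = [f"feature_set_{i}" for i in range(1, 6)]
conversion_rates = [f"conversion_rate_{i}" for i in range(1, 6)]

# Every combination of shop, feature set, and conversion rate is one setting.
settings = list(product(shops, feature_sets, conversion_rates))
print(len(settings))
```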

Author information

Corresponding author

Correspondence to Annika Baumann.

Additional information

Accepted after two revisions by Prof. Dr. Suhl.

Electronic supplementary material

Supplementary material 1 (PDF 967 kb)

About this article

Cite this article

Baumann, A., Haupt, J., Gebert, F. et al. The Price of Privacy. Bus Inf Syst Eng 61, 413–431 (2019). https://doi.org/10.1007/s12599-018-0528-2

  • DOI: https://doi.org/10.1007/s12599-018-0528-2