ABSTRACT
Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature of which has not been well studied. Understanding these limitations is particularly important since most current personalization techniques are based on site-centric data only. In this paper, we empirically examine the implications of learning from incomplete data in the context of two specific problems: (a) predicting whether the remainder of any given session will result in a purchase and (b) predicting whether a given user will make a purchase at any future session. For each of these problems we present new algorithms for fast and accurate preprocessing of clickstream data. Based on a comprehensive experiment on user-level clickstream data capturing the browsing behavior of 20,000 users, we demonstrate that models built on user-centric data outperform models built on site-centric data for both prediction tasks.
Index Terms
- Personalization from incomplete data: what you don't know can hurt