DOI: 10.1145/502512.502535
Article

Personalization from incomplete data: what you don't know can hurt

Published: 26 August 2001

ABSTRACT

Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature of which has not been well studied. Understanding the limitations is particularly important since most current personalization techniques are based on site-centric data only. In this paper, we empirically examine the implications of learning from incomplete data in the context of two specific problems: (a) predicting if the remainder of any given session will result in a purchase and (b) predicting if a given user will make a purchase at any future session. For each of these problems we present new algorithms for fast and accurate data preprocessing of clickstream data. Based on a comprehensive experiment on user-level clickstream data gathered from 20,000 users' browsing behavior, we demonstrate that models built on user-centric data outperform models built on site-centric data for both prediction tasks.
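
The gap between the two kinds of data can be illustrated with a minimal Python sketch. It is not the paper's preprocessing algorithm: the record layout, the site names, and the helper functions `site_centric_view` and `session_features` are hypothetical, chosen only to show how a single site's log is a projection of the full user-level clickstream.

```python
from collections import namedtuple

# Hypothetical record layout: one page view by one user in one session.
Click = namedtuple("Click", ["user", "session", "site"])

# Toy user-centric log: everything each user did, across all sites.
USER_CENTRIC = [
    Click("u1", "s1", "shopA.example"),
    Click("u1", "s1", "shopB.example"),
    Click("u1", "s1", "shopA.example"),
    Click("u2", "s2", "shopB.example"),
    Click("u2", "s2", "shopB.example"),
]


def site_centric_view(clicks, site):
    """Keep only the clicks that a single site can observe in its own logs."""
    return [c for c in clicks if c.site == site]


def session_features(clicks):
    """Count page views and distinct sites per (user, session)."""
    features = {}
    for c in clicks:
        entry = features.setdefault((c.user, c.session), {"views": 0, "sites": set()})
        entry["views"] += 1
        entry["sites"].add(c.site)
    return {
        key: {"views": e["views"], "distinct_sites": len(e["sites"])}
        for key, e in features.items()
    }


# Features computed from the full clickstream versus from one site's log.
user_view = session_features(USER_CENTRIC)
site_view = session_features(site_centric_view(USER_CENTRIC, "shopA.example"))
```

In this toy data, the site-centric view from `shopA.example` never sees that user `u1` also visited a second shop in the same session, and it has no record of user `u2` at all. Any model trained on `site_view` therefore works from fewer and coarser features than one trained on `user_view`.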

        Published in

        KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2001
        493 pages
        ISBN: 158113391X
        DOI: 10.1145/502512

        Copyright © 2001 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '01 Paper Acceptance Rate: 31 of 237 submissions, 13%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
