Learning implicit user interest hierarchy for context in personalization

Applied Intelligence

Abstract

To provide a more robust context for personalization, we aim to extract a continuum of general-to-specific interests of a user, called a user interest hierarchy (UIH). Higher-level interests are more general, while lower-level interests are more specific. A UIH represents a user’s interests at different levels of abstraction and can be learned from the contents (words and phrases) of a set of web pages bookmarked by the user. We propose a divisive hierarchical clustering (DHC) algorithm that groups terms (topics) into a hierarchy in which more general interests are represented by larger sets of terms. Our approach requires no user involvement and learns the UIH “implicitly”. To enrich the features used in the UIH, we use phrases in addition to words. Our experiment indicates that, on average, DHC with the Augmented Expected Mutual Information (AEMI) correlation function and the MaxChildren threshold-finding method built more meaningful UIHs than the other combinations, and that using both words and phrases as features improved the quality of the UIHs.
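The general idea of divisive clustering over term correlations can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard co-occurrence score stands in for AEMI, a fixed threshold schedule (`threshold`, `step`) stands in for the MaxChildren threshold-finding method, and all function names, parameters, and the toy bookmark data are hypothetical.

```python
from itertools import combinations


def build_postings(pages):
    """Map each term to the set of page indices that contain it."""
    postings = {}
    for page_id, terms in enumerate(pages):
        for term in terms:
            postings.setdefault(term, set()).add(page_id)
    return postings


def jaccard(pages_a, pages_b):
    """Stand-in correlation (NOT AEMI): overlap of the pages two terms occur in."""
    union = pages_a | pages_b
    return len(pages_a & pages_b) / len(union) if union else 0.0


def connected_components(terms, edges):
    """Group terms that are linked, directly or indirectly, by the given edges."""
    adjacency = {term: set() for term in terms}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(terms):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            term = stack.pop()
            if term not in component:
                component.add(term)
                stack.extend(adjacency[term] - component)
        seen |= component
        components.append(component)
    return components


def divisive_cluster(terms, postings, threshold=0.3, step=0.2, min_size=2):
    """Recursively split a term set by dropping weakly correlated term pairs.

    Assumed schedule: if no split occurs, raise the threshold by `step`;
    stop once it exceeds 1.0 or no cluster of at least `min_size` terms remains.
    """
    terms = set(terms)
    node = {"terms": sorted(terms), "children": []}
    while threshold <= 1.0:
        strong = [(a, b) for a, b in combinations(sorted(terms), 2)
                  if jaccard(postings[a], postings[b]) >= threshold]
        clusters = [c for c in connected_components(terms, strong)
                    if len(c) >= min_size]
        if not clusters:
            break  # every term is isolated: this node is a leaf
        if any(len(c) < len(terms) for c in clusters):
            node["children"] = [
                divisive_cluster(c, postings, threshold + step, step, min_size)
                for c in clusters
            ]
            break
        threshold += step  # nothing split off yet, so tighten the threshold
    return node


# Toy bookmarks: each page is reduced to the set of terms it contains.
pages = [
    {"ai", "ml"},
    {"ai", "ml", "search"},
    {"ai", "search"},
    {"jazz", "blues"},
    {"jazz", "blues"},
]
postings = build_postings(pages)
uih = divisive_cluster(postings.keys(), postings)
for child in uih["children"]:
    print(child["terms"])
```

In this sketch the root node holds every term (the most general interest) and each child holds a smaller, more strongly correlated subset, mirroring the abstract's description that more general interests are represented by larger sets of terms.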



Author information

Corresponding author

Correspondence to Hyoung-Rae Kim.

About this article

Cite this article

Kim, HR., Chan, P.K. Learning implicit user interest hierarchy for context in personalization. Appl Intell 28, 153–166 (2008). https://doi.org/10.1007/s10489-007-0056-0

  • DOI: https://doi.org/10.1007/s10489-007-0056-0
