Abstract
To provide a more robust context for personalization, we aim to extract a continuum of general-to-specific interests of a user, called a user interest hierarchy (UIH). Higher-level interests are more general, while lower-level interests are more specific. A UIH can represent a user’s interests at different levels of abstraction and can be learned from the contents (words and phrases) of a set of web pages bookmarked by the user. We propose a divisive hierarchical clustering (DHC) algorithm that groups terms (topics) into a hierarchy in which more general interests are represented by larger sets of terms. Our approach requires no user involvement and learns the UIH “implicitly”. To enrich the features used in the UIH, we use phrases in addition to words. Our experimental results indicate that, on average, DHC with the Augmented Expected Mutual Information (AEMI) correlation function and the MaxChildren threshold-finding method built more meaningful UIHs than the other combinations tested, and that using both words and phrases as features improved the quality of the UIHs.
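The abstract names the ingredients of DHC without spelling them out. The sketch below is one plausible reading under stated assumptions: terms (words or phrases) are graph nodes, edge weights are AEMI scores computed from document-frequency counts over the bookmarked pages, a threshold drops weak edges, and the resulting connected components become child clusters that are split recursively. The AEMI form used here, MI(A,B) − MI(A,¬B) − MI(¬A,B), and the threshold rule (a simplified stand-in for MaxChildren that picks the threshold yielding the most clusters of at least `min_size` terms) are assumptions. The names `aemi`, `components`, `dhc`, and `min_size` are hypothetical and are not the authors' code.

```python
import math
from itertools import combinations


def aemi(n_ab, n_a, n_b, n_docs):
    """AEMI between terms A and B from document-frequency counts.

    Assumed form: MI(A,B) - MI(A,~B) - MI(~A,B), where each term is
    P(x,y) * log(P(x,y) / (P(x) * P(y))).
    """
    def mi(n_xy, n_x, n_y):
        # Zero counts contribute nothing (avoids log(0) and division by zero).
        if n_xy == 0 or n_x == 0 or n_y == 0:
            return 0.0
        p_xy = n_xy / n_docs
        return p_xy * math.log(p_xy / ((n_x / n_docs) * (n_y / n_docs)))

    return (mi(n_ab, n_a, n_b)                      # A and B together
            - mi(n_a - n_ab, n_a, n_docs - n_b)     # A without B
            - mi(n_b - n_ab, n_docs - n_a, n_b))    # B without A


def components(terms, weight, threshold):
    """Connected components of the term graph, keeping edges >= threshold."""
    adj = {t: set() for t in terms}
    for a, b in combinations(terms, 2):
        if weight[frozenset((a, b))] >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in terms:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def dhc(terms, docs, min_size=2):
    """Recursively split a term cluster into child clusters (a DHC sketch)."""
    terms = sorted(terms)
    node = {"terms": terms, "children": []}
    if len(terms) < 2 * min_size:
        return node  # too small to yield two clusters of min_size
    n_docs = len(docs)
    df = {t: sum(t in d for d in docs) for t in terms}
    weight = {}
    for a, b in combinations(terms, 2):
        n_ab = sum(a in d and b in d for d in docs)
        weight[frozenset((a, b))] = aemi(n_ab, df[a], df[b], n_docs)
    # Hypothetical MaxChildren-style choice: try every distinct edge weight as
    # a threshold and keep the one giving the most clusters of >= min_size.
    best = []
    for threshold in sorted(set(weight.values())):
        big = [c for c in components(terms, weight, threshold) if len(c) >= min_size]
        if len(big) >= 2 and len(big) > len(best):
            best = big
    # Each child is strictly smaller than its parent, so recursion terminates.
    node["children"] = [dhc(c, docs, min_size) for c in best]
    return node


# Usage: each "document" is the set of terms found on one bookmarked page.
docs = [{"ai", "learning", "neural"}, {"ai", "learning", "neural"},
        {"music", "guitar", "jazz"}, {"music", "guitar", "jazz"}]
tree = dhc(set().union(*docs), docs)
print([child["terms"] for child in tree["children"]])
# [['ai', 'learning', 'neural'], ['guitar', 'jazz', 'music']]
```

In this toy input, terms that always co-occur get a positive AEMI score (0.5 ln 2) and terms from different topics get a negative one (−ln 2), so the only threshold that separates the root into two or more clusters yields the two topic groups. Terms that end up in clusters smaller than `min_size` are simply dropped in this sketch, which is one of several reasonable design choices.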
Cite this article
Kim, HR., Chan, P.K. Learning implicit user interest hierarchy for context in personalization. Appl Intell 28, 153–166 (2008). https://doi.org/10.1007/s10489-007-0056-0