Abstract
Knowledge of users' information goals is critical in designing websites, in analyzing the efficacy of such designs, and in ensuring effective user access to desired information. Determining the information goal is complex because user information needs are subjective and latent. The challenge is greater in media-rich websites, since the semantics of media-based information are context-dependent and emergent. A critical step in determining information goals is the identification of content pages, that is, the pages that contain the information the user seeks. We propose a method to automatically determine the content pages by taking into account the organization of the website, the media-based information content, and the influence of a specific user browsing pattern. Given a specific browsing pattern, our method identifies putative content pages as the pages corresponding to the local minima of page-content entropy values. For an (unknown) user information goal, this intuitively corresponds to modeling the progressive transition of the user from pages with generic information to those with specific information. Experimental investigations on media-rich sites demonstrate the effectiveness of the technique and underline its potential in modeling user information needs and actions in a media-rich web.
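The idea of picking local minima of page-content entropy along a browsing pattern can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each page is reduced to a bag of text tokens and uses plain Shannon entropy of the token distribution, whereas the method described above also accounts for media content and site organization. The function names, the toy pages, and the choice to treat the first and last page of a path as candidates when they are lower than their single neighbour are all assumptions made for illustration.

```python
import math
from collections import Counter


def page_entropy(tokens):
    """Shannon entropy (in bits) of the token distribution of one page."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_minima(values):
    """Indices whose value is strictly lower than every existing neighbour."""
    if len(values) < 2:
        return []
    minima = []
    for i, v in enumerate(values):
        left_ok = i == 0 or v < values[i - 1]
        right_ok = i == len(values) - 1 or v < values[i + 1]
        if left_ok and right_ok:
            minima.append(i)
    return minima


def candidate_content_pages(path, page_tokens):
    """Pages on the browsing path whose entropy is a local minimum."""
    entropies = [page_entropy(page_tokens[p]) for p in path]
    return [path[i] for i in local_minima(entropies)]


# Hypothetical toy site: generic pages mix many topics, specific pages few.
page_tokens = {
    "home": ["news", "sports", "galaxy", "star",
             "contact", "about", "maps", "tools"],
    "astronomy": ["galaxy", "star", "galaxy", "nebula"],
    "galaxy": ["galaxy", "galaxy", "galaxy", "spiral"],
    "star": ["star", "star", "star", "star"],
}

# Hypothetical browsing pattern: the user drills down, backs up, drills down.
path = ["home", "astronomy", "galaxy", "astronomy", "star"]

print(candidate_content_pages(path, page_tokens))  # ['galaxy', 'star']
```

In this toy path the entropies fall from the generic home page to the "galaxy" page, rise when the user returns to the broader "astronomy" page, and fall again at "star", so the two specific pages are flagged as putative content pages.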
References
Bertini M, Del Bimbo A, Nunziati W (2006) Video clip matching using MPEG-7 descriptors and edit distance. Conference on Image and Video Retrieval, LNCS 4071:133–142
Bhattarai B, Wong M, Singh R (2007) Discovering user information goals with semantic website media modeling, ACM International Conference on Multi-Media Modeling. Lect Notes Comput Sci 4351:364–375
Brusilovsky P (2001) Adaptive hypermedia. User Modeling and User-Adapted Interaction 11:87–110
Chi E, Pirolli PL, Chen K, Pitkow J (2001) Using information scent to model user information needs and actions on the web. ACM CHI 490–497
Craswell N, Hawking D, Robertson S (2001) Effective site finding using link anchor information. ACM SIGIR
Deng Y, Manjunath B (2001) Unsupervised segmentation of color-texture regions in images and video. IEEE Trans on Pattern Analysis and Machine Intelligence 23(8):800–810
Howarth P, Rüger S (2004) Evaluation of texture features for content-based image retrieval. LNCS 3115:326–334
Kang I, Kim G (2003) Query type classification for web-document retrieval. ACM SIGIR
Lee U, Liu Z, Cho J (2005) Automatic identification of user goals in web search. WWW
Olston C, Chi E (2003) Scenttrails: integrating browsing and searching on the web. ACM Trans Comput-Hum Interact 10(3):177–197
Pirolli P (2003) A theory of information scent. In: Jacko J, Stephanidis C (eds) Human-computer interaction, Vol. 1, (pp. 213–217), Lawrence Erlbaum, Mahwah
Pirolli P, Card S (1999) Information foraging. Psychol Rev 106(4):643–675
Qiu F, Cho J (2006) Automatic identification of user interest for personalized search. WWW 727–736
Rose DE, Levinson D (2004) Understanding user goals in web search. WWW
Salton G, Buckley C (1988) On the use of spreading activation methods in automatic information retrieval. ACM Conference on Information Retrieval, pp. 147–160
Salton G, Buckley C (1987) Term weighting approaches in automatic text retrieval. Technical Report: TR87-881
Santini S, Jain R (1997) Similarity is a geometer. Multimedia Tools and Applications 5:277–306
Santini S, Gupta A, Jain R (2001) Emergent semantics through interaction in image databases. IEEE Trans Knowl Data Eng 13(3)
Singh R, Bhattarai B (2009a) Information-theoretic identification of content pages for analyzing user information needs and actions on the multimedia web. ACM Symposium on Applied Computing, pp. 1806–1810
Singh R, Bhattarai B (2009b) Analysis of usage patterns in large multimedia websites. In: Chbeir R, Hassanien A-E, Abraham A, Badr Y (eds) Emergent web intelligence. Springer Verlag (To Appear)
Singh V, Gray J, Thakar A, Szalay AS, Raddick J, Boroski B, Lebedeva S, Yanny B (2006) SkyServer traffic report – the first five years. Microsoft Technical Report, MSR TR-2006-190, December
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Acknowledgments
The authors thank the anonymous reviewers, whose comments helped improve the quality and presentation of the paper. The authors also thank Mike Wong for his participation in the early parts of this research, the results of which were published in [2]. Jim Gray of Microsoft Research showed great enthusiasm for this research, participated in numerous discussions, and was instrumental in providing access to the SkyServer logs. This work was funded, in part, by a Microsoft research grant to RS.
Cite this article
Singh, R., Bhattarai, B.D. Dynamic content-page identification for media-rich websites. Multimed Tools Appl 50, 491–507 (2010). https://doi.org/10.1007/s11042-010-0487-1
DOI: https://doi.org/10.1007/s11042-010-0487-1