Dynamic content-page identification for media-rich websites

Multimedia Tools and Applications

Abstract

Knowledge of users' information goals is critical for designing websites, analyzing the efficacy of such designs, and ensuring that users can effectively access the information they want. Determining the information goal is difficult because user information needs are subjective and latent. The challenge is exacerbated in media-rich websites, since the semantics of media-based information are context-dependent and emergent. A critical step in determining information goals is the identification of content pages, that is, the pages containing the information the user seeks. We propose a method that automatically determines content pages by taking into account the organization of the website, its media-based information content, and the influence of a specific user browsing pattern. Given a browsing pattern, the method identifies putative content pages as the pages corresponding to local minima of the page-content entropy values. For an (unknown) user information goal, this intuitively models the user's progressive transition from pages with generic information to pages with specific information. Experimental investigations on media-rich sites demonstrate the effectiveness of the technique and underline its potential for modeling user information needs and actions on a media-rich web.
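
The local-minimum idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how page-content entropy is computed, so the code below assumes, purely for illustration, Shannon entropy over a page's bag of text tokens; it ignores media content and site organization, which the proposed method also uses. All names (`shannon_entropy`, `local_minima`, `putative_content_pages`), the endpoint rule, and the toy pages are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution.

    Assumption: stands in for the paper's page-content entropy.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_minima(values):
    """Indices whose value is strictly lower than every existing neighbour.

    Assumption: a missing neighbour at either end of the sequence is
    treated as +infinity, so the first or last page can be a minimum.
    """
    minima = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else math.inf
        right = values[i + 1] if i + 1 < len(values) else math.inf
        if v < left and v < right:
            minima.append(i)
    return minima


def putative_content_pages(path, page_tokens):
    """Pages on a browsing path whose entropy is a local minimum."""
    entropies = [shannon_entropy(page_tokens[p]) for p in path]
    return [path[i] for i in local_minima(entropies)]


# Toy site: generic pages mix many topics, specific pages focus on few.
page_tokens = {
    "home": ["news", "sports", "images", "video",
             "about", "contact", "maps", "data"],
    "gallery": ["galaxy", "galaxy", "nebula", "star"],
    "m31": ["galaxy", "galaxy", "galaxy", "galaxy"],
    "index": ["news", "images", "data", "maps"],
    "tool": ["query", "query", "query", "sql"],
}
path = ["home", "gallery", "m31", "index", "tool"]

print(putative_content_pages(path, page_tokens))  # ['m31', 'tool']
```

In this toy path the entropy falls as the user moves from the generic home page to the focused `m31` page, rises again at a generic index page, and falls at `tool`, so the two low points are flagged as putative content pages.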

References

  1. Bertini M, Del Bimbo A, Nunziati W (2006) Video clip matching using MPEG-7 descriptors and edit distance. Conference on Image and Video Retrieval, LNCS 4071:133–142

  2. Bhattarai B, Wong M, Singh R (2007) Discovering user information goals with semantic website media modeling. ACM International Conference on Multi-Media Modeling. Lect Notes Comput Sci 4351:364–375

  3. Brusilovsky P (2001) Adaptive hypermedia. User Modeling and User-Adapted Interaction 11:87–110

  4. Chi E, Pirolli PL, Chen K, Pitkow J (2001) Using information scent to model user information needs and actions on the web. ACM CHI 490–497

  5. Craswell N, Hawking D, Robertson S (2001) Effective site finding using link anchor information. ACM SIGIR

  6. Deng Y, Manjunath B (2001) Unsupervised segmentation of color-texture regions in images and video. IEEE Trans on Pattern Analysis and Machine Intelligence 23(8):800–810

  7. Howarth P, Rüger S (2004) Evaluation of texture features for content-based image retrieval. LNCS 3115:326–334

  8. http://htmlparser.sourceforge.net

  9. Kang I, Kim G (2003) Query type classification for web-document retrieval. ACM SIGIR

  10. Lee U, Liu Z, Cho J (2005) Automatic identification of user goals in web search. WWW

  11. Olston C, Chi E (2003) Scenttrails: integrating browsing and searching on the web. ACM Trans Comput-Hum Interact 10(3):177–197

  12. Pirolli P (2003) A theory of information scent. In: Jacko J, Stephanidis C (eds) Human-computer interaction, Vol. 1, (pp. 213–217), Lawrence Erlbaum, Mahwah

  13. Pirolli P, Card S (1999) Information foraging. Psychol Rev 106(4):643–675

  14. Qiu F, Cho J (2006) Automatic identification of user interest for personalized search. WWW 727–736

  15. Rose DE, Levinson D (2004) Understanding user goals in web search. WWW

  16. Salton G, Buckley C (1988) On the use of spreading activation methods in automatic information retrieval. ACM Conference on Information Retrieval, pp. 147–160

  17. Salton G, Buckley C (1987) Term weighting approaches in automatic text retrieval. Technical Report: TR87-881

  18. Santini S, Jain R (1997) Similarity is a geometer. Multimedia Tools and Applications 5:277–306

  19. Santini S, Gupta A, Jain R (2001) Emergent semantics through interaction in image databases. IEEE Trans Knowl Data Eng 13(3)

  20. Singh R, Bhattarai B (2009a) Information-theoretic identification of content pages for analyzing user information needs and actions on the multimedia web. ACM Symposium on Applied Computing, pp. 1806–1810

  21. Singh R, Bhattarai B (2009b) Analysis of usage patterns in large multimedia websites. In: Chbeir R, Hassanien A-E, Abraham A, Badr Y (eds) Emergent web intelligence. Springer Verlag (To Appear)

  22. Singh V, Gray J, Thakar A, Szalay AS, Raddick J, Boroski B, Lebedeva S, Yanny B (2006) SkyServer traffic report – the first five years. Microsoft Technical Report MSR-TR-2006-190, December

  23. Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380

Acknowledgments

The authors thank the anonymous reviewers for comments that helped improve the quality and presentation of the paper. The authors also thank Mike Wong for his participation in the early parts of this research, results from which were published in [2]. Jim Gray of Microsoft Research showed great enthusiasm for this research, participated in numerous discussions, and was instrumental in providing access to the SkyServer logs. This work was funded, in part, by a Microsoft research grant to RS.

Author information

Corresponding author

Correspondence to Rahul Singh.

About this article

Cite this article

Singh, R., Bhattarai, B.D. Dynamic content-page identification for media-rich websites. Multimed Tools Appl 50, 491–507 (2010). https://doi.org/10.1007/s11042-010-0487-1

  • DOI: https://doi.org/10.1007/s11042-010-0487-1
