Clustering Web video search results based on integration of multiple features

World Wide Web

Abstract

The use of Web video search engines has been growing at an explosive rate. Because query terms are ambiguous and results are often duplicated, a good clustering of video search results is essential to enhance user experience and improve retrieval performance. Existing systems that cluster videos consider only the video content itself. This paper presents the first system that clusters Web video search results by fusing evidence from a variety of information sources besides the video content, such as title, tags and description. We propose a novel framework that integrates multiple features and allows existing clustering algorithms to be adopted. We discuss the careful design of the system's components and a number of implementation decisions made to achieve high effectiveness and efficiency. A thorough user study shows that, with an innovative interface for displaying the clustering output, our system presents search results much better and hence significantly increases the usability of video search engines.
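As a rough illustration of what fusing evidence from several features before clustering can look like, the following Python sketch combines per-feature pairwise similarity matrices into one matrix and passes it to a simple off-the-shelf style clustering step. The abstract does not specify the fusion rule or the clustering algorithm, so everything here is an assumption made for illustration: the weighted average, the threshold-based single-link grouping, the function names `fuse_similarities` and `cluster_by_threshold`, the weights, and the toy similarity values.

```python
from typing import Dict, List

Matrix = List[List[float]]


def fuse_similarities(sims: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted average of per-feature similarity matrices (hypothetical fusion rule)."""
    total = sum(weights[name] for name in sims)
    n = len(next(iter(sims.values())))
    fused = [[0.0] * n for _ in range(n)]
    for name, matrix in sims.items():
        w = weights[name] / total  # normalise so the weights sum to 1
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * matrix[i][j]
    return fused


def cluster_by_threshold(fused: Matrix, threshold: float) -> List[List[int]]:
    """Single-link grouping: videos end up together if a chain of
    pairwise similarities >= threshold connects them (union-find)."""
    n = len(fused)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if fused[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data (invented): pairwise similarities for four search results.
similarities = {
    "title": [[1.0, 0.8, 0.1, 0.0], [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.7], [0.0, 0.1, 0.7, 1.0]],
    "tags": [[1.0, 0.6, 0.0, 0.1], [0.6, 1.0, 0.1, 0.0],
             [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0]],
    "visual": [[1.0, 0.9, 0.3, 0.2], [0.9, 1.0, 0.2, 0.3],
               [0.3, 0.2, 1.0, 0.5], [0.2, 0.3, 0.5, 1.0]],
}
weights = {"title": 0.3, "tags": 0.3, "visual": 0.4}  # assumed, not from the paper

fused = fuse_similarities(similarities, weights)
clusters = cluster_by_threshold(fused, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

Because the fused matrix is an ordinary pairwise similarity matrix, the grouping step could be swapped for any existing algorithm that accepts precomputed similarities, which is the property the abstract highlights.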

Author information

Corresponding author

Correspondence to Rui Zhang.

About this article

Cite this article

Hindle, A., Shao, J., Lin, D. et al. Clustering Web video search results based on integration of multiple features. World Wide Web 14, 53–73 (2011). https://doi.org/10.1007/s11280-010-0097-x

  • DOI: https://doi.org/10.1007/s11280-010-0097-x
