Click data guided query modeling with click propagation and sparse coding

Tan, Min; Yu, Jun; Huang, Qingming; Wu, Weichen

doi:10.1007/s11042-018-5703-4

Published: 31 January 2018

Multimedia Tools and Applications, volume 77, pages 22145–22158 (2018)

Min Tan ORCID: orcid.org/0000-0002-1842-4050¹,
Jun Yu¹,
Qingming Huang² &
Weichen Wu¹

Abstract

We address the problem of fine-grained image recognition using user click data, wherein each image is represented as a semantic query-click feature vector. The query set obtained from search engines is usually large-scale and redundant, which makes the click feature high-dimensional and sparse. We propose a novel query modeling approach that merges semantically similar queries and constructs a compact click feature from the merged queries. To deal with the sparsity and inconsistency of the click feature, we design a graph-based propagation approach to predict the zero-clicks, ensuring that similar images have similar clicks for each query. Then, using the propagated click feature, we formulate the query merging problem as a sparse coding based recognition task, in which the hot queries are used to construct the dictionary. We evaluate our method for fine-grained image recognition on the public Clickture-Dog dataset. The results show that the propagated click feature performs much better than the original one. In the query merging procedure, sparse coding performs better than the traditional K-means algorithm. In addition, the “hot queries” outperform K-SVD in dictionary learning.
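The abstract describes the pipeline only in words, so the following is a minimal sketch of the two steps under stated assumptions, not the authors' implementation. It assumes a standard normalized-graph propagation iteration, F ← αSF + (1 − α)Y, over a k-nearest-neighbour image graph, and it uses ISTA as a stand-in sparse coder with the most-clicked ("hot") queries as dictionary atoms. The function names (`propagate_clicks`, `sparse_code`, `merge_queries`), the parameters `k`, `lam`, and `n_hot`, the role given to `alpha`, and the toy data are all hypothetical.

```python
import numpy as np


def propagate_clicks(clicks, visual, alpha=0.9, k=1, iters=50):
    """Spread clicks to visually similar images (assumed propagation rule).

    clicks: (n_images, n_queries) click counts; visual: (n_images, d) features.
    """
    clicks = np.asarray(clicks, dtype=float)
    visual = np.asarray(visual, dtype=float)
    # Cosine similarity between images, with self-similarity removed.
    unit = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    # Keep the k most similar neighbours of each image, then symmetrise.
    nn = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    w = np.zeros_like(sim)
    w[rows, nn] = np.maximum(sim[rows, nn], 0.0)
    w = np.maximum(w, w.T)
    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1) + 1e-12)
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Iterate F <- alpha * S F + (1 - alpha) * Y, starting from Y = clicks.
    f = clicks.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1.0 - alpha) * clicks
    return f


def sparse_code(x, dictionary, lam=0.1, iters=200):
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    step = 1.0 / (np.linalg.norm(dictionary, 2) ** 2 + 1e-12)
    a = np.zeros(dictionary.shape[1])
    for _ in range(iters):
        z = a - step * (dictionary.T @ (dictionary @ a - x))
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return a


def merge_queries(clicks, n_hot=2, lam=0.1):
    """Assign every query to the hot query that dominates its sparse code."""
    clicks = np.asarray(clicks, dtype=float)
    # Each query is described by its unit-norm click distribution over images.
    q = clicks / (np.linalg.norm(clicks, axis=0, keepdims=True) + 1e-12)
    # Hot queries: the n_hot queries with the most total clicks.
    hot = np.argsort(-clicks.sum(axis=0))[:n_hot]
    dictionary = q[:, hot]
    # An all-zero code falls back to atom 0 because argmax returns the first index.
    groups = np.array([
        int(np.argmax(np.abs(sparse_code(q[:, j], dictionary, lam))))
        for j in range(q.shape[1])
    ])
    # Compact click feature: sum the clicks of all queries merged into a group.
    compact = np.stack(
        [clicks[:, groups == g].sum(axis=1) for g in range(n_hot)], axis=1
    )
    return groups, compact


# Toy data: images 0-1 and images 2-3 form two visual clusters; each query
# was clicked on exactly one image, so the raw click matrix is very sparse.
visual = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
clicks = np.array([[5, 0, 0, 0], [0, 2, 0, 0], [0, 0, 4, 0], [0, 0, 0, 1]])

propagated = propagate_clicks(clicks, visual, alpha=0.9, k=1)
groups, compact = merge_queries(propagated, n_hot=2)
print(groups)
print(compact.shape)
```

In this toy setting the raw query columns are mutually orthogonal, so the non-hot queries would receive all-zero sparse codes. After propagation each non-hot query overlaps with the hot query clicked in the same visual cluster, which is what lets the merge step group them.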

Notes

https://github.com/Zjutanmin/SCodeByClick.git.
The optimal α is 0.9 for Prop-E and 0.5 for Prop-W.
We use the 16-layer VGG-net [13], which has 13 convolutional layers and 3 fully connected layers, to learn a CNN model. It is pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2012 dataset.

References

1. Berg T, Liu J, Lee SW, Alexander ML, Jacobs DW, Belhumeur PN (2014) Birdsnap: large-scale fine-grained visual categorization of birds. In: IEEE Conference on computer vision and pattern recognition, pp 2019–2026
2. Chang YS (2017) Fine-grained attention for image caption generation. Multimed Tools Appl PP(7):1–13
3. Cilibrasi RL, Vitanyi P (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383
4. Datta D, Singh SK, Chowdary CR (2017) Bridging the gap: effect of text query reformulation in multimodal retrieval. Multimed Tools Appl 76:1–18
5. Feng L, Bhanu B (2016) Semantic concept co-occurrence patterns for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell 38(4):1–1
6. Hua XS, Yang L, Wang J, Wang J, Ye M, Wang K, Rui Y, Li J (2013) Clickage: towards bridging semantic and intent gaps via mining click logs of search engines. In: ACM International conference on multimedia. ACM, pp 243–252
7. Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L (2011) Novel dataset for fine-grained image categorization. In: First workshop on fine-grained visual categorization, IEEE conference on computer vision and pattern recognition. Colorado Springs, CO
8. Li C, Song Q, Wang Y, Song H, Kang Q, Cheng J, Lu H (2016) Learning to recognition from bing clickture data. In: IEEE International conference on multimedia and expo workshops, pp 1–4
9. Liu T, Tao D (2016) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461
10. Nie L, Wang M, Zha Z, Li G, Chua TS (2011) Multimedia answering: enriching text qa with media information. In: ACM SIGIR Conference on research and development in information retrieval, SIGIR '11. ACM, pp 695–704
11. Nie L, Wang M, Zha ZJ, Chua TS (2012) Oracle in image search: a content-based approach to performance prediction. ACM Trans Inf Syst 30(2):13:1–13:23
12. Nie L, Yan S, Wang M, Hong R, Chua TS (2012) Harvesting visual concepts for image search with complex queries. In: ACM International conference on multimedia, MM '12. ACM, pp 59–68
13. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
14. Tan M, Wang Y, Pan G (2012) Feature reduction for efficient object detection via L1-norm latent SVM. In: Intelligent science and intelligent data engineering
15. Tan M, Pan G, Wang Y, Zhang Y, Wu Z (2014) L1-norm latent SVM for compact features in object detection. Neurocomputing 139:56–64
16. Tan M, Hu Z, Wang B, Zhao J, Wang Y (2016) Robust object recognition via weakly supervised metric and template learning. Neurocomputing 101:96–107
17. Tan M, Wang B, Wu Z, Wang J, Pan G (2016) Weakly supervised metric learning for traffic sign recognition in a lidar-equipped vehicle. IEEE Trans Intell Transp Syst 17(5):1415–1427. https://doi.org/10.1109/TITS.2015.2506182
18. Tan M, Yu J, Zheng G, Wu W, Sun K (2016) Deep neural network boosted large scale image recognition using user click data. In: International conference on internet multimedia computing and service, pp 118–121
19. Lin TY, RoyChowdhury A, Maji S (2015) Bilinear CNN models for fine-grained visual recognition. In: IEEE International conference on computer vision
20. Wang R, Liu T, Tao D (2017) Multiclass learning with partially corrupted labels. IEEE Trans Neural Netw Learn Syst PP(99):1–13
21. Yan Y, Nie F, Li W, Gao C, Yang Y, Xu D (2016) Image classification by cross-media active learning with privileged information. IEEE Trans Multimedia 18(12):2494–2502
22. Yan C, Luo M, Liu W, Zheng Q (2017) Robust dictionary learning with graph regularization for unsupervised person re-identification. Multimed Tools Appl (2):1–25
23. Yang Y, Nie F, Xu D, Luo J, Zhuang Y, Pan Y (2012) A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans Pattern Anal Mach Intell 34(4):723–742
24. Yu J, Wang M, Tao D (2012) Semisupervised multiview distance metric learning for cartoon synthesis. IEEE Trans Image Process 21(11):4636–4648
25. Yu J, Rui Y, Chen B (2014) Exploiting click constraints and multi-view features for image re-ranking. IEEE Trans Multimedia 16(1):159–168
26. Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032
27. Yu J, Tao D, Wang M, Rui Y (2015) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779
28. Zhang H, Zha ZJ, Yang Y, Yan S, Chua TS (2014) Robust (semi) nonnegative graph embedding. IEEE Trans Image Process 23(7):2996–3012
29. Zhang H, Zha ZJ, Yang Y, Yan S, Gao Y, Chua TS (2014) Attribute-augmented semantic hierarchy: towards a unified framework for content-based image retrieval. ACM Trans Multimed Comput Commun Appl 11(1s):1–21
30. Zhang J, Nie L, Wang X, He X, Huang X, Chua TS (2016) Shorter-is-better: venue category estimation from micro-video. In: ACM On multimedia conference, pp 1415–1424
31. Zhang Y, Wei XS, Wu J, Cai J (2016) Weakly supervised fine-grained categorization with part-based image representation. IEEE Trans Image Process 25(4):1713–1725
32. Zhang H, Huang Y, Xu X, Zhu Z, Deng C (2017) Latent semantic factorization for multimedia representation learning. Multimed Tools Appl (1):1–16
33. Zheng G, Tan M, Yu J, Wu Q, Fan J (2017) Fine-grained image recognition via weakly supervised click data guided bilinear CNN model. In: IEEE International conference on multimedia and expo (accepted). IEEE

Acknowledgments

This work was partly supported by the National Natural Science Foundation of China (No. 61602136, No. 61622205, No. 61472110) and by the Zhejiang Provincial Natural Science Foundation of China under Grant LR15F020002.

Author information

Authors and Affiliations

Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
Min Tan, Jun Yu & Weichen Wu
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, 101408, China
Qingming Huang

Corresponding author

Correspondence to Jun Yu.

About this article

Cite this article

Tan, M., Yu, J., Huang, Q. et al. Click data guided query modeling with click propagation and sparse coding. Multimed Tools Appl 77, 22145–22158 (2018). https://doi.org/10.1007/s11042-018-5703-4

Received: 06 September 2017
Revised: 02 December 2017
Accepted: 22 January 2018
Published: 31 January 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5703-4
