research-article

Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network

Authors:

Tiejun Zhao

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

Pages 441 - 450

https://doi.org/10.1145/2733373.2806243

Published: 13 October 2015

Abstract

Labelled image datasets are the backbone of high-level image understanding tasks with wide application scenarios, and they continuously drive and evaluate progress in feature design and supervised learning models. Recently, million-scale labelled image datasets have further contributed to the rebirth of deep convolutional neural networks, which bypass the manual design of handcrafted features. However, image dataset construction remains mainly manual and labor intensive, and it often takes years of effort to construct a high-quality million-scale dataset. In this paper, we propose a deep learning based method to construct large-scale image datasets automatically. Specifically, word representations and image representations are learned in a deep neural network from a large amount of click-through logs, and are then used to define word-word similarity and image-word similarity. These two similarities are used to automate the two labor-intensive steps of manual image dataset construction: query formation and noisy image removal. With a newly proposed cross convolutional filter regularizer, we can construct a million-scale image dataset in one week. Finally, two image datasets are constructed to verify the effectiveness of the method. In addition to scale, the automatically constructed dataset has accuracy, diversity, and cross-dataset generalization comparable to those of manually labelled image datasets.
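
The two similarity-driven steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that words and images are already embedded in one shared vector space and that both similarities are plain cosine similarity, neither of which the abstract states. The function names, the `top_k` and `threshold` parameters, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def form_queries(category, word_vecs, top_k=2):
    """Query formation (assumed form): pick the top_k vocabulary words
    closest to the category word under word-word similarity."""
    target = word_vecs[category]
    scored = [
        (cosine(target, vec), word)
        for word, vec in word_vecs.items()
        if word != category
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]


def remove_noisy_images(category, word_vecs, image_vecs, threshold=0.5):
    """Noisy image removal (assumed form): keep only images whose
    image-word similarity to the category word reaches the threshold."""
    target = word_vecs[category]
    return [
        name
        for name, vec in image_vecs.items()
        if cosine(vec, target) >= threshold
    ]


# Toy, hand-made embeddings (hypothetical values, for illustration only).
word_vecs = {
    "cat": [1.0, 0.1, 0.0],
    "kitten": [0.9, 0.2, 0.0],
    "dog": [0.5, 0.8, 0.0],
    "car": [0.0, 0.1, 1.0],
}
image_vecs = {
    "img_cat_1": [0.95, 0.15, 0.05],
    "img_car_1": [0.05, 0.0, 0.9],
}

queries = form_queries("cat", word_vecs, top_k=2)
kept = remove_noisy_images("cat", word_vecs, image_vecs, threshold=0.5)
print(queries)  # candidate query words for the "cat" category
print(kept)     # images retained for the "cat" category
```

In this toy setting the query step ranks "kitten" and "dog" ahead of "car", and the filtering step discards the car image. In the paper, the representations behind both similarities are learned from click-through logs rather than written by hand.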

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases

Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia

October 2015

1402 pages

ISBN: 9781450334594

DOI: 10.1145/2733373

General Chairs:
Xiaofang Zhou, The University of Queensland, Australia
Alan F. Smeaton, Dublin City University, Ireland
Qi Tian, The University of Texas at San Antonio, USA

Program Chairs:
Dick C.A. Bulterman, FXPAL, USA
Heng Tao Shen, The University of Queensland, Australia
Ketan Mayer-Patel, The University of North Carolina, USA
Shuicheng Yan, National University of Singapore, Singapore

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '15: ACM Multimedia Conference

October 26 - 30, 2015

Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
