DOI: 10.1145/2733373.2806243
research-article

Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network

Published: 13 October 2015

Abstract

Labelled image datasets are the backbone of high-level image understanding tasks with wide application scenarios, and they continuously drive and evaluate progress in feature design and supervised learning models. Recently, million-scale labelled image datasets have contributed to the rebirth of deep convolutional neural networks, which bypass the manual design of handcrafted features. However, constructing an image dataset is still mainly manual and labor intensive; building a high-quality million-scale dataset often takes years of effort. In this paper, we propose a deep learning based method to construct large-scale image datasets automatically. Specifically, word representations and image representations are learned in a deep neural network from a large amount of click-through logs, and are then used to define word-word similarity and image-word similarity. These two similarities are used to automate the two labor-intensive steps in manual image dataset construction: query formation and noisy image removal. With a newly proposed cross convolutional filter regularizer, we can construct a million-scale image dataset in one week. Finally, two image datasets are constructed to verify the effectiveness of the method. In addition to scale, the automatically constructed datasets have accuracy, diversity, and cross-dataset generalization comparable to those of manually labelled image datasets.
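As a rough illustration of how the two similarities could drive query formation and noisy image removal, the following minimal sketch may help. It is not the paper's implementation: the abstract does not say how the similarities are computed, so cosine similarity, the shared embedding space for words and images, the threshold values, the function names, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def form_queries(category, word_vecs, min_sim=0.8):
    """Query formation (assumed rule): pick vocabulary words whose
    word-word similarity to the category word reaches min_sim."""
    target = word_vecs[category]
    return sorted(
        word for word, vec in word_vecs.items()
        if word != category and cosine(target, vec) >= min_sim
    )


def remove_noisy_images(category, image_vecs, word_vecs, min_sim=0.5):
    """Noisy image removal (assumed rule): keep only images whose
    image-word similarity to the category word reaches min_sim."""
    target = word_vecs[category]
    return sorted(
        name for name, vec in image_vecs.items()
        if cosine(vec, target) >= min_sim
    )


# Toy 3-dimensional embeddings, invented purely for illustration.
# Words and images are assumed to live in the same vector space.
word_vecs = {
    "dog": [1.0, 0.0, 0.0],
    "puppy": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
}
image_vecs = {
    "img_dog.jpg": [0.8, 0.1, 0.1],
    "img_car.jpg": [0.1, 0.9, 0.0],
}

queries = form_queries("dog", word_vecs)
kept = remove_noisy_images("dog", image_vecs, word_vecs)
print(queries, kept)
```

In this toy setup, "puppy" is selected as an additional query for the category "dog", and the car image is discarded because its similarity to "dog" falls below the threshold. In practice the embeddings would come from the network trained on click-through logs, and the thresholds would need tuning.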

    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic image dataset construction
    2. deep learning
    3. image representation
    4. word representation

    Qualifiers

    • Research-article

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
