Real-time Analysis and Visualization of the YFCC100m Dataset

Authors: Sebastian Kalkowski, Christian Schulze, Andreas Dengel, Damian Borth

MMCommons '15: Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions

Pages 25 - 30

https://doi.org/10.1145/2814815.2814820

Published: 30 October 2015

Abstract

With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize its benefit for the research community and realize its potential, the dataset has to be made accessible through tools that allow searching for target concepts and mechanisms for browsing its images and videos. Following best practice from data collections such as ImageNet and MS COCO, this paper presents means of accessibility for the YFCC100m dataset. These include a global analysis of the dataset and an online browser to explore and investigate subsets of the dataset in real time. Statistics on the queried images and videos enable researchers to refine their queries successively, so that the user's desired subset of interest can be narrowed down quickly. The final set of images and videos can be downloaded as URLs from the browser for further processing.

Index Terms

Real-time Analysis and Visualization of the YFCC100m Dataset
1. Information systems
  1. Information retrieval
    1. Document representation

Published In

MMCommons '15: Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions

October 2015

50 pages

ISBN:9781450337441

DOI:10.1145/2814815

Program Chairs:
Gerald Friedland, International Computer Science Institute, USA
Chong-Wah Ngo, VIREO Research Group, City University of Hong Kong, Hong Kong
David A. Shamma, Yahoo Labs & Flickr, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Bundesministerium für Bildung und Forschung

Conference

MM '15

Sponsor:

SIGMM

MM '15: ACM Multimedia Conference

October 30, 2015

Brisbane, Australia

Bibliometrics

Total Citations: 31
Total Downloads: 330
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Cited By

Janssens R, Wolfert P, Demeester T, Belpaeme T (2025). Integrating Visual Context Into Language Models for Situated Social Conversation Starters. IEEE Transactions on Affective Computing 16:1 (223-236). Online publication date: Jan-2025.
https://doi.org/10.1109/TAFFC.2024.3428704
Fang F, Liang W, Cheng Y, Xu Q, Lim J (2024). Enhancing Representation Learning With Spatial Transformation and Early Convolution for Reinforcement Learning-Based Small Object Detection. IEEE Transactions on Circuits and Systems for Video Technology 34:1 (315-328). Online publication date: Jan-2024.
https://doi.org/10.1109/TCSVT.2023.3284453
Astruc G, Dufour N, Siglidis I, Aronssohn C, Bouia N, Fu S, Loiseau R, Nguyen V, Raude C, Vincent E, Xu L, Zhou H, Landrieu L (2024). OpenStreetView-5M: The Many Roads to Global Visual Geolocation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (21967-21977). Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.02074
Rodriguez J, Tavassoli N, Levy E, Lederman G, Sivov D, Lissandrini M, Mottin D (2024). Does the Performance of Text-to-Image Retrieval Models Generalize Beyond Captions-as-a-Query? Advances in Information Retrieval (161-176). Online publication date: 15-Mar-2024.
https://doi.org/10.1007/978-3-031-56066-8_15
Mao S, Xi W, Yu L, Lü G, Xing X, Zhou X, Wan W, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). Enhanced CatBoost with Stacking Features for Social Media Prediction. Proceedings of the 31st ACM International Conference on Multimedia (9430-9435). Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612839
Su W, Zhu X, Tao C, Lu L, Li B, Huang G, Qiao Y, Wang X, Zhou J, Dai J (2023). Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (15888-15899). Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.01525
Pan X, Ye T, Han D, Song S, Huang G, Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (2022). Contrastive language-image pre-training with knowledge graphs. Proceedings of the 36th International Conference on Neural Information Processing Systems (22895-22910). Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3601934
Zhu J, Zhu X, Wang W, Wang X, Li H, Wang X, Dai J, Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (2022). Uni-perceiver-MoE. Proceedings of the 36th International Conference on Neural Information Processing Systems (2664-2678). Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3600463
Schoeffmann K, Jónsson B, Gurrin C (2022). Dataset column: Report from the MMM 2019 Special Session on Multimedia Datasets for Repeatable Experimentation (MDRE 2019). ACM SIGMultimedia Records 11:3 (1-1). Online publication date: 8-Mar-2022.
https://dl.acm.org/doi/10.1145/3524460.3524469
Xie J, Zheng S (2022). Zero-shot Object Detection Through Vision-Language Embedding Alignment. 2022 IEEE International Conference on Data Mining Workshops (ICDMW) (1-15). Online publication date: Nov-2022.
https://doi.org/10.1109/ICDMW58026.2022.00121
