DOI: 10.1145/2814815.2814820
research-article

Real-time Analysis and Visualization of the YFCC100m Dataset

Published: 30 October 2015

Abstract

With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize its benefit for the research community and realize its potential, the dataset has to be made accessible through tools that allow searching for target concepts and mechanisms for browsing its images and videos. Following best practice from data collections such as ImageNet and MS COCO, this paper presents means of accessibility for the YFCC100m dataset. These include a global analysis of the dataset and an online browser to explore and investigate subsets of the dataset in real time. Providing statistics of the queried images and videos enables researchers to refine their queries successively, so that the user's desired subset of interest can be narrowed down quickly. The final set of images and videos can be downloaded as URLs from the browser for further processing.

    Published In

    MMCommons '15: Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions
    October 2015
    50 pages
    ISBN:9781450337441
    DOI:10.1145/2814815

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. YFCC100m
    2. browser
    3. dataset
    4. search
    5. visualization

    Conference

    MM '15
    MM '15: ACM Multimedia Conference
    October 30, 2015
    Brisbane, Australia