short-paper

MoZuMa: A Model Zoo for Multimedia Applications

Authors:
Stéphane Massonnet

EPFL, Lausanne, Switzerland

EPFL, Lausanne, Switzerland
View Profile

,
Marco Romanelli

EPFL, Lausanne, Switzerland

EPFL, Lausanne, Switzerland
View Profile

,
Rémi Lebret

EPFL, Lausanne, Switzerland

EPFL, Lausanne, Switzerland
View Profile

,
Niels Poulsen

EPFL, Lausanne, Switzerland

EPFL, Lausanne, Switzerland
View Profile

,
Karl Aberer

EPFL, Lausanne, Switzerland

EPFL, Lausanne, Switzerland
View Profile

MM '22: Proceedings of the 30th ACM International Conference on MultimediaOctober 2022Pages 7335–7338https://doi.org/10.1145/3503161.3548542

Published:10 October 2022Publication History

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 7335–7338

ABSTRACT

Lots of machine learning models with applications in Multimedia Search are released as Open Source Software. However, integrating these models into an application is not always an easy task due to the lack of a consistent interface to run, train or distribute models. With MoZuMa, we aim at reducing this effort by providing a model zoo for image similarity, text-to-image retrieval, face recognition, object similarity search, video key-frames detection and multilingual text search implemented in a generic interface with a modular architecture. The code is released as Open Source Software at https://github.com/mozuma/mozuma.

Supplemental Material

M22-mmos09.mp4

mp4

166.5 MB

Download

References

Jia Deng,Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. Ieee, 248--255.Google ScholarCross Ref
Jiankang Deng, Jia Guo, Xue Niannan, and Stefanos Zafeiriou. 2019. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In CVPR.Google Scholar
ONNX Runtime developers. 2021. ONNX Runtime. https://onnxruntime.ai/. Version: x.y.z.Google Scholar
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770--778. https://doi.org/10.1109/CVPR.2016.90Google Scholar
Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
Qiang Meng, Shichao Zhao, Zhida Huang, and Feng Zhou. 2021. MagFace: A Universal Representation for Face Recognition and Quality Assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14225--14234.Google ScholarCross Ref
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748--8763. https://proceedings.mlr.press/v139/radford21a.htmlGoogle Scholar
Nils Reimers and Iryna Gurevych. 2020. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://arxiv.org/abs/2004.09813Google ScholarCross Ref
K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. 2016. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters 23, 10 (Oct 2016), 1499--1503. https://doi.org/10.1109/LSP.2016.2603342Google ScholarCross Ref
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, LijuanWang, Yejin Choi, and Jianfeng Gao. 2021. VinVL: Revisiting Visual Representations in Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5579--5588.Google ScholarCross Ref
Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).Google Scholar

Index Terms

MoZuMa: A Model Zoo for Multimedia Applications
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing
  2. Machine learning

Recommendations

Licenses of Open Source Software and their Economic Values
SAINT-W '05: Proceedings of the 2005 Symposium on Applications and the Internet Workshops

Licenses of open source software (OSS) are quiet various but can be categorised into three. That is GPL (GNU general Public License) like, LGPL (GNU Lesser general Public License) like, or MPL (Mozilla Public License) like. Although there are numbers of ...
Read More
Microsoft, Open Source and the software ecosystem: of predators and prey—the leopard can change its spots

Over the past few years, Microsoft has promoted a project called 'Shared Source Initiative', which allows certain customers (e.g., research institutions and independent software vendors) access to its source code on a restricted basis. As part of this ...
Read More
Open Source Drives Innovation

Engineers using free and open source software created many of today's most innovative products and solutions. FOSS also changed the way we develop software. IEEE Software's Open Source column first appeared in this magazine in July 2004. In this last ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
General Chairs:
João Magalhães
NOVA University of Lisbon, Portugal
,
Alberto del Bimbo
University of Florence, Italy
,
Shin'ichi Satoh
National Institute of Informatics, Japan
,
Nicu Sebe
University of Trento, Italy
,
Program Chairs:
Xavier Alameda-Pineda
Inria, Grenoble, France
,
Qin Jin
Renmin University of China, China
,
Vincent Oria
New Jersey Institute of Technology, USA
,
Laura Toni
University College London, UK
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 October 2022
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
multimedia search
open source software
vision and language
Qualifiers
- short-paper
Conference

Acceptance Rates
Overall Acceptance Rate995of4,171submissions,24%
Upcoming Conference
MM '24

Sponsor:

sigmm

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne , VIC , Australia
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 84
  Total Downloads
- Downloads (Last 12 months)43
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

MoZuMa: A Model Zoo for Multimedia Applications

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Licenses of Open Source Software and their Economic Values

Microsoft, Open Source and the software ecosystem: of predators and prey—the leopard can change its spots

Open Source Drives Innovation