short-paper

Extending CLIP for Text-to-font Retrieval

Authors:

Zhenyu Gu

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

Pages 1170 - 1174

https://doi.org/10.1145/3652583.3657624

Published: 07 June 2024

Abstract

This study addresses the challenge of font retrieval in design by proposing an approach that uses contrastive learning to establish a shared embedding space for texts and fonts. In contrast to previous methods, which are limited to word-level queries, our method enables text-to-font retrieval at the sentence level. We collected text-font pairs from web pages and design templates on the Internet, fine-tuned the CLIP model on these pairs, and obtained text and font encoders for our application. The top-k fonts are then retrieved using the cosine distance between the input text embedding and the font embeddings. Our approach offers three key advantages: (1) it retrieves fonts from sentence-level text input, which is intuitively consistent with how designers work; (2) it leverages text-font pair data available on the Internet without manual annotation; and (3) it is scalable, because the trained font encoder can encode new font candidates without retraining the model. We also introduce an evaluation metric for font retrieval results. The results indicate that the top-3 retrieved fonts score better than those from baseline methods, and that the top-1 retrieved font is competitive with fonts selected by experienced graphic designers.
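The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the text and font encoders have already produced embedding vectors; the toy 3-dimensional vectors, the font names, and the helper names `cosine_similarity` and `top_k_fonts` are hypothetical and do not come from the paper. Ranking by highest cosine similarity is equivalent to ranking by lowest cosine distance (1 minus similarity).

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_fonts(
    text_emb: Sequence[float],
    font_index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k fonts whose embeddings are closest to the text embedding."""
    scored = (
        (cosine_similarity(text_emb, emb), name)
        for name, emb in font_index.items()
    )
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(name, score) for score, name in best]


# Toy embeddings standing in for font-encoder outputs (made-up values).
font_index = {
    "font_a": [1.0, 0.0, 0.0],
    "font_b": [0.0, 1.0, 0.0],
    "font_c": [0.7, 0.7, 0.0],
}

# Toy embedding standing in for the text-encoder output of a query sentence.
query = [1.0, 0.1, 0.0]
ranked = top_k_fonts(query, font_index, k=2)

# A new font only needs to be encoded and added to the index;
# the encoders themselves are left unchanged.
font_index["font_d"] = [2.0, 0.2, 0.0]
ranked_after_add = top_k_fonts(query, font_index, k=2)
```

In this sketch the index is a plain dictionary, so adding a font candidate is a single insertion. This mirrors the scalability point in the abstract under the assumption that font embeddings are computed once and stored.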

Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

May 2024

1379 pages

ISBN: 9798400706196

DOI: 10.1145/3652583

General Chairs:
Cathal Gurrin (Dublin City University, Ireland)
Rachada Kongkachandra (Thammasat University, Thailand)
Klaus Schoeffmann (Klagenfurt University, Austria)

Program Chairs:
Duc-Tien Dang-Nguyen (University of Bergen, Norway)
Luca Rossetto (University of Zurich, Switzerland)
Shin'ichi Satoh (National Institute of Informatics, Japan)
Liting Zhou (Dublin City University, Ireland)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICMR '24: International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
