DOI: 10.1145/3652583.3657624
Short paper

Extending CLIP for Text-to-font Retrieval

Published: 07 June 2024

Abstract

This study addresses the challenge of font retrieval in design by proposing an approach that uses contrastive learning to build a shared embedding space for texts and fonts. Unlike previous methods, which are limited to word-level queries, our method supports text-to-font retrieval at the sentence level. We collected text-font pairs from web pages and design templates on the Internet, fine-tuned the CLIP model on these pairs, and obtained text and font encoders for our application. The top-k fonts were then retrieved by ranking font embeddings by their cosine similarity to the input-text embedding. Our approach offers three key advantages: (1) it retrieves fonts from sentence-level text input, which is intuitively consistent with how designers work; (2) it leverages text-font pairs available on the Internet without manual annotation; and (3) it is scalable: the trained font encoder can encode new font candidates without retraining the model. We also introduced an evaluation metric for font retrieval results. The results indicate that the top-3 retrieved fonts score better than those returned by baseline methods, and that the top-1 retrieved font is competitive with fonts selected by experienced graphic designers.
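The abstract names two mechanisms without showing them: a contrastive objective that pulls each text toward its paired font in a shared embedding space, and top-k retrieval by cosine similarity over font embeddings that can be extended without retraining. The following is a minimal, dependency-free Python sketch of both, under stated assumptions. It uses the standard symmetric CLIP loss (Radford et al., 2021) with a fixed temperature of 0.07, which is CLIP's initial value rather than a setting reported here; the encoders are replaced by precomputed embedding vectors; and `clip_contrastive_loss`, `FontIndex`, and the font names are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: embeddings are plain float lists standing in for the
# outputs of the fine-tuned text and font encoders, which are not modeled here.
Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale v to unit L2 norm so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: List[float], i: int) -> float:
    """Numerically stable log(softmax(logits)[i])."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[i] - log_z


def clip_contrastive_loss(text_embs: List[Vector], font_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; text_embs[i] and font_embs[i] form a positive pair.

    Assumption: the temperature is fixed at 0.07 (CLIP's initial value);
    in CLIP itself it is a learned parameter.
    """
    if len(text_embs) != len(font_embs) or not text_embs:
        raise ValueError("need equally sized, non-empty batches")
    t = [normalize(v) for v in text_embs]
    f = [normalize(v) for v in font_embs]
    n = len(t)
    # logits[i][j] = cos(text_i, font_j) / temperature
    logits = [[dot(t[i], f[j]) / temperature for j in range(n)] for i in range(n)]
    # Text-to-font: each text row should select its own font (the diagonal).
    loss_t2f = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Font-to-text: each font column should select its own text.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_f2t = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_t2f + loss_f2t) / 2


class FontIndex:
    """Unit-normalized font embeddings; fonts can be added at any time."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self.embs: List[List[float]] = []

    def add(self, name: str, emb: Vector) -> None:
        # The embedding is assumed to come from the already-trained font
        # encoder; adding it here involves no retraining of any kind.
        self.names.append(name)
        self.embs.append(normalize(emb))

    def top_k(self, query_emb: Vector, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k fonts with the highest cosine similarity to the query."""
        q = normalize(query_emb)
        scored = [(name, dot(q, e)) for name, e in zip(self.names, self.embs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings with hypothetical font names.
    index = FontIndex()
    index.add("font_serif", [1.0, 0.0])
    index.add("font_script", [0.0, 1.0])
    index.add("font_mixed", [1.0, 1.0])
    print(index.top_k([1.0, 0.1], k=2))
    print(clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[1.0, 0.0], [0.0, 1.0]]))
```

Because `FontIndex.add` only stores a normalized vector, a newly encoded font becomes retrievable immediately, which is the property behind advantage (3) above. The loss is near zero when every text is closest to its own font and grows when pairs are mismatched.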

Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024
1379 pages
ISBN:9798400706196
DOI:10.1145/3652583

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clip
  2. contrastive learning
  3. design tool
  4. embedding space
  5. font

Conference

ICMR '24

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Total citations: 0
  • Total downloads: 111
  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 29
Reflects downloads up to 05 Mar 2025