DOI: 10.1145/3652583.3658112

SamCap: Energy-based Controllable Image Captioning by Gradient-Based Sampling

Published: 07 June 2024

Abstract

Despite remarkable advances in image captioning, existing models still lack the ability to generate controllable and diverse captions. As a solution, controllable image captioning (CIC) has recently gained attention, with the goal of generating image captions that satisfy the constraints imposed by given control signals. Current CIC methods have two main limitations: (1) they handle only one specific control signal and cannot handle combinations of multiple control signals; (2) they depend on costly supervised learning from task-specific data, which becomes impractical as model size grows. To address these limitations, we propose an energy-based sampling method for controllable image captioning, named SamCap. Specifically, by combining various constraint functions with the log-likelihood of the image captioner into an energy function, SamCap generates captions that satisfy the specified constraints through gradient-based sampling. SamCap is a learning-free, plug-and-play solution that can be integrated with any existing image captioner without task-specific fine-tuning. Extensive experiments demonstrate that SamCap not only matches the performance of state-of-the-art signal-specific CIC models on single control signals, but also shows significant advantages in handling combinations of multiple control signals.
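The abstract describes the method only at a high level. For intuition, below is a minimal, hedged sketch of what energy-based caption sampling of this kind can look like: the captioner's negative log-likelihood and weighted constraint terms are summed into one energy E(y) = -log p(y | image) + Σ_i λ_i f_i(y), and a Langevin-style update (a gradient step on the energy plus Gaussian noise, as in stochastic gradient Langevin dynamics) is run over relaxed, per-position token logits. All names here (`log_p_caption`, `constraints`, the hyperparameters) are hypothetical placeholders for illustration, not SamCap's actual implementation.

```python
import torch

def sample_caption(log_p_caption, constraints, seq_len, vocab_size,
                   steps=200, step_size=0.1, noise_scale=0.01):
    """Hypothetical sketch: draw a caption by gradient-based sampling from
    the energy E(y) = -log p(y | image) + sum_i lambda_i * f_i(y)."""
    # Relaxed "soft" caption: one row of logits per output position.
    logits = torch.randn(seq_len, vocab_size, requires_grad=True)
    for _ in range(steps):
        probs = torch.softmax(logits, dim=-1)   # soft token distributions
        energy = -log_p_caption(probs)          # captioner log-likelihood term
        for weight, constraint in constraints:  # weighted constraint energies
            energy = energy + weight * constraint(probs)
        (grad,) = torch.autograd.grad(energy, logits)
        with torch.no_grad():
            # Langevin-style update: descend the energy, add Gaussian noise.
            logits -= step_size * grad
            logits += noise_scale * torch.randn_like(logits)
    # Discretize the relaxed sample into token ids.
    return logits.argmax(dim=-1)
```

In this view, each control signal is just another differentiable penalty f_i on the soft caption, so combinations of signals compose by adding energy terms, which is consistent with the abstract's claim about handling multiple control signals without task-specific fine-tuning.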


      Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024, 1379 pages
ISBN: 9798400706196
DOI: 10.1145/3652583

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. controllable image captioning
2. energy-based model (EBM)
      3. gradient-based sampling

      Qualifiers

      • Research-article

      Funding Sources

• National Natural Science Foundation of China

      Conference

      ICMR '24

      Acceptance Rates

      Overall Acceptance Rate 254 of 830 submissions, 31%
