An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers

  • Conference paper
  • In: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

While recent advancements in model fine-tuning predominantly emphasize the utilization of low-rank adaptation (LoRA), we propose an alternative approach centered on reducing the precision of adaptation matrices. In particular, we depart from the common viewpoint that considers adaptation matrices solely as weight differences, and reinterpret them as “control variables” that perturb pre-trained ViT systems. This new perspective enables the establishment of a control-oriented framework, facilitating the exploration of optimal controls guided by the Pontryagin Maximum Principle. Furthermore, we demonstrate that for bounded control sets such as hypercubes, the optimal controls often take on boundary values, leading naturally to a binary controller design. Theoretical analysis reveals that employing a binary control strategy achieves the same reachable state as its full-precision counterpart in the continuous idealization of deep residual structures, a finding corroborated by our subsequent empirical investigations. Our studies further indicate that the controller’s rank holds greater significance than its precision. As such, opting for low-precision yet high-rank controls is shown to yield better performance on practical vision tasks.
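
Below is a minimal sketch of the binary-controller idea described in the abstract, assuming PyTorch and illustrative names (BinarySign, BinaryAdapterLinear, rank, alpha are not taken from the paper): a frozen pre-trained linear layer is perturbed by a control matrix whose entries are restricted to the vertices of a small hypercube, with a straight-through estimator standing in for the gradient of the sign function. This is an interpretation of the abstract, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class BinarySign(torch.autograd.Function):
    """sign() in the forward pass; straight-through (identity) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class BinaryAdapterLinear(nn.Module):
    """Frozen pre-trained linear layer perturbed by a low-precision, high-rank control matrix."""

    def __init__(self, base: nn.Linear, rank: int = 64, alpha: float = 1e-3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pre-trained ViT weights stay fixed
            p.requires_grad_(False)
        out_f, in_f = base.out_features, base.in_features
        # Full-precision latent factors; only their sign enters the forward pass.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.alpha = alpha  # half-width of the control hypercube [-alpha, +alpha]

    def forward(self, x):
        # Entries of delta_w lie in {-alpha, 0, +alpha}; 0 occurs only where B @ A is exactly zero.
        delta_w = self.alpha * BinarySign.apply(self.B @ self.A)
        return self.base(x) + x @ delta_w.t()


# Usage: wrap one projection of a ViT block and fine-tune only A and B.
layer = BinaryAdapterLinear(nn.Linear(768, 768), rank=64)
y = layer(torch.randn(4, 197, 768))  # (batch, tokens, embed dim)
```

In this sketch only the latent factors A and B are trained; compared with full-precision LoRA, each entry of the effective weight update carries a single sign bit plus a shared scale, which corresponds to the "low-precision yet high-rank" regime the abstract argues for.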

Notes

  1.

    It is possible to remove the normalization layer as in [8, 12]. We retain such a block in this paper for simplicity of implementation.

  2.

    We also provide a pure gradient method in Appendix C.

References

  1. Aghajanyan, A., Zettlemoyer, L., Gupta, S.: Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255 (2020)

  2. Artstein, Z.: Discrete and continuous bang-bang and facial spaces or: look for the extreme points. SIAM Rev. 22(2), 172–185 (1980)

  3. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

  4. Bellman, R., Glicksberg, I., Gross, O.: On the “bang-bang” control problem. Q. Appl. Math. 14(1), 11–18 (1956)

  5. Benning, M., Celledoni, E., Ehrhardt, M.J., Owren, B., Schönlieb, C.B.: Deep learning as optimal control problems: models and numerical methods. arXiv preprint arXiv:1904.05657 (2019)

  6. Dorf, R.C., Bishop, R.H.: Modern Control Systems (2011)

  7. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 446–461. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_29

  8. Brock, A., De, S., Smith, S.L., Simonyan, K.: High-performance large-scale image recognition without normalization. In: International Conference on Machine Learning, pp. 1059–1071. PMLR (2021)

  9. Brown, T., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)

  10. Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

  11. Chen, S., et al.: AdaptFormer: adapting vision transformers for scalable visual recognition. Adv. Neural. Inf. Process. Syst. 35, 16664–16678 (2022)

  12. Chen, T., Zhang, Z., Ouyang, X., Liu, Z., Shen, Z., Wang, Z.: “BNN-BN=?”: training binary neural networks without batch normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4619–4629 (2021)

  13. Chernousko, F.L., Lyubushin, A.: Method of successive approximations for solution of optimal control problems. Optimal Control Appl. Methods 3(2), 101–114 (1982)

  14. Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems, vol. 28 (2015)

  15. Craig, J.J.: Introduction to Robotics. Pearson Education (2006)

  16. Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314 (2023)

  17. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  18. Franklin, G.F., Powell, J.D., Emami-Naeini, A., Powell, J.D.: Feedback Control of Dynamic Systems, vol. 4. Prentice Hall, Upper Saddle River (2002)

  19. Gelfand, I.M., Silverman, R.A., et al.: Calculus of Variations. Courier Corporation (2000)

  20. Haber, E., Ruthotto, L.: Stable architectures for deep neural networks. Inverse Prob. 34(1), 014004 (2017)

  21. Houlsby, N., et al.: Parameter-efficient transfer learning for NLP. In: International Conference on Machine Learning, pp. 2790–2799. PMLR (2019)

  22. Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  23. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  24. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)

  25. Jia, M., et al.: Visual prompt tuning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13693, pp. 709–727. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19827-4_41

  26. Karimi Mahabadi, R., Henderson, J., Ruder, S.: Compacter: efficient low-rank hypercomplex adapter layers. Adv. Neural. Inf. Process. Syst. 34, 1022–1035 (2021)

  27. Kerimkulov, B., Šiška, D., Szpruch, L.: A modified MSA for stochastic control problems. Appl. Math. Optim. 1–20 (2021)

  28. Kirk, D.E.: Optimal control theory: an introduction. Courier Corporation (2004)

  29. Konečný, J., McMahan, B., Ramage, D.: Federated optimization: distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575 (2015)

  30. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

  31. Kwakernaak, H., Sivan, R.: Linear Optimal Control Systems, vol. 1. Wiley-Interscience, New York (1972)

  32. Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)

  33. Li, F., Liu, B., Wang, X., Zhang, B., Yan, J.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)

  34. Li, Q., Chen, L., Tai, C., Weinan, E.: Maximum principle based algorithms for deep learning. J. Mach. Learn. Res. 18(165), 1–29 (2018)

  35. Li, Q., Hao, S.: An optimal control approach to deep learning and applications to discrete-weight neural networks. In: International Conference on Machine Learning, pp. 2985–2994. PMLR (2018)

  36. Li, X.L., Liang, P.: Prefix-tuning: optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021)

  37. Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, Princeton (2011)

  38. Lin, M., et al.: Rotated binary neural network. Adv. Neural. Inf. Process. Syst. 33, 7474–7485 (2020)

  39. Lin, Z., Madotto, A., Fung, P.: Exploring versatile generative language model via parameter-efficient transfer learning. arXiv preprint arXiv:2004.03829 (2020)

  40. Luo, G., et al.: Towards efficient visual adaption via structural re-parameterization. arXiv preprint arXiv:2302.08106 (2023)

  41. Lyubushin, A.: Modifications of the method of successive approximations for solving optimal control problems. USSR Comput. Math. Math. Phys. 22(1), 29–34 (1982)

  42. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017)

  43. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)

  44. Nguyen, N.T.: Model-Reference Adaptive Control. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-56393-0

  45. Pontryagin, L.S.: Mathematical Theory of Optimal Processes. Routledge, Milton Park (2018)

  46. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training (2018)

  47. Radford, A., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  48. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32

  49. Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  50. Sohn, K., et al.: Visual prompt tuning for generative transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19840–19851 (2023)

  51. Sonneborn, L., Van Vleck, F.: The bang-bang principle for linear control systems. J. Soc. Ind. Appl. Math. Ser. A Control 2(2), 151–159 (1964)

  52. Sussmann, H.J., Willems, J.C.: 300 years of optimal control: from the brachystochrone to the maximum principle. IEEE Control Syst. Mag. 17(3), 32–44 (1997)

  53. Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  54. Weinan, E.: A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5(1), 1–11 (2017)

  55. Xu, Y., et al.: QA-LoRA: quantization-aware low-rank adaptation of large language models. arXiv preprint arXiv:2309.14717 (2023)

  56. Zeng, Y., Lee, K.: The expressive power of low-rank adaptation. arXiv preprint arXiv:2310.17513 (2023)

  57. Zhang, C., Ekanut, S., Zhen, L., Li, Z.: Augmented multi-party computation against gradient leakage in federated learning. IEEE Trans. Big Data (2022)

  58. Zhang, C., Jingpu, C., Xu, Y., Li, Q.: Parameter-efficient fine-tuning with controls. In: Forty-First International Conference on Machine Learning (2024)

  59. Zhang, C., Li, Q.: Distributed optimization for degenerate loss functions arising from over-parameterization. Artif. Intell. 301, 103575 (2021)

  60. Zhang, C., et al.: Generative gradient inversion via over-parameterized networks in federated learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5126–5135 (2023)

  61. Zhang, D., Zhang, T., Lu, Y., Zhu, Z., Dong, B.: You only propagate once: accelerating adversarial training via maximal principle. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  62. Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)

  63. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)

Acknowledgements

This research is supported by the National Research Foundation, Singapore, under the NRF fellowship (Project No. NRF-NRFF13-2021-0005).

Author information

Correspondence to Chi Zhang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 611 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, C., Cheng, J., Li, Q. (2025). An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15111. Springer, Cham. https://doi.org/10.1007/978-3-031-73668-1_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73668-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73667-4

  • Online ISBN: 978-3-031-73668-1

  • eBook Packages: Computer Science, Computer Science (R0)
