An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers

  • Conference paper
  • In: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

While recent advancements in model fine-tuning predominantly emphasize the utilization of low-rank adaptation (LoRA), we propose an alternative approach centered on reducing the precision of adaptation matrices. In particular, we depart from the common viewpoint that considers adaptation matrices solely as weight differences, and reinterpret them as “control variables” that perturb pre-trained ViT systems. This new perspective enables the establishment of a control-oriented framework, facilitating the exploration of optimal controls guided by the Pontryagin Maximum Principle. Furthermore, we demonstrate that for bounded control sets such as hypercubes, the optimal controls often take on boundary values, leading naturally to a binary controller design. Theoretical analysis reveals that employing a binary control strategy achieves the same reachable state as its full-precision counterpart in the continuous idealization of deep residual structures, a finding corroborated by our subsequent empirical investigations. Our studies further indicate that the controller’s rank holds greater significance than its precision. As such, opting for low-precision yet high-rank controls is shown to yield better performance on practical vision tasks.
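
Below is a minimal sketch of the binary-controller idea described in the abstract, assuming PyTorch and illustrative names (BinarySign, BinaryAdapterLinear, rank, alpha are not taken from the paper): a frozen pre-trained linear layer is perturbed by a control matrix whose entries are restricted to the vertices of a small hypercube, with a straight-through estimator standing in for the gradient of the sign function. This is an interpretation of the abstract, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class BinarySign(torch.autograd.Function):
    """sign() in the forward pass; straight-through (identity) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class BinaryAdapterLinear(nn.Module):
    """Frozen pre-trained linear layer perturbed by a low-precision, high-rank control matrix."""

    def __init__(self, base: nn.Linear, rank: int = 64, alpha: float = 1e-3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pre-trained ViT weights stay fixed
            p.requires_grad_(False)
        out_f, in_f = base.out_features, base.in_features
        # Full-precision latent factors; only their sign enters the forward pass.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.alpha = alpha  # half-width of the control hypercube [-alpha, +alpha]

    def forward(self, x):
        # Entries of delta_w lie in {-alpha, 0, +alpha}; 0 occurs only where B @ A is exactly zero.
        delta_w = self.alpha * BinarySign.apply(self.B @ self.A)
        return self.base(x) + x @ delta_w.t()


# Usage: wrap one projection of a ViT block and fine-tune only A and B.
layer = BinaryAdapterLinear(nn.Linear(768, 768), rank=64)
y = layer(torch.randn(4, 197, 768))  # (batch, tokens, embed dim)
```

In this sketch only the latent factors A and B are trained; compared with full-precision LoRA, each entry of the effective weight update carries a single sign bit plus a shared scale, which corresponds to the "low-precision yet high-rank" regime the abstract argues for.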

Notes

  1.

    It is possible to remove the normalization layer as in [8, 12]. We retain such a block in this paper for simplicity of implementation.

  2.

    We also provide a pure gradient method in Appendix C.

References

  1. Aghajanyan, A., Zettlemoyer, L., Gupta, S.: Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255 (2020)

  2. Artstein, Z.: Discrete and continuous bang-bang and facial spaces or: look for the extreme points. SIAM Rev. 22(2), 172–185 (1980)

  3. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

  4. Bellman, R., Glicksberg, I., Gross, O.: On the “bang-bang” control problem. Q. Appl. Math. 14(1), 11–18 (1956)

  5. Benning, M., Celledoni, E., Ehrhardt, M.J., Owren, B., Schönlieb, C.B.: Deep learning as optimal control problems: models and numerical methods. arXiv preprint arXiv:1904.05657 (2019)

  6. Dorf, R.C., Bishop, R.H.: Modern Control Systems (2011)

  7. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 446–461. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_29

  8. Brock, A., De, S., Smith, S.L., Simonyan, K.: High-performance large-scale image recognition without normalization. In: International Conference on Machine Learning, pp. 1059–1071. PMLR (2021)

  9. Brown, T., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)

  10. Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., Holtham, E.: Reversible architectures for arbitrarily deep residual neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

  11. Chen, S., et al.: AdaptFormer: adapting vision transformers for scalable visual recognition. Adv. Neural. Inf. Process. Syst. 35, 16664–16678 (2022)

  12. Chen, T., Zhang, Z., Ouyang, X., Liu, Z., Shen, Z., Wang, Z.: “BNN-BN=?”: training binary neural networks without batch normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4619–4629 (2021)

  13. Chernousko, F.L., Lyubushin, A.: Method of successive approximations for solution of optimal control problems. Optimal Control Appl. Methods 3(2), 101–114 (1982)

  14. Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems, vol. 28 (2015)

  15. Craig, J.J.: Introduction to Robotics. Pearson Education (2006)

  16. Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314 (2023)

  17. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  18. Franklin, G.F., Powell, J.D., Emami-Naeini, A., Powell, J.D.: Feedback Control of Dynamic Systems, vol. 4. Prentice Hall, Upper Saddle River (2002)

  19. Gelfand, I.M., Silverman, R.A., et al.: Calculus of Variations. Courier Corporation (2000)

  20. Haber, E., Ruthotto, L.: Stable architectures for deep neural networks. Inverse Prob. 34(1), 014004 (2017)

  21. Houlsby, N., et al.: Parameter-efficient transfer learning for NLP. In: International Conference on Machine Learning, pp. 2790–2799. PMLR (2019)

  22. Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  23. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  24. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)

  25. Jia, M., et al.: Visual prompt tuning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13693, pp. 709–727. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19827-4_41

  26. Karimi Mahabadi, R., Henderson, J., Ruder, S.: Compacter: efficient low-rank hypercomplex adapter layers. Adv. Neural. Inf. Process. Syst. 34, 1022–1035 (2021)

  27. Kerimkulov, B., Šiška, D., Szpruch, L.: A modified MSA for stochastic control problems. Appl. Math. Optim. 1–20 (2021)

  28. Kirk, D.E.: Optimal control theory: an introduction. Courier Corporation (2004)

  29. Konečný, J., McMahan, B., Ramage, D.: Federated optimization: distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575 (2015)

  30. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

  31. Kwakernaak, H., Sivan, R.: Linear Optimal Control Systems, vol. 1. Wiley-Interscience, New York (1972)

  32. Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)

  33. Li, F., Liu, B., Wang, X., Zhang, B., Yan, J.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)

  34. Li, Q., Chen, L., Tai, C., Weinan, E.: Maximum principle based algorithms for deep learning. J. Mach. Learn. Res. 18(165), 1–29 (2018)

  35. Li, Q., Hao, S.: An optimal control approach to deep learning and applications to discrete-weight neural networks. In: International Conference on Machine Learning, pp. 2985–2994. PMLR (2018)

  36. Li, X.L., Liang, P.: Prefix-tuning: optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021)

  37. Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, Princeton (2011)

  38. Lin, M., et al.: Rotated binary neural network. Adv. Neural. Inf. Process. Syst. 33, 7474–7485 (2020)

  39. Lin, Z., Madotto, A., Fung, P.: Exploring versatile generative language model via parameter-efficient transfer learning. arXiv preprint arXiv:2004.03829 (2020)

  40. Luo, G., et al.: Towards efficient visual adaption via structural re-parameterization. arXiv preprint arXiv:2302.08106 (2023)

  41. Lyubushin, A.: Modifications of the method of successive approximations for solving optimal control problems. USSR Comput. Math. Math. Phys. 22(1), 29–34 (1982)

  42. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017)

  43. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)

  44. Nguyen, N.T.: Model-Reference Adaptive Control. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-56393-0

  45. Pontryagin, L.S.: Mathematical Theory of Optimal Processes. Routledge, Milton Park (2018)

  46. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training (2018)

  47. Radford, A., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  48. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32

  49. Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  50. Sohn, K., et al.: Visual prompt tuning for generative transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19840–19851 (2023)

  51. Sonneborn, L., Van Vleck, F.: The bang-bang principle for linear control systems. J. Soc. Ind. Appl. Math. Ser. A Control 2(2), 151–159 (1964)

  52. Sussmann, H.J., Willems, J.C.: 300 years of optimal control: from the brachystochrone to the maximum principle. IEEE Control Syst. Mag. 17(3), 32–44 (1997)

  53. Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  54. Weinan, E.: A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5(1), 1–11 (2017)

  55. Xu, Y., et al.: QA-LoRA: quantization-aware low-rank adaptation of large language models. arXiv preprint arXiv:2309.14717 (2023)

  56. Zeng, Y., Lee, K.: The expressive power of low-rank adaptation. arXiv preprint arXiv:2310.17513 (2023)

  57. Zhang, C., Ekanut, S., Zhen, L., Li, Z.: Augmented multi-party computation against gradient leakage in federated learning. IEEE Trans. Big Data (2022)

  58. Zhang, C., Jingpu, C., Xu, Y., Li, Q.: Parameter-efficient fine-tuning with controls. In: Forty-First International Conference on Machine Learning (2024)

  59. Zhang, C., Li, Q.: Distributed optimization for degenerate loss functions arising from over-parameterization. Artif. Intell. 301, 103575 (2021)

  60. Zhang, C., et al.: Generative gradient inversion via over-parameterized networks in federated learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5126–5135 (2023)

  61. Zhang, D., Zhang, T., Lu, Y., Zhu, Z., Dong, B.: You only propagate once: accelerating adversarial training via maximal principle. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  62. Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)

  63. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)

Acknowledgements

This research is supported by the National Research Foundation, Singapore, under the NRF fellowship (Project No. NRF-NRFF13-2021-0005).

Author information

Correspondence to Chi Zhang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 611 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, C., Cheng, J., Li, Q. (2025). An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15111. Springer, Cham. https://doi.org/10.1007/978-3-031-73668-1_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73668-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73667-4

  • Online ISBN: 978-3-031-73668-1

  • eBook Packages: Computer Science, Computer Science (R0)
