Abstract
While recent advances in model fine-tuning predominantly emphasize low-rank adaptation (LoRA), we propose an alternative approach centered on reducing the precision of the adaptation matrices. In particular, we depart from the common viewpoint that treats adaptation matrices solely as weight differences, and reinterpret them as “control variables” that perturb pre-trained ViT systems. This new perspective enables a control-oriented framework in which optimal controls can be sought under the guidance of the Pontryagin Maximum Principle. Furthermore, we show that for bounded control sets such as hypercubes, the optimal controls often take boundary values, which naturally leads to a binary controller design. Theoretical analysis reveals that, in the continuous idealisation of deep residual architectures, the binary control strategy reaches the same states as its full-precision counterpart, a finding corroborated by our empirical investigations. Our studies further indicate that the controller’s rank matters more than its precision; opting for low-precision yet high-rank controls thus yields better performance on practical vision tasks.
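To make the binary-controller idea concrete, the sketch below illustrates one way such a control could be attached to a frozen ViT projection layer: the additive adaptation matrix is constrained to the vertices of a scaled hypercube (values in {-α, +α}), and latent real-valued weights are trained with a straight-through estimator. This is a minimal illustration under assumed conventions (PyTorch, the module name `BinaryControlLinear`, and the scale `alpha` are our own choices), not the authors’ exact implementation.

```python
# Minimal sketch of a 1-bit, full-rank additive control on a frozen linear layer.
# Assumptions: PyTorch; the control magnitude alpha and initialization are illustrative.
import torch
import torch.nn as nn


class BinaryControlLinear(nn.Module):
    """Frozen pre-trained linear layer perturbed by a sign-valued control matrix."""

    def __init__(self, base: nn.Linear, alpha: float = 1e-3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the pre-trained weights fixed
            p.requires_grad_(False)
        # Latent real-valued weights; only their signs enter the forward pass.
        self.latent = nn.Parameter(0.02 * torch.randn(base.out_features, base.in_features))
        self.alpha = alpha                    # magnitude of the binary control

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sign = torch.sign(self.latent)
        # Straight-through estimator: forward uses sign(latent), backward treats it as identity.
        control = self.latent + (sign - self.latent).detach()
        return self.base(x) + self.alpha * (x @ control.t())


# Usage: wrap, e.g., one attention projection of a ViT block.
layer = BinaryControlLinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 197, 768))         # (batch, tokens, dim)
```

Note that, unlike LoRA, the control matrix here is full rank but stored at 1-bit precision, which is the trade-off the abstract argues in favor of.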
Acknowledgements
This research is supported by the National Research Foundation, Singapore, under the NRF fellowship (Project No. NRF-NRFF13-2021-0005).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, C., Cheng, J., Li, Q. (2025). An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15111. Springer, Cham. https://doi.org/10.1007/978-3-031-73668-1_9
DOI: https://doi.org/10.1007/978-3-031-73668-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73667-4
Online ISBN: 978-3-031-73668-1