Abstract
The article deals with methods for reducing the complexity of approximating models. A probabilistic justification of the distillation and privileged learning methods is proposed. General conclusions are given for an arbitrary parametric function with a predetermined structure, and the theoretical basis is demonstrated for the special cases of linear and logistic regression. The proposed models are analyzed in a computational experiment on synthetic samples and on real data; the FashionMNIST and Twitter Sentiment Analysis datasets are used as the real data.
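For the logistic-regression special case mentioned above, the distillation setting can be sketched as follows: a teacher model is fitted to hard labels, its logits are softened by a temperature, and a student of the same parametric structure is trained against those soft targets instead of the labels. The sketch below is a minimal illustration under assumed choices (synthetic data, plain gradient descent, temperature T = 2); it is not the article's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (a stand-in for the synthetic samples).
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, targets, lr=0.1, steps=2000):
    """Gradient descent on cross-entropy against (possibly soft) targets."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - targets) / len(targets)
    return w

# Teacher: ordinary logistic regression fitted to the hard labels.
w_teacher = fit_logreg(X, y)

# Soft targets: teacher logits softened by a temperature T > 1.
T = 2.0
soft_targets = sigmoid((X @ w_teacher) / T)

# Student: same model class, trained on the teacher's soft targets.
w_student = fit_logreg(X, soft_targets)

acc = np.mean((sigmoid(X @ w_student) > 0.5) == y)
print(f"student train accuracy: {acc:.3f}")
```

Because the soft targets carry the teacher's confidence rather than only the class label, the student's loss surface is smoothed, which is the mechanism the probabilistic interpretation formalizes.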
Funding
This article contains the results of the project “Mathematical Methods for Mining Big Data,” carried out as part of the implementation of the Competence Center Program of the National Technology Initiative “Big Data Storage and Analysis Center,” supported by the Ministry of Science and Higher Education of the Russian Federation under agreement no. 13/1251/2018 of December 11, 2018 between Lomonosov Moscow State University and the Foundation for Support of Projects of the National Technology Initiative. This work was supported by the Russian Foundation for Basic Research, projects nos. 19-07-01155, 19-07-00875, and 19-07-00885.
Translated by V. Potapchouck
Cite this article
Grabovoy, A.V., Strijov, V.V. Probabilistic Interpretation of the Distillation Problem. Autom Remote Control 83, 123–137 (2022). https://doi.org/10.1134/S000511792201009X