
Probabilistic Interpretation of the Distillation Problem

  • OPTIMIZATION, SYSTEM ANALYSIS, OPERATIONS RESEARCH
  • Published in: Automation and Remote Control

Abstract

The article deals with methods for reducing the complexity of approximating models. A probabilistic justification of distillation and privileged learning methods is proposed. General conclusions are given for an arbitrary parametric function with a predetermined structure, and the theoretical basis is demonstrated for the special cases of linear and logistic regression. The models under consideration are analyzed in a computational experiment on synthetic samples and real data; the FashionMNIST and Twitter Sentiment Analysis datasets serve as the real data.
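To make the setting concrete, the logistic-regression special case of distillation can be sketched as follows: a student model is trained on a weighted mixture of the hard labels and the teacher's temperature-softened probabilities, in the spirit of Hinton et al. [5]. This is an illustrative sketch only; the function `distill_student` and its parameters (`T`, `lam`, `lr`, `steps`) are assumptions for the example and do not reproduce the paper's exact loss or notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distill_student(X, y, w_teacher, T=2.0, lam=0.5, lr=0.1, steps=500):
    """Fit a student logistic regression on a mixture of the hard labels y
    and the teacher's temperature-softened probabilities (a sketch)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    # Teacher's soft targets at temperature T.
    s = sigmoid(X @ w_teacher / T)
    for _ in range(steps):
        p = sigmoid(X @ w)         # student probabilities (hard-label term)
        p_T = sigmoid(X @ w / T)   # student probabilities at temperature T
        # Gradient of lam * CE(y, p) + (1 - lam) * CE(s, p_T) w.r.t. w.
        grad = lam * X.T @ (p - y) + (1 - lam) * X.T @ (p_T - s) / T
        w -= lr * grad / len(y)
    return w

# Usage on synthetic linearly separable data, as in the paper's experiments.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
w_teacher = np.array([2.0, -1.0])
y = (X @ w_teacher > 0).astype(float)
w_student = distill_student(X, y, w_teacher)
```

The mixing weight `lam` interpolates between ordinary supervised training (`lam = 1`) and pure imitation of the teacher's soft targets (`lam = 0`); the temperature `T > 1` flattens the teacher's probabilities so that the student also sees its relative confidences near the decision boundary.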


REFERENCES

  1. Vaswani, A., Gomez, A., Jones, L., Kaiser, L., Parmar, N., Polosukhin, I., Shazeer, N., and Uszkoreit, J., Attention is all you need, Adv. Neural Inf. Process. Syst., 2017, vol. 5, pp. 6000–6010.

  2. Devlin, J., Chang, M., Lee, K., and Toutanova, K., BERT: pre-training of deep bidirectional transformers for language understanding, Proc. 2019 Conf. North Am. Ch. Assoc. Comput. Linguist.: Hum. Lang. Technol. (Minnesota, 2019), vol. 1, pp. 4171–4186.

  3. He, K., Ren, S., Sun, J., and Zhang, X., Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vision Pattern Recognit. (Las Vegas, 2016), pp. 770–778.

  4. Bakhteev, O.Yu. and Strijov, V.V., Deep learning model selection of suboptimal complexity, Autom. Remote Control, 2018, vol. 79, pp. 1474–1488.

  5. Hinton, G., Dean, J., and Vinyals, O., Distilling the knowledge in a neural network, NIPS Deep Learn. Representation Learn. Workshop (2015).

  6. Bucilu, C., Caruana, R., and Mizil, A., Model compression, Proc. ACM SIGKDD Conf. Knowl. Discovery Data Min. (Philadelphia, 2006), pp. 535–541.

  7. Lopez-Paz, D., Bottou, L., Scholkopf, B., and Vapnik, V., Unifying distillation and privileged information, Int. Conf. Learn. Representations (Puerto Rico, 2016).

  8. Tang, Z., Wang, D., and Zhang, Z., Recurrent neural network training with dark knowledge transfer, Proc. IEEE Conf. Acoust. Speech Signal Process. (Shanghai, 2016), vol. 2, pp. 5900–5904.

  9. Darrell, T., Hoffman, J., Saenko, K., and Tzeng, E., Simultaneous deep transfer across domains and tasks, Proc. IEEE Conf. Comput. Vision (Santiago, 2015), vol. 2, pp. 4068–4076.

  10. Ahn, S., Dai, Z., Damianou, A., Hu, S., and Lawrence, N., Variational information distillation for knowledge transfer, Proc. IEEE Conf. Comput. Vision Pattern Recognit. (Long Beach, 2019), pp. 9163–9171.

  11. Burges, C., Cortes, C., and LeCun, Y., The MNIST dataset of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/index.html

  12. Che, Z., Chen, Y., Hu, G., Liu, W., Wang, T., and Yang, Z., TextBrewer: an open-source knowledge distillation toolkit for natural language processing, Proc. 58th Annu. Meet. Assoc. Comput. Linguist.: Syst. Demonstr. (Online, 2020).

  13. Huang, Z. and Wang, N., Like what you like: knowledge distill via neuron selectivity transfer, 2017.

  14. Fu, T., Lei, Z., Liao, S., Mei, T., Wang, S., and Wang, X., Exclusivity-consistency regularized knowledge distillation for face recognition, Lect. Notes Comput. Sci., 2020, vol. 1, pp. 23–69.

  15. Vapnik, V. and Izmailov, R., Learning using privileged information: similarity control and knowledge transfer, J. Mach. Learn. Res., 2015, vol. 16, pp. 2023–2049.

  16. Ivakhnenko, A. and Madala, H., Inductive Learning Algorithms for Complex Systems Modeling, Boca Raton: CRC Press, 1994.

  17. Rasul, K., Vollgraf, R., and Xiao, H., Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.

  18. Kozareva, Z., Nakov, P., Ritter, A., Rosenthal, S., Stoyanov, V., and Wilson, T., SemEval-2013 task 2: sentiment analysis in Twitter, Proc. Seventh Int. Workshop Semantic Eval. (SemEval 2013) (Atlanta, 2013), pp. 312–320.

  19. Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L., and LeCun, Y., Backpropagation applied to handwritten zip code recognition, Neural Comput., 1989, vol. 1, no. 4, pp. 541–551.

  20. Hochreiter, S. and Schmidhuber, J., Long short-term memory, Neural Comput., 1997, vol. 9, no. 8, pp. 1735–1780.

  21. Ba, J. and Kingma, D., Adam: a method for stochastic optimization, Int. Conf. Learn. Representations (San Diego, 2015).

  22. Kod vychislitel’nogo eksperimenta (Computational Experiment Source Code). https://github.com/andriygav/PrivilegeLearning

Funding

This article contains the results of the project “Mathematical Methods for Mining Big Data,” carried out as part of the implementation of the Competence Center Program of the National Technology Initiative “Big Data Storage and Analysis Center,” supported by the Ministry of Science and Higher Education of the Russian Federation under agreement no. 13/1251/2018 of December 11, 2018 between Lomonosov Moscow State University and the Foundation for Support of Projects of the National Technology Initiative. This work was supported by the Russian Foundation for Basic Research, projects nos. 19-07-01155, 19-07-00875, and 19-07-00885.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to A. V. Grabovoy or V. V. Strijov.

Additional information

Translated by V. Potapchouck

About this article

Cite this article

Grabovoy, A.V., Strijov, V.V. Probabilistic Interpretation of the Distillation Problem. Autom Remote Control 83, 123–137 (2022). https://doi.org/10.1134/S000511792201009X
