State and tendency: an empirical study of deep learning question&answer topics on Stack Overflow

Zhao, Henghui; Li, Yanhui; Liu, Fanwei; Xie, Xiaoyuan; Chen, Lin

doi:10.1007/s11432-019-3018-6

Research Paper
Published: 15 October 2021

Science China Information Sciences, Volume 64, article number 212105 (2021)

Henghui Zhao¹, Yanhui Li¹, Fanwei Liu¹, Xiaoyuan Xie² & Lin Chen¹

Abstract

Deep learning has developed rapidly in recent years and has attracted the attention of numerous researchers. Because the field covers a wide range of topics, we want to know which topics researchers are concerned about; however, our investigation finds that very few studies have addressed this question. In this paper, we conduct a large-scale study to analyze the questions faced by deep learning developers. We use Stack Overflow, one of the largest question-and-answer sites, as our data source and extract 32969 posts about deep learning as our dataset. After filtering, augmenting, and pre-processing the posts, we use the latent Dirichlet allocation (LDA) topic model to summarize 30 topics from their text content. In addition, we measure the difficulty and popularity of each topic, compare the issues faced by users of different deep learning frameworks, and analyze the development trend of each topic. Our main results are as follows: (1) developers ask a broad spectrum of questions about deep learning, ranging from Data Shape to Object Detection; (2) Gradient Propagation is the most popular of all the topics; (3) Object Detection is the most difficult; (4) issues of Package Installation, Code Understanding, and Method Introduction are common across the current deep learning frameworks; (5) the topics exhibit three kinds of trends, e.g., the number of discussions on Data Shape shows a significant rising trend. Finally, based on our findings, we make targeted suggestions for developers, researchers, educators, and framework providers of deep learning.

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2018YFB1003901) and the National Natural Science Foundation of China (Grant Nos. 61872177, 61772259, 61972289, 61832009). We thank the anonymous referees for their helpful comments on this paper.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Henghui Zhao, Yanhui Li, Fanwei Liu & Lin Chen
School of Computer Science, Wuhan University, Wuhan, 430072, China
Xiaoyuan Xie

Corresponding author

Correspondence to Yanhui Li.

About this article

Cite this article

Zhao, H., Li, Y., Liu, F. et al. State and tendency: an empirical study of deep learning question&answer topics on Stack Overflow. Sci. China Inf. Sci. 64, 212105 (2021). https://doi.org/10.1007/s11432-019-3018-6

Received: 06 December 2019
Revised: 18 March 2020
Accepted: 07 May 2020
Published: 15 October 2021
DOI: https://doi.org/10.1007/s11432-019-3018-6
