ABSTRACT
Code readability classification is the task of classifying a piece of source code as either Readable or Unreadable. To build accurate classifiers, existing studies handcraft features from different aspects that intuitively seem to correlate with code readability, and then apply various machine learning algorithms to the newly proposed features. In contrast, our work tackles the problem with deep learning. Specifically, we propose IncepCRM, a novel model based on the Inception architecture that learns multi-scale features automatically from source code with little manual intervention. We use human annotator information as an auxiliary input when training IncepCRM, and we empirically evaluate IncepCRM on three publicly available datasets. The results show that: 1) annotator information improves model performance, as confirmed by robust statistical methods (the Brunner-Munzel test and the Cliff's delta effect size); and 2) IncepCRM achieves higher accuracy than previously reported models on all three datasets. These findings confirm the feasibility and effectiveness of deep learning for code readability classification.
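The abstract reports significance with the Brunner-Munzel test and effect size with Cliff's delta. The following pure-Python sketch computes both quantities for two independent samples. It is an illustration only, not the authors' implementation. A typical input would be accuracies from repeated training runs with and without annotator information; the numbers in the usage block are hypothetical. The sketch returns only the Brunner-Munzel statistic W and its Welch-type degrees of freedom. A p-value also needs the Student t CDF, which `scipy.stats.brunnermunzel` provides. The effect-size labels use the thresholds commonly attributed to Romano et al. (2006). The function names are illustrative.

```python
"""Cliff's delta and the Brunner-Munzel statistic in pure Python (illustrative sketch)."""
from math import sqrt
from typing import List, Sequence, Tuple


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Estimate P(X > Y) - P(X < Y) over all n1 * n2 pairs; the result lies in [-1, 1]."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(delta: float) -> str:
    """Label |delta| with thresholds commonly attributed to Romano et al. (2006)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values that begins at position `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def brunner_munzel(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return the Brunner-Munzel statistic W and its Welch-type degrees of freedom.

    W > 0 means values in ys tend to be larger than values in xs. Converting W
    into a p-value needs the Student t CDF with `df` degrees of freedom.
    """
    n1, n2 = len(xs), len(ys)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    pooled = midranks(list(xs) + list(ys))  # ranks in the combined sample
    r1, r2 = pooled[:n1], pooled[n1:]
    w1, w2 = midranks(xs), midranks(ys)  # ranks within each sample
    m1, m2 = sum(r1) / n1, sum(r2) / n2
    # Rank-based variance estimates; (n + 1) / 2 is the mean within-sample rank.
    s1 = sum((r - w - m1 + (n1 + 1) / 2) ** 2 for r, w in zip(r1, w1)) / (n1 - 1)
    s2 = sum((r - w - m2 + (n2 + 1) / 2) ** 2 for r, w in zip(r2, w2)) / (n2 - 1)
    spread = n1 * s1 + n2 * s2
    if spread == 0:
        raise ValueError("zero variance estimate (e.g. completely separated samples)")
    stat = n1 * n2 * (m2 - m1) / ((n1 + n2) * sqrt(spread))
    df = spread ** 2 / ((n1 * s1) ** 2 / (n1 - 1) + (n2 * s2) ** 2 / (n2 - 1))
    return stat, df


if __name__ == "__main__":
    # Hypothetical accuracies from repeated runs; not data from the paper.
    without_annotator = [0.80, 0.82, 0.84]
    with_annotator = [0.82, 0.86, 0.88, 0.90]
    d = cliffs_delta(without_annotator, with_annotator)
    w, df = brunner_munzel(without_annotator, with_annotator)
    print(f"Cliff's delta = {d:.2f} ({delta_magnitude(d)})")  # Cliff's delta = -0.75 (large)
    print(f"Brunner-Munzel W = {w:.3f}, df = {df:.3f}")  # Brunner-Munzel W = 2.598, df = 4.571
```

With these hypothetical numbers, the negative delta and the positive W both say that the runs with annotator information tend to score higher. Counting pairs directly costs O(n1 n2) time, which is adequate for small samples.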
REFERENCES
- Raymond P. L. Buse and Westley R. Weimer. 2010. Learning a Metric for Code Readability. IEEE Transactions on Software Engineering 36, 4 (Jul. 2010), 546--558.
- Norman Cliff. 1993. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin 114, 3 (1993), 494--509.
- Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann Lecun. 2017. Very deep convolutional networks for text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Vol. 1. 1107--1116.
- Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer. 2015. Modeling readability to improve unit tests. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 107--118.
- Hoa Khanh Dam, Truyen Tran, John Grundy, and Aditya Ghose. 2016. DeepSoft: a vision for a deep model of software. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 944--947. arXiv:1602.05561
- Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, and Reuven Y. Rubinstein. 2005. A Tutorial on the Cross-Entropy Method. Annals of Operations Research 134, 1 (Feb. 2005), 19--67.
- Jonathan Dorn. 2012. A General Software Readability Model. MCS Thesis. Available from http://www.cs.virginia.edu/~weimer/students/dorn-mcs-paper.pdf
- Rudolph Flesch. 1948. A new readability yardstick. Journal of Applied Psychology 32, 3 (1948), 221.
- Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. 2016. Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 631--642. arXiv:1508.06655
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 770--778. arXiv:1512.03385
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980
- Barbara Kitchenham, Lech Madeyski, David Budgen, Jacky Keung, Pearl Brereton, Stuart Charters, Shirley Gibbs, and Amnart Pohthong. 2017. Robust Statistical Methods for Empirical Software Engineering. Empirical Software Engineering 22, 2 (Apr. 2017), 579--630.
- Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48. PMLR, 1378--1387. http://proceedings.mlr.press/v48/kumar16.html
- Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436--444.
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation 1, 4 (Dec. 1989), 541--551.
- Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. 1990. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems. 396--404.
- Taek Lee, Jung Been Lee, and Hoh Peter In. 2013. A study of different coding styles affecting code readability. International Journal of Software Engineering and Its Applications 7, 5 (2013), 413--422.
- Rensis Likert. 1932. A technique for the measurement of attitudes. Archives of Psychology (1932).
- G. Harry McLaughlin. 1969. SMOG grading: a new readability formula. Journal of Reading 12, 8 (1969), 639--646.
- Karin Neubert and Edgar Brunner. 2007. A studentized permutation test for the non-parametric Behrens-Fisher problem. Computational Statistics & Data Analysis 51, 10 (Jun. 2007), 5192--5204.
- Daryl Posnett, Abram Hindle, and Premkumar Devanbu. 2011. A simpler model of software readability. In Proceedings of the 8th Working Conference on Mining Software Repositories (MSR '11). ACM, New York, NY, USA, 73.
- Jeanine Romano, Jeffrey D. Kromrey, Jesse Coraggio, and Jeff Skowronek. 2006. Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys? In Annual Meeting of the Florida Association of Institutional Research. 1--33.
- Simone Scalabrino, Mario Linares-Vasquez, Denys Poshyvanyk, and Rocco Oliveto. 2016. Improving code readability models with textual features. In 2016 IEEE 24th International Conference on Program Comprehension (ICPC). IEEE, 1--10.
- Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556
- Christian Szegedy. n.d. Scene classification with inception-7.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 1--9. arXiv:1409.4842
- Yahya Tashtoush, Zeinab Odat, Izzat Alsmadi, and Maryan Yatim. 2013. Impact of Programming Features on Code Readability. International Journal of Software Engineering and Its Applications 7, 6 (Nov. 2013), 441--458.
- Song Wang, Taiyue Liu, and Lin Tan. 2016. Automatically learning semantic features for defect prediction. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 297--308.