Abstract
The wide adoption of component-based software development and the (re)use of software hosted on code hosting platforms have increased interest in source code readability and comprehensibility. One factor that improves readability is consistent code styling and formatting across a project. To that end, many code formatting approaches define a set of rules that model a commonly accepted formatting. However, such rules rely mostly on expert knowledge, are time-consuming to define, and ignore the specific styling and formatting a team chooses to use. As a result, they can be too intrusive and may not be adopted. In this work, we present an automated mechanism that, given a set of source code files, can be trained to identify deviations from a project's selected formatting style and to recommend fixes that keep styling consistent across all files of the project. First, the source code is split into small meaningful units, called tokens, which are used to train the models of our mechanism to predict the probability that a token is wrongly positioned. Then, a number of possible fixes are examined as replacements for each wrongly positioned token and, based on a scoring function, the most suitable fixes are presented to the developer as recommendations. A preliminary evaluation along several axes indicates that our approach can effectively detect formatting deviations from the project's code styling and provide actionable recommendations to the developer.
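The two stages described above (scoring tokens for how likely they are to be misplaced, then ranking candidate replacements) can be illustrated with a small Python sketch. This is not the authors' implementation: a simple count-based model stands in for the trained models, and the tokenizer regex, the `GapModel` class, the use of the two neighbouring token classes as context, add-one smoothing, the 0.3 threshold, and the use of model probability as the scoring function are all hypothetical choices made for illustration. The sketch treats the whitespace between consecutive code tokens as the formatting token to be checked.

```python
import re
from collections import Counter

# Hypothetical token grammar for C/Java-like code; the real tokenizer may differ.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||\S")


def tokenize(source):
    """Split source into (code_token, whitespace_after) pairs."""
    pairs = []
    for piece in TOKEN_RE.findall(source):
        if piece.isspace():
            if pairs:  # leading whitespace of the file is ignored in this sketch
                token, ws = pairs[-1]
                pairs[-1] = (token, ws + piece)
        else:
            pairs.append((piece, ""))
    return pairs


def kind(token):
    """Coarse token class used as context: identifier/keyword, number, or the symbol itself."""
    if token[0].isalpha() or token[0] == "_":
        return "ID"
    if token[0].isdigit():
        return "NUM"
    return token


def gaps(source):
    """Yield (position, left_token, whitespace, right_token) for each gap between code tokens."""
    pairs = tokenize(source)
    for i in range(len(pairs) - 1):
        yield i, pairs[i][0], pairs[i][1], pairs[i + 1][0]


class GapModel:
    """Hypothetical count model of P(whitespace | left kind, right kind), add-alpha smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}    # (left_kind, right_kind) -> Counter of whitespace strings
        self.vocab = set()  # every whitespace string seen in the project's files

    def fit(self, sources):
        for source in sources:
            for _, left, ws, right in gaps(source):
                ctx = (kind(left), kind(right))
                self.counts.setdefault(ctx, Counter())[ws] += 1
                self.vocab.add(ws)
        return self

    def prob(self, ctx, ws):
        seen = self.counts.get(ctx, Counter())
        size = len(self.vocab) + 1  # one extra slot for never-seen whitespace
        return (seen[ws] + self.alpha) / (sum(seen.values()) + self.alpha * size)

    def detect(self, source, threshold=0.3):
        """Return gaps whose whitespace is improbable under the learned project style."""
        flagged = []
        for i, left, ws, right in gaps(source):
            p = self.prob((kind(left), kind(right)), ws)
            if p < threshold:
                flagged.append((i, left, right, ws, p))
        return flagged

    def suggest(self, left, right, current, k=2):
        """Rank alternative whitespace strings for one gap; the score is the model probability."""
        ctx = (kind(left), kind(right))
        scored = [(ws, self.prob(ctx, ws)) for ws in self.vocab if ws != current]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    # "Project" files that consistently put spaces around "=" and binary operators.
    model = GapModel().fit(["int a = 1;\nint b = 2;\n", "x = y + 1;\n"])
    for i, left, right, ws, p in model.detect("int c=3;"):
        best = model.suggest(left, right, ws)[0][0]
        print(i, repr(left), repr(right), round(p, 3), repr(best))
    # prints:
    # 1 'c' '=' 0.143 ' '
    # 2 '=' '3' 0.167 ' '
```

Because the model is fitted only on the project's own files, the "correct" formatting is whatever the team already does rather than a fixed rule set; a project that wrote `x=y` consistently would instead have the spaced version flagged.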
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Karanikiotis, T., Chatzidimitriou, K.C., Symeonidis, A.L. (2022). A Personalized Code Formatter: Detection and Fixing. In: Fill, HG., van Sinderen, M., Maciaszek, L.A. (eds) Software Technologies. ICSOFT 2021. Communications in Computer and Information Science, vol 1622. Springer, Cham. https://doi.org/10.1007/978-3-031-11513-4_8
DOI: https://doi.org/10.1007/978-3-031-11513-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11512-7
Online ISBN: 978-3-031-11513-4
eBook Packages: Computer Science, Computer Science (R0)