Visualization of Source Code Similarity Using 2.5D Semantic Software Maps

  • Conference paper
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021)

Abstract

For various program comprehension tasks, software visualization techniques can be beneficial by displaying aspects related to the behavior, structure, or evolution of software. In many cases, the question at hand concerns the semantics of the source code files, e.g., locating files that implement specific features or detecting files with similar semantics. This work presents a general software visualization technique for source code documents that uses 3D glyphs placed on a two-dimensional reference plane. The relative positions of the glyphs capture their semantic relatedness. Our layout originates from applying Latent Dirichlet Allocation and Multidimensional Scaling to the comments and identifier names found in the source code files. Although different 3D glyph variants can be applied, we focus on cylinders, trees, and avatars. We discuss various mappings of data associated with source code documents to the visual variables of 3D glyphs for selected use cases and provide details on our visualization system.
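The layout idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it assumes that each file is reduced to the words in its comments and identifiers, that Latent Dirichlet Allocation is fitted by collapsed Gibbs sampling, that files are compared by the Jensen-Shannon distance between their topic distributions, and that Multidimensional Scaling is done by plain gradient descent on raw stress. All function names, parameter values, and sample file names are hypothetical; a production pipeline would more likely rely on libraries such as those listed in the notes.

```python
import math
import random
import re
from collections import defaultdict


def tokenize(source):
    """Lowercase word tokens from comments and identifiers (splits camelCase and snake_case)."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", source):
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        tokens.extend(p.lower() for p in parts)
    return [t for t in tokens if len(t) > 2]  # drop very short tokens


def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]  # remove the current assignment from the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z  # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def mds_2d(dist, iters=500, lr=0.05, seed=0):
    """Metric MDS by gradient descent on raw stress; suitable for small inputs only."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                g = (d - dist[i][j]) / d  # stress gradient up to a constant factor
                gx += g * dx
                gy += g * dy
            pos[i][0] -= lr * gx
            pos[i][1] -= lr * gy
    return pos


def semantic_layout(sources, n_topics=2, seed=0):
    """Topic distributions and 2D positions for a list of source code strings."""
    docs = [tokenize(s) for s in sources]
    theta = lda_gibbs(docs, n_topics, seed=seed)
    dist = [[js_distance(p, q) for q in theta] for p in theta]
    return theta, mds_2d(dist, seed=seed)


if __name__ == "__main__":
    files = {  # hypothetical file names and contents, for illustration only
        "shader.cpp": "// compile shader program\nvoid compileShader(int shaderHandle);",
        "texture.cpp": "// bind texture to shader\nvoid bindTexture(int textureHandle);",
        "wallet.cpp": "// sign transaction with wallet key\nbool signTransaction(Key walletKey);",
        "miner.cpp": "// mine block of transactions\nBlock mineBlock(TransactionPool pool);",
    }
    theta, layout = semantic_layout(list(files.values()))
    for name, (x, y) in zip(files, layout):
        print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

Each returned position could serve as the footprint of a glyph on the reference plane, with further per-file data mapped to glyph attributes such as height or color. With such a tiny corpus the topics are unstable, so the resulting positions are illustrative only.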

Notes

  1. Source code taken from https://github.com/cginternals/globjects.

  2. Source code taken from https://github.com/notepad-plus-plus/notepad-plus-plus.

  3. Source code taken from https://github.com/bitcoin/bitcoin.

  4. https://www.nltk.org/.

  5. https://spacy.io/.

  6. https://radimrehurek.com/gensim/.

  7. https://scikit-learn.org/stable/.

  8. https://webgl-operate.org/.

  9. https://sketchfab.com/feed.

  10. https://www.blender.org/.

  11. Source code taken from https://github.com/tensorflow/tensorflow.

Acknowledgements

This work is part of the “Software-DNA” project, which is funded by the European Regional Development Fund (ERDF or EFRE in German) and the State of Brandenburg (ILB). This work is part of the KMU project “KnowhowAnalyzer” (Förderkennzeichen 01IS20088B), which is funded by the German Ministry for Education and Research (Bundesministerium für Bildung und Forschung). We further thank the students Maximilian Söchting and Merlin de la Haye for their work during their master’s project at the Hasso Plattner Institute during the summer term 2020.

Author information

Corresponding author

Correspondence to Daniel Atzberger.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Atzberger, D., Cech, T., Scheibel, W., Limberger, D., Döllner, J. (2023). Visualization of Source Code Similarity Using 2.5D Semantic Software Maps. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021. Communications in Computer and Information Science, vol 1691. Springer, Cham. https://doi.org/10.1007/978-3-031-25477-2_8

  • DOI: https://doi.org/10.1007/978-3-031-25477-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25476-5

  • Online ISBN: 978-3-031-25477-2

  • eBook Packages: Computer Science, Computer Science (R0)
