Abstract
Software visualization techniques can support many program comprehension tasks by displaying aspects of the behavior, structure, or evolution of software. In many cases, the question at hand concerns the semantics of the source code files, e.g., locating files that implement specific features or detecting files with similar semantics. This work presents a general software visualization technique for source code documents, which uses 3D glyphs placed on a two-dimensional reference plane. The relative positions of the glyphs capture their semantic relatedness. Our layout originates from applying Latent Dirichlet Allocation and Multidimensional Scaling to the comments and identifier names found in the source code files. Although different variants of 3D glyphs can be applied, we focus on cylinders, trees, and avatars. We discuss various mappings of data associated with source code documents to the visual variables of 3D glyphs for selected use cases and provide details on our visualization system.
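The layout step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' system: the Latent Dirichlet Allocation step is skipped, and the per-file topic distributions below are hypothetical stand-ins for its output. The file names, the choice of the Jensen-Shannon distance as dissimilarity measure, and the SMACOF-style solver for Multidimensional Scaling are all assumptions made for illustration.

```python
import math
import random


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete probability distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def mds_layout(target, iterations=300, seed=0):
    """Embed n items in 2D with SMACOF-style iterations (Guttman transform)."""
    n = len(target)
    rng = random.Random(seed)
    pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            x = y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                current = math.hypot(dx, dy)
                # Scale each difference by target distance over current distance.
                ratio = target[i][j] / current if current > 1e-12 else 0.0
                x += ratio * dx
                y += ratio * dy
            updated.append((x / n, y / n))
        pos = updated
    return pos


# Hypothetical topic distributions (assumed LDA output, three topics per file).
topics = {
    "parser.cpp": [0.8, 0.1, 0.1],
    "lexer.cpp": [0.7, 0.2, 0.1],
    "renderer.cpp": [0.1, 0.1, 0.8],
    "shader.cpp": [0.1, 0.2, 0.7],
}
names = list(topics)
target = [[js_distance(topics[a], topics[b]) for b in names] for a in names]
layout = dict(zip(names, mds_layout(target)))


def plane_distance(a, b):
    """Euclidean distance between two files on the 2D reference plane."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])
```

In this sketch, each entry of `layout` is the position on the reference plane at which a glyph for that file would be placed. Files with similar topic distributions should end up close together, which `plane_distance` makes easy to check.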
Notes
1. Source code taken from https://github.com/cginternals/globjects.
2. Source code taken from https://github.com/notepad-plus-plus/notepad-plus-plus.
3. Source code taken from github.com/bitcoin/bitcoin.
11. Source code taken from https://github.com/tensorflow/tensorflow.
Acknowledgements
This work is part of the “Software-DNA” project, which is funded by the European Regional Development Fund (ERDF or EFRE in German) and the State of Brandenburg (ILB). This work is part of the KMU project “KnowhowAnalyzer” (Förderkennzeichen 01IS20088B), which is funded by the German Ministry for Education and Research (Bundesministerium für Bildung und Forschung). We further thank the students Maximilian Söchting and Merlin de la Haye for their work during their master’s project at the Hasso Plattner Institute during the summer term 2020.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Atzberger, D., Cech, T., Scheibel, W., Limberger, D., Döllner, J. (2023). Visualization of Source Code Similarity Using 2.5D Semantic Software Maps. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021. Communications in Computer and Information Science, vol 1691. Springer, Cham. https://doi.org/10.1007/978-3-031-25477-2_8
DOI: https://doi.org/10.1007/978-3-031-25477-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25476-5
Online ISBN: 978-3-031-25477-2
eBook Packages: Computer Science, Computer Science (R0)