Abstract
In the last decade, privacy has attracted significant interest in software and information systems engineering, mainly due to the emergence of privacy regulations such as the General Data Protection Regulation (GDPR). However, checking privacy compliance is challenging and depends on many factors, such as the programming language, the software architecture, and the underlying regulation. In this exploratory research, we study whether positive discussions of privacy-related issues in Open-Source Software (OSS) environments can predict the privacy compliance of the software. Such predictions are beneficial in different scenarios, including software reuse. Our main contribution lies in conceptually modeling and understanding the relations between privacy compliance and positive discussions of privacy-related OSS issues. The research comprises three parts: (1) identifying privacy-related issues using supervised machine learning techniques; (2) improving the identification of privacy-related issues using ontologies; and (3) identifying the sentiment of privacy-related issues and analyzing its relations to privacy compliance. This paper describes the design and results of part 1, as well as the design of parts 2 and 3.
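To make part 1 concrete, the following is a minimal sketch of supervised classification of issue texts into privacy-related and other. It is an illustration under stated assumptions, not the classifier used in the paper: a small multinomial Naive Bayes model with add-one smoothing stands in for whichever learners the authors evaluated, and the class name, the labels, and the six training issues are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesIssueClassifier:
    """Hypothetical stand-in: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (invented for illustration, not taken from the paper's dataset).
TRAIN = [
    ("Delete personal data when user revokes consent", "privacy"),
    ("Encrypt stored email addresses to meet GDPR", "privacy"),
    ("Add consent banner for tracking cookies", "privacy"),
    ("Button misaligned on settings page", "other"),
    ("Build fails on Windows with new compiler", "other"),
    ("Crash when opening large file", "other"),
]

texts, labels = zip(*TRAIN)
clf = NaiveBayesIssueClassifier().fit(texts, labels)

print(clf.predict("User consent is missing for personal data export"))
print(clf.predict("Compiler crash on Windows build"))
```

A realistic setup would train on labeled issues such as those in the published dataset and compare several learners; the sketch only shows the shape of the task, which is mapping an issue's text to a privacy-related or non-privacy label.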
Notes
The dataset is available at https://zenodo.org/record/8351237.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Guber, J., Reinhartz-Berger, I., Litvak, M. (2023). Empirical Exploration of Open-Source Issues for Predicting Privacy Compliance. In: Sales, T.P., Araújo, J., Borbinha, J., Guizzardi, G. (eds) Advances in Conceptual Modeling. ER 2023. Lecture Notes in Computer Science, vol 14319. Springer, Cham. https://doi.org/10.1007/978-3-031-47112-4_6
DOI: https://doi.org/10.1007/978-3-031-47112-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47111-7
Online ISBN: 978-3-031-47112-4
eBook Packages: Computer Science, Computer Science (R0)