Abstract
[Context and Motivation] Security and Privacy (SP) compliance is an important aspect of running a business successfully. When Software Engineering (SE) vendors comply with SP requirements, both in the systems they implement and in the practices they follow while implementing them, customers gain assurance that their data is accessed, stored, and processed securely. Failure to comply, on the other hand, can entail heavy fines and lawsuits, and may even lead to loss of business through prohibition of the software in the corresponding jurisdictions. SE contracts are known to be a useful source for deriving software requirements. [Question/problem] Mining any kind of information from contracts is a daunting task, given that contracts are large and complex documents written in legalese. [Principal ideas/results] We conduct an exploratory study to arrive at a model for a governance-focused classification of the SP requirements present in SE contracts. Next, we report experiments with Recurrent Neural Networks and Transformer-based models to automate this classification. Experiments on 960 SE contracts received from a large vendor organization indicate that T5 performs best for both the SP identification and classification tasks. With T5, we obtained an average F1 score of 0.90 each for the identification of Security and Privacy requirements. For the governance-focused classification, we obtained an average F1 score of 0.81 for the Security class and 0.80 for the Privacy class. [Contribution] Through an exploratory study, we present a model for a governance-focused classification of the SP requirements present in SE contracts. We further automate the extraction and the governance-focused classification of SP requirements through experiments on 960 real-life SE contracts received from a large vendor organization.
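The results above are reported as F1 scores. For readers unfamiliar with the metric, the following is a minimal sketch of the standard per-class F1 definition (the harmonic mean of precision and recall). It is not the authors' evaluation code: the function names, the toy labels, and the "Other" class are hypothetical, and the abstract does not state how the reported averages were computed (for example, across folds or across classes).

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from paired gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1          # correct prediction for this class
        else:
            fp[pred] += 1          # predicted class gets a false positive
            fn[gold] += 1          # gold class gets a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (one possible 'average')."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels invented for illustration only; they are not data from the study.
gold = ["Security", "Privacy", "Security", "Other", "Privacy", "Other"]
pred = ["Security", "Privacy", "Other", "Other", "Security", "Other"]
scores = per_class_f1(gold, pred)
average = macro_f1(gold, pred)
```

On real data, a maintained implementation such as scikit-learn's `f1_score` would normally be preferred over a hand-rolled function; the sketch only makes the arithmetic behind the metric explicit.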
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anish, P.R., Verma, A., Venkatesan, S., V., L., Ghaisas, S. (2024). Governance-Focused Classification of Security and Privacy Requirements from Obligations in Software Engineering Contracts. In: Mendez, D., Moreira, A. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2024. Lecture Notes in Computer Science, vol 14588. Springer, Cham. https://doi.org/10.1007/978-3-031-57327-9_6
DOI: https://doi.org/10.1007/978-3-031-57327-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57326-2
Online ISBN: 978-3-031-57327-9
eBook Packages: Computer Science, Computer Science (R0)