Abstract
A common misstep in the development of security and privacy solutions is the failure to keep the demands resulting from high-level policies in line with the actual implementation that is supposed to operationalize those policies. This is especially problematic in the domain of social networks, where software typically predates policies and then evolves alongside its user base and any policy changes that arise from users' interactions with, and demands on, the system. Our contribution targets this specific problem, drawing together the assurances actually presented to users in the form of policies and the large codebases with which developers work. We demonstrate that a mapping between policies and code can be inferred from the semantics of the natural language. These semantics manifest not only in the policy statements but also in coding conventions. Our technique, implemented in a tool (CASTOR), can infer semantic mappings with F1 scores of 70% and 78% for two social networks, Diaspora and Friendica respectively, as compared with a ground-truth mapping established through manual examination of the policies and code.
Notes
See example policies at http://www.paulineanthonysamy.com/myData.html.
Acknowledgements
This research was funded by a Lancaster University 40th Anniversary Research Studentship and has no ties to the first author’s current employment at Google.
Appendices
A Implementation: CASTOR
We have implemented our technique in a tool called CASTOR. Figure 4 illustrates the architecture of CASTOR. CASTOR accepts policy statements and source code as inputs, and outputs a set of semantic mappings between policy statements and functions. Briefly, CASTOR processes its input as follows:
Policy Engine: CASTOR’s policy engine is composed of a parser and a statement analyser which transforms the natural language policy into an intermediate representation (as described in Sect. 3.2). This intermediate representation maintains the relevant policy primitives of a statement, namely action (verbs) and data (nouns).
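As a rough illustration of such an intermediate representation, the following minimal sketch reduces a policy statement to its action and data primitives. It is not CASTOR's parser: the tiny `ACTION_WORDS` and `DATA_WORDS` lexicons, the `PolicyPrimitives` class, the `analyse_statement` function and the example statement are all hypothetical stand-ins for a real natural language parser and real policy text.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicons standing in for a real part-of-speech tagger.
ACTION_WORDS = {"share", "delete", "collect", "view", "store"}
DATA_WORDS = {"photo", "photos", "profile", "email", "posts", "location"}


@dataclass
class PolicyPrimitives:
    """Intermediate representation of one policy statement."""
    statement: str
    actions: list = field(default_factory=list)  # verbs
    data: list = field(default_factory=list)     # nouns


def analyse_statement(statement: str) -> PolicyPrimitives:
    """Keep only the action (verb) and data (noun) tokens of a statement."""
    primitives = PolicyPrimitives(statement)
    for raw in statement.lower().split():
        token = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if token in ACTION_WORDS:
            primitives.actions.append(token)
        elif token in DATA_WORDS:
            primitives.data.append(token)
    return primitives


# Hypothetical policy statement, not taken from Diaspora or Friendica.
ir = analyse_statement("You can delete your photos and profile at any time.")
print(ir.actions, ir.data)
```

A lexicon lookup keeps the sketch dependency-free; a real implementation would derive verbs and nouns from a syntactic parse rather than from fixed word lists.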
Code Engine: CASTOR’s code engine is composed of a minimal recursive-descent parser that extracts a function’s name, associated class and parameters, along with information identifying the source file and line number where the function can be found. This is in line with our source model construction in Sect. 3.3.
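The kind of record this step produces can be sketched as follows. Note the substitution: instead of a recursive-descent parser over the target languages, this sketch uses Python's standard `ast` module on Python source, purely to show the extracted fields. The `FunctionRecord` and `extract_functions` names and the sample source are assumptions made for illustration.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionRecord:
    name: str
    class_name: Optional[str]  # None for functions outside any class
    parameters: list
    source_file: str
    line: int


def extract_functions(source: str, source_file: str) -> list:
    """Collect name, enclosing class, parameters and location of each function."""
    records = []

    def visit(node, class_name=None):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                visit(child, child.name)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                params = [arg.arg for arg in child.args.args]
                records.append(FunctionRecord(
                    child.name, class_name, params, source_file, child.lineno))
                visit(child, class_name)  # also pick up nested functions
            else:
                visit(child, class_name)

    visit(ast.parse(source))
    return records


# Hypothetical sample source used only to exercise the extractor.
SAMPLE = '''class Photo:
    def delete_photo(self, photo_id):
        pass

def share_post(post, audience):
    pass
'''

records = extract_functions(SAMPLE, "sample.py")
for record in records:
    print(record)
```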
Mapping Engine: CASTOR’s mapping engine infers the mapping between the privacy policy \(\mathcal {PP}\) and source code functions \(\mathcal {F}\) using its inbuilt WordNet corpora and classifier. The output of this engine is a set of semantic mappings between policy statement(s) and functions.
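To make the idea of a semantic mapping concrete, here is a deliberately simplified sketch. It does not reproduce CASTOR's classifier or its WordNet lookups: a small hand-written `SYNONYMS` table stands in for WordNet, and a word-overlap score stands in for the classifier. All names, the threshold and the example function names are hypothetical.

```python
import re

# Hypothetical synonym sets standing in for WordNet lookups.
SYNONYMS = {
    "delete": {"delete", "remove", "destroy"},
    "photo": {"photo", "image", "picture"},
    "share": {"share", "publish", "post"},
}


def split_identifier(name: str) -> list:
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [word.lower() for word in words]


def expand(term: str) -> set:
    """Return the term together with its known synonyms."""
    return SYNONYMS.get(term, {term})


def score(actions: list, data: list, function_name: str) -> float:
    """1.0 if both an action and a data term match, 0.5 if one does, else 0.0."""
    words = set(split_identifier(function_name))
    action_hit = any(expand(action) & words for action in actions)
    data_hit = any(expand(item) & words for item in data)
    return (action_hit + data_hit) / 2


def infer_mappings(actions, data, function_names, threshold=1.0):
    """Keep the functions whose identifier words cover the policy primitives."""
    return [f for f in function_names if score(actions, data, f) >= threshold]


# Hypothetical function names; the policy primitives are "delete" and "photo".
functions = ["removeImage", "destroy_photo", "share_post", "render_page"]
mapped = infer_mappings(["delete"], ["photo"], functions)
print(mapped)
```

Requiring both an action match and a data match is one possible design choice; lowering `threshold` to 0.5 would trade precision for recall.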
B Formulae
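Recall (TPR) \(= \frac{tp}{tp + fn}\); False-Positive Rate (FPR) \(= \frac{fp}{fp + tn}\); Precision (PPV) \(= \frac{tp}{tp + fp}\); and F1 \(= 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}\). Here \(tp\), \(fp\), \(tn\) and \(fn\) denote the numbers of true positives, false positives, true negatives and false negatives, respectively.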
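These formulae translate directly into code. The sketch below computes all four measures from confusion-matrix counts; the function name `evaluation_metrics`, the choice to return 0.0 when a denominator is zero, and the example counts are assumptions for illustration and are not results from the paper.

```python
def evaluation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute recall, false-positive rate, precision and F1 from raw counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


# Hypothetical counts chosen only to exercise the function.
metrics = evaluation_metrics(tp=30, fp=10, tn=50, fn=10)
print(metrics)
```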
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Anthonysamy, P., Edwards, M., Weichel, C., Rashid, A. (2016). Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language. In: Caballero, J., Bodden, E., Athanasopoulos, E. (eds) Engineering Secure Software and Systems. ESSoS 2016. Lecture Notes in Computer Science, vol 9639. Springer, Cham. https://doi.org/10.1007/978-3-319-30806-7_15
DOI: https://doi.org/10.1007/978-3-319-30806-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30805-0
Online ISBN: 978-3-319-30806-7
eBook Packages: Computer Science, Computer Science (R0)