Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language

Anthonysamy, Pauline; Edwards, Matthew; Weichel, Chris; Rashid, Awais

doi:10.1007/978-3-319-30806-7_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9639)

Abstract

A common misstep in the development of security and privacy solutions is the failure to keep the demands resulting from high-level policies in line with the actual implementation that is supposed to operationalize those policies. This is especially problematic in the domain of social networks, where software typically predates policies and then evolves alongside its user base and any changes in policies that arise from users' interactions with (and the demands that they place on) the system. Our contribution targets this specific problem, drawing together the assurances actually presented to users in the form of policies and the large codebases with which developers work. We demonstrate that a mapping between policies and code can be inferred from the semantics of the natural language. These semantics manifest not only in the policy statements but also in coding conventions. Our technique, implemented in a tool (CASTOR), infers semantic mappings with F1 scores of 70% and 78% for two social networks, Diaspora and Friendica respectively, as compared with a ground-truth mapping established through manual examination of the policies and code.

Acknowledgements

This research was funded by a Lancaster University 40th Anniversary Research Studentship and has no ties to the first author’s current employment at Google.

Author information

Authors and Affiliations

Google Switzerland, Zürich, Switzerland
Pauline Anthonysamy
Security Lancaster, Lancaster University, Lancaster, UK
Pauline Anthonysamy, Matthew Edwards, Chris Weichel & Awais Rashid

Corresponding author

Correspondence to Pauline Anthonysamy.

Editor information

Editors and Affiliations

IMDEA Software Institute, Madrid, Spain
Juan Caballero
Paderborn University & Fraunhofer IEM, Paderborn, Germany
Eric Bodden
VU University, Amsterdam, The Netherlands
Elias Athanasopoulos

Appendices

A Implementation: CASTOR

We have implemented our technique in a tool called CASTOR. Figure 4 illustrates the architecture of CASTOR. CASTOR takes policy statements and source code as input and outputs a set of semantic mappings between policy statements and functions. Briefly, CASTOR processes its input as follows:

Policy Engine: CASTOR’s policy engine is composed of a parser and a statement analyser, which together transform the natural language policy into an intermediate representation (as described in Sect. 3.2). This intermediate representation retains the relevant policy primitives of a statement, namely actions (verbs) and data (nouns).

Code Engine: CASTOR’s code engine is composed of a minimal recursive-descent parser that extracts a function’s name, associated class and parameters, along with information identifying the source file and line number where the function can be found. This is in line with our source model construction in Sect. 3.3.

Mapping Engine: CASTOR’s mapping engine infers the mapping between the privacy policy \(\mathcal {PP}\) and source code functions \(\mathcal {F}\) using its inbuilt WordNet corpora and classifier. The output of this engine is a set of semantic mappings between policy statement(s) and functions.
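
The flow through the three engines can be illustrated with a small, self-contained sketch. It is not CASTOR's implementation: the hand-written synonym groups stand in for WordNet, a simple term-overlap rule stands in for the classifier, and every name in it (PolicyStatement, FunctionRecord, SYNSETS, split_identifier, infer_mappings, the sample statement, functions and file paths) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

# ASSUMPTION: toy synonym groups standing in for WordNet lookups.
SYNSETS = [
    {"delete", "remove", "destroy"},
    {"share", "publish", "post"},
    {"photo", "image", "picture"},
]


@dataclass(frozen=True)
class PolicyStatement:
    """Intermediate representation of a policy statement."""
    text: str
    actions: FrozenSet[str]  # verbs
    data: FrozenSet[str]     # nouns


@dataclass(frozen=True)
class FunctionRecord:
    """What the code engine is described as extracting for each function."""
    name: str
    class_name: str
    params: Tuple[str, ...]
    file: str
    line: int


def split_identifier(identifier: str) -> Set[str]:
    """Split snake_case or camelCase identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return {part.lower() for part in parts}


def expand(words: Iterable[str]) -> Set[str]:
    """Add every synonym group that shares a word with the input."""
    base = set(words)
    expanded = set(base)
    for group in SYNSETS:
        if group & base:
            expanded |= group
    return expanded


def maps_to(statement: PolicyStatement, function: FunctionRecord) -> bool:
    """ASSUMPTION: overlap rule used here in place of a trained classifier.
    A mapping requires both an action term and a data term to match."""
    words = split_identifier(function.name) | split_identifier(function.class_name)
    for param in function.params:
        words |= split_identifier(param)
    return bool(expand(statement.actions) & words) and bool(expand(statement.data) & words)


def infer_mappings(statements: List[PolicyStatement],
                   functions: List[FunctionRecord]) -> List[Tuple[str, str]]:
    """Return (statement text, function name) pairs judged to be related."""
    return [(s.text, f.name) for s in statements for f in functions if maps_to(s, f)]


# Hypothetical inputs, not taken from Diaspora or Friendica.
statements = [
    PolicyStatement("You can delete your photos at any time",
                    frozenset({"delete"}), frozenset({"photo"})),
]
functions = [
    FunctionRecord("remove_photo", "PhotosController", ("photo_id",), "photos.rb", 10),
    FunctionRecord("send_message", "Conversation", ("text",), "conversation.rb", 20),
]
mappings = infer_mappings(statements, functions)
```

In this sketch the statement about deleting photos is linked to remove_photo because "remove" is in the same synonym group as "delete" and "photo" appears in the function name, while send_message shares neither term and is left unmapped.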

B Formulae

Recall (TPR) \(= \frac{tp}{tp + fn}\); False-Positive Rate (FPR) \(= \frac{fp}{fp + tn}\); Precision (PPV) \(= \frac{tp}{tp + fp}\); and F1 \(= 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}\).
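
As a worked illustration of these formulae, the following sketch computes all four measures from confusion-matrix counts. The function name classification_metrics and the counts used are assumptions made for the example; they are not results reported in the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute recall, false-positive rate, precision and F1 from counts.
    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


# Invented counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

With these counts, recall is 8/13, the false-positive rate is 2/87, precision is 8/10, and F1 simplifies to 2tp / (2tp + fp + fn) = 16/23.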

Cite this paper

Anthonysamy, P., Edwards, M., Weichel, C., Rashid, A. (2016). Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language. In: Caballero, J., Bodden, E., Athanasopoulos, E. (eds) Engineering Secure Software and Systems. ESSoS 2016. Lecture Notes in Computer Science, vol 9639. Springer, Cham. https://doi.org/10.1007/978-3-319-30806-7_15

DOI: https://doi.org/10.1007/978-3-319-30806-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30805-0
Online ISBN: 978-3-319-30806-7
eBook Packages: Computer Science, Computer Science (R0)