Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9639)

Abstract

A common misstep in the development of security and privacy solutions is the failure to keep the demands resulting from high-level policies in line with the actual implementation that is supposed to operationalize those policies. This is especially problematic in the domain of social networks, where software typically predates policies and then evolves alongside its user base and any changes in policies that arise from users' interactions with (and the demands that they place on) the system. Our contribution targets this specific problem, drawing together the assurances actually presented to users in the form of policies and the large codebases with which developers work. We demonstrate that a mapping between policies and code can be inferred from the semantics of the natural language. These semantics manifest not only in the policy statements but also in coding conventions. Our technique, implemented in a tool (CASTOR), can infer semantic mappings with F1 scores of 70% and 78% for two social networks, Diaspora and Friendica respectively, as compared with a ground truth mapping established through manual examination of the policies and code.

Notes

  1. http://wordnet.princeton.edu/.

  2. http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html.

  3. https://github.com/diaspora.

  4. https://github.com/friendica/friendica.

  5. http://www.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/corpora/list/private/brown/brown.html.

  6. See example policies at http://www.paulineanthonysamy.com/myData.html.

Acknowledgements

This research was funded by Lancaster University 40th Anniversary Research Studentship and has no ties to the first author’s current employment at Google.

Author information

Corresponding author

Correspondence to Pauline Anthonysamy.

Appendices

A Implementation: CASTOR

We have implemented our technique in a tool called CASTOR, whose architecture is illustrated in Fig. 4. CASTOR accepts policy statements and source code as inputs, and outputs a set of semantic mappings between policy statements and functions. Briefly, CASTOR processes its input as follows:

Policy Engine: CASTOR’s policy engine is composed of a parser and a statement analyser, which together transform the natural language policy into an intermediate representation (as described in Sect. 3.2). This intermediate representation retains the relevant policy primitives of a statement, namely actions (verbs) and data (nouns).
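As an illustration of the kind of intermediate representation described above, the following is a minimal sketch and not CASTOR's actual data model. It assumes that tokens already carry Penn Treebank part-of-speech tags from some upstream parser; the class name PolicyStatement, the function to_intermediate and the example sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PolicyStatement:
    """Hypothetical intermediate representation of one policy statement."""
    text: str
    actions: Tuple[str, ...]  # verbs found in the statement
    data: Tuple[str, ...]     # nouns found in the statement


def to_intermediate(tagged: List[Tuple[str, str]]) -> PolicyStatement:
    """Build the representation from (token, Penn Treebank tag) pairs."""
    text = " ".join(tok for tok, _ in tagged)
    # Tags starting with VB are verb forms; tags starting with NN are nouns.
    actions = tuple(tok.lower() for tok, tag in tagged if tag.startswith("VB"))
    data = tuple(tok.lower() for tok, tag in tagged if tag.startswith("NN"))
    return PolicyStatement(text, actions, data)


# Hand-tagged example (assumption): a real pipeline would obtain the tags
# from a parser rather than from a literal list.
tagged = [("Users", "NNS"), ("can", "MD"), ("delete", "VB"),
          ("their", "PRP$"), ("photos", "NNS")]
stmt = to_intermediate(tagged)
print(stmt.actions, stmt.data)
```

Keeping only verbs and nouns discards modality ("can", "must") and negation, so a fuller representation would likely need to retain more of the parse.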

Code Engine: CASTOR’s code engine is composed of a minimal recursive-descent parser that extracts a function’s name, associated class and parameters, along with the source file and line number where the function is defined. This is in line with our source model construction in Sect. 3.3.
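The record that the code engine produces can be sketched as follows. This is a deliberately simplified, line-by-line regular-expression scan over PHP-like source, not the recursive-descent parser described above; FunctionRecord, extract_functions and the sample source are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FunctionRecord:
    """Hypothetical source-model entry for one function."""
    name: str
    cls: Optional[str]
    params: Tuple[str, ...]
    file: str
    line: int


CLASS_RE = re.compile(r"^\s*(?:abstract\s+|final\s+)?class\s+(\w+)")
FUNC_RE = re.compile(
    r"^\s*(?:(?:public|private|protected|static)\s+)*"
    r"function\s+(\w+)\s*\(([^)]*)\)"
)


def _param_name(raw: str) -> str:
    """Drop default values, type hints and the leading '$' or '&'."""
    head = raw.split("=")[0].split()
    return head[-1].lstrip("&$") if head else ""


def extract_functions(source: str, file: str) -> List[FunctionRecord]:
    """Collect name, class, parameters, file and line for each function.

    Simplification: the most recent class declaration is taken as the
    enclosing class; brace depth is not tracked, so a function declared
    after a class body would be attributed to that class.
    """
    records: List[FunctionRecord] = []
    current_class: Optional[str] = None
    for lineno, line in enumerate(source.splitlines(), start=1):
        class_match = CLASS_RE.match(line)
        if class_match:
            current_class = class_match.group(1)
            continue
        func_match = FUNC_RE.match(line)
        if func_match:
            names = (_param_name(p) for p in func_match.group(2).split(","))
            params = tuple(n for n in names if n)
            records.append(FunctionRecord(
                func_match.group(1), current_class, params, file, lineno))
    return records


SAMPLE = """<?php
class Photo {
    public function delete_photo($uid, $photo_id) {
    }
}
"""
records = extract_functions(SAMPLE, "photo.php")
print(records)
```

A real parser would also need to handle signatures that span several lines and nested scopes, which this scan ignores.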

Fig. 4. CASTOR’s architecture.

Mapping Engine: CASTOR’s mapping engine infers the mapping between the privacy policy \(\mathcal {PP}\) and source code functions \(\mathcal {F}\) using its inbuilt WordNet corpora and classifier. The output of this engine is a set of semantic mappings between policy statement(s) and functions.
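To make the idea of a lexical mapping concrete, here is a toy sketch under strong assumptions. It replaces the WordNet lookup with a small hand-written synonym table and omits the classifier entirely, so it only shows how words in function names might be matched against the actions and data of a policy statement. SYNONYMS, split_identifier, related and infer_mappings are hypothetical names, not part of CASTOR.

```python
import re
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical stand-in for WordNet synonym lookups (assumption).
SYNONYMS: Dict[str, Set[str]] = {
    "delete": {"delete", "remove", "destroy"},
    "photo": {"photo", "picture", "image"},
    "share": {"share", "publish", "post"},
}


def split_identifier(name: str) -> List[str]:
    """Split snake_case and camelCase identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [word.lower() for word in spaced.split()]


def related(term: str, words: Sequence[str]) -> bool:
    """True if any word equals the term or one of its listed synonyms."""
    candidates = SYNONYMS.get(term, {term})
    return any(word in candidates for word in words)


def infer_mappings(
    statements: Dict[str, Tuple[Sequence[str], Sequence[str]]],
    functions: Sequence[str],
) -> List[Tuple[str, str]]:
    """Pair a statement with a function whose name mentions both one of
    the statement's actions and one of its data items."""
    mappings: List[Tuple[str, str]] = []
    for sid, (actions, data) in statements.items():
        for fn in functions:
            words = split_identifier(fn)
            has_action = any(related(a, words) for a in actions)
            has_data = any(related(d, words) for d in data)
            if has_action and has_data:
                mappings.append((sid, fn))
    return mappings


statements = {"S1": (["delete"], ["photo"])}
functions = ["removePicture", "delete_photo", "share_post", "getUser"]
mappings = infer_mappings(statements, functions)
print(mappings)
```

Exact word matching misses plurals and inflected forms ("photos", "deleted"), which is one reason a lexical database with similarity measures and a trained classifier would be preferable to a fixed table.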

B Formulae

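Recall (TPR) \(= \frac{tp}{tp + fn}\); False-Positive Rate (FPR) \(= \frac{fp}{fp + tn}\); Precision (PPV) \(= \frac{tp}{tp + fp}\); and F1 \(= 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}\), where \(tp\), \(fp\), \(tn\) and \(fn\) denote the numbers of true positives, false positives, true negatives and false negatives respectively.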
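The formulae above translate directly into code. The sketch below computes all four measures from confusion-matrix counts; the function name mapping_metrics and the example counts are hypothetical and are not results from the paper. Returning 0.0 when a denominator is zero is a convention chosen here, not something the formulae prescribe.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when the denominator is zero (a convention)."""
    return numerator / denominator if denominator else 0.0


def mapping_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute recall, false-positive rate, precision and F1 from counts."""
    recall = _ratio(tp, tp + fn)
    fpr = _ratio(fp, fp + tn)
    precision = _ratio(tp, tp + fp)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulae.
scores = mapping_metrics(tp=7, fp=3, tn=87, fn=3)
print(scores)
```

Because F1 is the harmonic mean of precision and recall, it equals both of them whenever they coincide, as in the example counts above.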

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Anthonysamy, P., Edwards, M., Weichel, C., Rashid, A. (2016). Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language. In: Caballero, J., Bodden, E., Athanasopoulos, E. (eds) Engineering Secure Software and Systems. ESSoS 2016. Lecture Notes in Computer Science, vol 9639. Springer, Cham. https://doi.org/10.1007/978-3-319-30806-7_15

  • DOI: https://doi.org/10.1007/978-3-319-30806-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30805-0

  • Online ISBN: 978-3-319-30806-7

  • eBook Packages: Computer Science, Computer Science (R0)
