Identifying incompleteness in privacy policy goals using semantic frames

  • RE 2018
Requirements Engineering

Abstract

Companies that collect personal information online often maintain privacy policies that are required to accurately reflect their data practices and privacy goals. To be comprehensive and flexible for future practices, policies contain ambiguity that summarizes practices over multiple types of products and business contexts. Ambiguity in data practice descriptions undermines policies as an effective way to communicate system design choices to users and as a reliable regulatory mechanism. In this paper, we report an investigation to identify incompleteness by representing data practice descriptions as semantic frames. The approach is a grounded analysis to discover which semantic roles corresponding to a data action are needed to construct complete data practice descriptions. Our results include 698 data action instances obtained from 949 manually annotated statements across 15 privacy policies and three domains: health, news and shopping. Therein, we identified 2316 instances of 17 types of semantic roles and found that the distribution of semantic roles across the three domains was similar. Incomplete data practice descriptions undermine user comprehension and can affect the user’s perceived privacy risk, which we measure using factorial vignette surveys. We observed that user risk perception decreases when two roles are present in a statement: the condition under which a data action is performed, and the purpose for which the user’s information is used.

Notes

  1. Sally French, “Snapchat’s new ‘scary’ privacy policy has left users outraged,” Market Watch, November 2, 2015. http://www.marketwatch.com/story/snapchats-new-scary-privacy-policy-has-left-users-outraged-2015-10-29.

  2. Zack Whittaker, “Google must review privacy policy, EU data regulators rule,” ZDNet, October 16, 2012. http://www.zdnet.com/article/google-must-review-privacy-policy-eu-data-regulators-rule/.

Acknowledgements

We thank the CMU RE Lab for their helpful feedback. This research was funded by NSF Frontier Award #1330596 and NSF CAREER Award #1453139.

Author information

Corresponding author

Correspondence to Jaspreet Bhatia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: extracted semantic roles

We identified 17 total semantic roles in our analysis, six of which are described in Sect. 3.2. The remaining roles are as follows:

  • Action location: The location where the action is performed.

  • Comparison: A comparison of the action with one or more other actions.

  • Constraint: The restrictions on the action.

  • Duration: The duration for which the action will be performed.

  • Exception: An exception to the action.

  • Retention property: How the information is retained. Example role value from the Costco policy: "separately from other member databases."

  • Hypernymy: A more generic semantic role value with specific values.

  • Instrument: The medium with which the action is performed.

  • Negation: The presence of this role signals that the action will not be performed.

  • Retention location: The location at which the object of the retention action is retained.

  • Time of action: The time at which the action is performed.
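As an illustration only, a data practice description can be held as a frame that maps role names to role values, and a statement can then be checked for roles that are absent. The sketch below is not the authors' tooling: the class `DataActionFrame`, the function `missing_roles`, and the sample condition and purpose values are hypothetical. It uses the eleven role names listed in this appendix and the two roles named in the abstract (condition and purpose); the remaining roles from Sect. 3.2 are not reproduced here, so the frame accepts any role name.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# The eleven role names listed in Appendix A.
APPENDIX_A_ROLES = (
    "action location", "comparison", "constraint", "duration", "exception",
    "retention property", "hypernymy", "instrument", "negation",
    "retention location", "time of action",
)

# The two roles the abstract associates with lower perceived risk.
ABSTRACT_ROLES = ("condition", "purpose")


@dataclass
class DataActionFrame:
    """Hypothetical container: one data action plus its role values."""
    action: str
    roles: Dict[str, str] = field(default_factory=dict)


def missing_roles(frame: DataActionFrame,
                  expected: Iterable[str] = ABSTRACT_ROLES) -> List[str]:
    """Return the expected roles that have no (non-empty) value in the frame."""
    return sorted(r for r in expected if not frame.roles.get(r, "").strip())


# Role value quoted from the Costco example in Appendix A.
frame = DataActionFrame(
    action="retain",
    roles={"retention property": "separately from other member databases"},
)
print(missing_roles(frame))  # the condition and purpose roles are absent

# Hypothetical values, added only to show a frame that passes the check.
frame.roles["condition"] = "when you create an account"
frame.roles["purpose"] = "to process your orders"
print(missing_roles(frame))
```

Which roles count as required for a given data action is a finding of the paper's grounded analysis; the default `expected` argument here is only a placeholder for that result.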

Appendix B: semantic roles frequency

The following tables present, for each policy, the total number of data actions identified in each data action category (Total Actions), the number of role value instances for the most frequent roles, and the total number of roles attached to each data action category (Total Roles) (Tables 15, 16 and 17).

Table 15 Frequency of semantic roles across health policies
Table 16 Frequency of semantic roles across news policies
Table 17 Frequency of semantic roles across shopping policies

Appendix C: lexical and syntactic pattern

The following table presents all the unique lexical and syntactic patterns we discovered in our dataset (Table 18).

Table 18 All lexical and syntactic patterns discovered

About this article

Cite this article

Bhatia, J., Evans, M.C. & Breaux, T.D. Identifying incompleteness in privacy policy goals using semantic frames. Requirements Eng 24, 291–313 (2019). https://doi.org/10.1007/s00766-019-00315-y

  • DOI: https://doi.org/10.1007/s00766-019-00315-y