Abstract
The notion of data provenance was formally introduced a decade ago and has since been investigated, but mainly from a functional perspective, which follows the historical pattern of introducing new technologies with the expectation that security and privacy can be added later. Despite very recent interest from the cyber security community in some specific aspects of data provenance, there is no long-haul, overarching, systematic framework for the security and privacy of provenance. The importance of secure provenance R&D has been emphasized in the recent report on Federal game-changing R&D for cyber security, especially with respect to the theme of Tailored Trustworthy Spaces. Secure data provenance can significantly enhance data trustworthiness, which is crucial to various decision-making processes. Moreover, data provenance can facilitate accountability and compliance (including compliance with the privacy preferences and policies of relevant users), can be an important factor in access control and usage control decisions, and can be valuable in data forensics. Along with these potential benefits, data provenance also poses a number of security and privacy challenges. For example, provenance sometimes needs to be confidential, so that it is visible only to properly authorized users, and the identity of the entities recorded in the provenance must also be protected from exposure. We thus need to achieve high assurance of provenance without compromising the privacy of those in the chain that produced the data. Moreover, if we expect voluntary large-scale participation in provenance-aware applications, we must ensure that the privacy of the individuals or organizations involved will be maintained. It is incumbent on the cyber security community to develop a technical and scientific framework to address these security and privacy challenges so that our society can gain maximum benefit from this technology.
In this paper, we discuss a framework of theoretical foundations, models, mechanisms and architectures that allow applications to benefit from privacy-enhanced and secure use of provenance in a modular fashion. After introducing the main components of such a framework and the notion of the provenance life cycle, we discuss in detail the research questions and issues concerning each component, together with related approaches.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10844-014-0322-7/MediaObjects/10844_2014_322_Fig9_HTML.gif)
Acknowledgments
The work reported in this paper has been partially supported by NSF under grant CNS-1111512.
Cite this article
Bertino, E., Ghinita, G., Kantarcioglu, M. et al. A roadmap for privacy-enhanced secure data provenance. J Intell Inf Syst 43, 481–501 (2014). https://doi.org/10.1007/s10844-014-0322-7
DOI: https://doi.org/10.1007/s10844-014-0322-7