
Constructing data supply chain based on layered PROV

The Journal of Supercomputing

Abstract

The inability to construct data supply chains effectively in distributed environments is becoming one of the top concerns in the big data area. To address this problem, we propose a novel method for constructing data supply chains based on layered PROV. First, to describe abstractly how data moves from creation to distribution, we use PROV, the data provenance specification published by the W3C, to standardize the records of data activities within and across data platforms. Next, we design a distributed PROV generation algorithm for multi-platform environments. We then propose a tiered storage management scheme for provenance based on summarization, which reduces the number of provenance records by compressing intermediate versions and thereby enables multi-level management of PROV. Finally, we propose a hierarchical visualization technique built on a layered query mechanism, which lets users view a data supply chain from a general overview down to fine-grained detail. Experimental results show that the proposed approach effectively improves the performance of data supply chain construction.
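
The abstract names the summarization and layered-query steps without detailing them. The Python sketch below is only an illustration under stated assumptions, not the authors' algorithm: the provenance of one dataset is a linear chain of PROV `wasDerivedFrom` edges, each tagged with a hypothetical `platform` attribute. The coarse layer compresses every run of intermediate versions on one platform into a single edge, and the detail layer keeps every version. `Derivation`, `summarize`, `query`, and `to_prov_n` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One PROV wasDerivedFrom edge: `derived` was derived from `source`.

    `platform` is a hypothetical attribute recording which data platform
    performed the derivation; it is not part of the PROV core relation.
    """
    derived: str
    source: str
    platform: str


def summarize(edges):
    """Build the coarse layer by compressing intermediate versions.

    Assumes each provenance chain is linear and acyclic: every entity is
    the source of at most one derivation. Each maximal run of derivations
    on the same platform, v0 -> v1 -> ... -> vk, becomes a single edge
    from v0 to vk, so only versions at platform boundaries survive.
    """
    by_source = {e.source: e for e in edges}
    derived_ids = {e.derived for e in edges}
    # Chain roots are entities that are never derived from anything.
    roots = [e.source for e in edges if e.source not in derived_ids]

    coarse = []
    for root in roots:
        start, cur, platform = root, root, None
        while cur in by_source:
            edge = by_source[cur]
            if platform is not None and edge.platform != platform:
                # Platform boundary: close the current run at `cur`.
                coarse.append(Derivation(cur, start, platform))
                start = cur
            platform = edge.platform
            cur = edge.derived
        if platform is not None:
            # Close the final run at the end of the chain.
            coarse.append(Derivation(cur, start, platform))
    return coarse


def query(edges, level):
    """Hypothetical two-level layered query: 0 = overview, 1 = full detail."""
    return summarize(edges) if level == 0 else list(edges)


def to_prov_n(edge):
    """Render an edge in PROV-N short form (attributes omitted)."""
    return f"wasDerivedFrom({edge.derived}, {edge.source})"


if __name__ == "__main__":
    # Toy chain: two derivations on platform P1, then two on platform P2.
    chain = [
        Derivation("ex:v1", "ex:v0", "P1"),
        Derivation("ex:v2", "ex:v1", "P1"),
        Derivation("ex:v3", "ex:v2", "P2"),
        Derivation("ex:v4", "ex:v3", "P2"),
    ]
    for level in (0, 1):
        print(f"layer {level}:")
        for edge in query(chain, level):
            print("  " + to_prov_n(edge))
```

On the toy chain, layer 0 keeps only `ex:v0`, `ex:v2`, and `ex:v4`, dropping the intermediate versions `ex:v1` and `ex:v3`, while layer 1 returns all four original edges. Real provenance graphs branch and merge, which this linear-chain sketch does not attempt to handle.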

Figs. 1–9 (images not reproduced here)

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China under Grants 61272520, 61370196, and 61532012, and by the Research Fund for the Doctoral Program of Higher Education under Grant No. 20110005110007.

Author information

Correspondence to Tin-Yu Wu.

About this article

Cite this article

Li, P., Wu, TY., Li, XM. et al. Constructing data supply chain based on layered PROV. J Supercomput 73, 1509–1531 (2017). https://doi.org/10.1007/s11227-016-1838-0

  • DOI: https://doi.org/10.1007/s11227-016-1838-0
