Abstract
The inability to construct data supply chains effectively in distributed environments is becoming one of the top concerns in the big data area. To address this problem, a novel method of constructing data supply chains based on layered PROV is proposed. First, to describe abstractly how data are transferred from creation to distribution, the data provenance specification published by the W3C is used to standardize the records of data activities within and across data platforms. Then, a distributed PROV data generation algorithm for multiple platforms is designed. Further, we propose tiered storage management of provenance based on summarization, which reduces the number of provenance records by compressing intermediate versions and thereby realizes multi-level management of PROV. Finally, we propose a hierarchical visualization technique based on a layered query mechanism, which allows users to view a data supply chain from a general overview down to the details. The experimental results show that the proposed approach can effectively improve the performance of data supply chain construction.
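The idea of compressing intermediate versions and querying in layers can be illustrated with a minimal sketch. It is not the algorithm from the paper, which the abstract does not specify. It assumes that provenance is a linear chain of PROV-style `wasDerivedFrom` edges and that consecutive edges recorded on the same platform may be collapsed into one. The names `Derivation`, `summarize`, and `view`, and the sample identifiers, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Derivation:
    """One PROV-style wasDerivedFrom edge: `generated` was derived from `used`."""
    generated: str  # identifier of the newer data version
    used: str       # identifier of the version it was derived from
    activity: str   # activity that produced the newer version
    platform: str   # platform on which the activity ran


def summarize(chain: List[Derivation]) -> List[Derivation]:
    """Collapse each run of same-platform edges into a single edge.

    Assumes `chain` is ordered oldest to newest and that each edge's
    `used` equals the previous edge's `generated`.
    """
    summary: List[Derivation] = []
    for edge in chain:
        if summary and summary[-1].platform == edge.platform:
            last = summary[-1]
            # Drop the intermediate version; keep the oldest source
            # and the newest result of the run.
            summary[-1] = Derivation(
                edge.generated, last.used,
                last.activity + "+" + edge.activity, edge.platform,
            )
        else:
            summary.append(edge)
    return summary


def view(chain: List[Derivation], detailed: bool) -> List[Derivation]:
    """Layered query: the full chain on request, the summary otherwise."""
    return list(chain) if detailed else summarize(chain)


# Hypothetical chain spanning two platforms, A and B.
chain = [
    Derivation("a:v2", "a:v1", "clean", "A"),
    Derivation("a:v3", "a:v2", "normalize", "A"),
    Derivation("b:v1", "a:v3", "transfer", "B"),
    Derivation("b:v2", "b:v1", "aggregate", "B"),
]

for edge in view(chain, detailed=False):
    print(edge.generated, "wasDerivedFrom", edge.used, "via", edge.activity)
```

In this sketch the summary layer keeps one edge per platform, so the intermediate versions `a:v2` and `b:v1` appear only in the detailed layer. A real system would also have to handle branching derivations, which a linear chain cannot represent.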
Acknowledgments
This work is partly supported by the National Natural Science Foundation of China under Grants 61272520, 61370196, and 61532012, and by the Research Fund for the Doctoral Program of Higher Education under Grant No. 20110005110007.
Cite this article
Li, P., Wu, TY., Li, XM. et al. Constructing data supply chain based on layered PROV. J Supercomput 73, 1509–1531 (2017). https://doi.org/10.1007/s11227-016-1838-0
DOI: https://doi.org/10.1007/s11227-016-1838-0