ABSTRACT
Network provenance, which records the execution history of network events as metadata, is becoming increasingly important for network accountability and failure diagnosis. For example, network provenance can be used to trace the path that a message traversed in a network, or to reveal how a particular routing entry was derived and which parties were involved in its derivation. A challenge in storing the provenance of a live network is that the large number of arriving messages may incur substantial storage overhead. In this paper, we explore techniques to dynamically compress distributed provenance stored at scale. Logically, compression is achieved by grouping equivalent provenance trees and maintaining only one concrete copy for each equivalence class. To efficiently identify equivalent provenance, we (1) introduce distributed event-based linear programs (DELPs) to specify distributed network applications, and (2) statically analyze DELPs to allow for quick detection of provenance equivalence at runtime. Our experimental results demonstrate that our approach leads to significant storage reduction and query latency improvement over alternative approaches.
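The grouping idea described above can be illustrated with a minimal sketch. It is not the system presented in the paper: the names `CompressedStore`, `flow_key`, and `build_tree` are hypothetical, and the equivalence key is simply assumed to be the (source, destination) pair of a packet, whereas the paper derives equivalence from static analysis of DELPs. The sketch only shows how one concrete tree per equivalence class can be shared by many events.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Tuple

# A provenance tree is modelled as a nested tuple: (label, (child, child, ...)).
ProvTree = Tuple[str, tuple]


@dataclass
class CompressedStore:
    """Keeps one concrete provenance tree per equivalence class (illustrative only)."""

    # Hypothetical key function: maps an event to its equivalence-class key.
    key_of: Callable[[dict], Hashable]
    # One shared tree per equivalence class.
    shared_trees: Dict[Hashable, ProvTree] = field(default_factory=dict)
    # Each event stores only a reference (the class key), not a full tree.
    event_index: Dict[str, Hashable] = field(default_factory=dict)

    def record(self, event_id: str, event: dict,
               build_tree: Callable[[dict], ProvTree]) -> None:
        key = self.key_of(event)
        if key not in self.shared_trees:
            # First event of this class: materialize the tree once.
            self.shared_trees[key] = build_tree(event)
        self.event_index[event_id] = key

    def query(self, event_id: str) -> ProvTree:
        # Follow the event's reference to the shared tree.
        return self.shared_trees[self.event_index[event_id]]


def flow_key(event: dict) -> Hashable:
    # Assumption: packets with the same source and destination take the same
    # route, so their provenance trees are treated as equivalent.
    return (event["src"], event["dst"])


def build_tree(event: dict) -> ProvTree:
    # Toy derivation: a packet received at dst was forwarded after being sent at src.
    src, dst = event["src"], event["dst"]
    return (f"recv@{dst}", ((f"forward@{src}->{dst}", ((f"send@{src}", ()),)),))


store = CompressedStore(key_of=flow_key)
store.record("p1", {"src": "a", "dst": "c", "payload": "x"}, build_tree)
store.record("p2", {"src": "a", "dst": "c", "payload": "y"}, build_tree)
store.record("p3", {"src": "b", "dst": "c", "payload": "z"}, build_tree)
```

In this toy run, three packets are recorded but only two trees are stored, because `p1` and `p2` fall into the same assumed equivalence class. Event-specific data such as the payload is deliberately kept out of the shared tree; a fuller design would have to store it alongside each event's reference.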