DOI: 10.1145/2484425.2484431
research-article

Using substructure mining to identify misbehavior in network provenance graphs

Published: 23 June 2013

ABSTRACT

As distributed systems become more ubiquitous and more complex, the need for efficient, scalable tools to analyze these systems increases. Network provenance graphs offer a rich framework for this task, mapping dependencies between system states and allowing one to explain these states. In this paper, we investigate methods for more efficient substructure mining in the context of network provenance graphs. Specifically, we are interested in identifying frequent substructures that can be used as a feature set for modeling common execution patterns. Knowing these will help network administrators detect nodes in the distributed system that are misbehaving. Therefore, this paper focuses on applying and scaling up substructure mining for network provenance graphs by incorporating a graph database (neo4j) into the substructure mining process and implementing optimizations that improve the efficiency of the substructure mining task. Our results show that the use of the neo4j graph database combined with our algorithmic optimizations greatly improves the run time of our algorithm while not significantly affecting the quality of the substructures returned.
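
To make the idea of frequent substructures as a feature set concrete, the following is a minimal sketch and not the algorithm from the paper. It uses the simplest possible substructure, a single labeled edge, on a hypothetical in-memory toy graph; the node labels (`recv`, `derive`, `send`, `drop`), the edge label `causes`, the helper names, and the `min_support` threshold are all assumptions made for illustration, and no graph database is involved.

```python
from collections import Counter

# Hypothetical toy provenance graph: node id -> label, plus labeled directed edges.
node_labels = {
    "a1": "recv", "a2": "derive", "a3": "send",
    "b1": "recv", "b2": "derive", "b3": "send",
    "c1": "recv", "c2": "drop",
}
edges = [
    ("a1", "a2", "causes"), ("a2", "a3", "causes"),
    ("b1", "b2", "causes"), ("b2", "b3", "causes"),
    ("c1", "c2", "causes"),
]

def edge_pattern(src, dst, label):
    """Abstract a concrete edge into a (source label, edge label, target label) pattern."""
    return (node_labels[src], label, node_labels[dst])

def frequent_patterns(edges, min_support):
    """Return the single-edge patterns that occur at least min_support times."""
    counts = Counter(edge_pattern(*e) for e in edges)
    return {p for p, n in counts.items() if n >= min_support}

def suspicious_nodes(edges, frequent):
    """Flag the endpoints of every edge whose pattern is not frequent."""
    flagged = set()
    for src, dst, label in edges:
        if edge_pattern(src, dst, label) not in frequent:
            flagged.update((src, dst))
    return flagged

frequent = frequent_patterns(edges, min_support=2)
flagged = suspicious_nodes(edges, frequent)
```

In this toy graph the patterns `recv -> derive` and `derive -> send` each occur twice and count as frequent, while `recv -> drop` occurs once, so its endpoints `c1` and `c2` are flagged. Mining larger substructures requires subgraph matching, which is far more expensive than this edge count.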

  • Published in

    GRADES '13: First International Workshop on Graph Data Management Experiences and Systems
    June 2013
    101 pages
    ISBN: 9781450321884
    DOI: 10.1145/2484425

    Copyright © 2013 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 29 of 61 submissions, 48%
