research-article

Using substructure mining to identify misbehavior in network provenance graphs

Authors:
David DeBoer, Georgetown University
Wenchao Zhou, Georgetown University
Lisa Singh, Georgetown University

GRADES '13: First International Workshop on Graph Data Management Experiences and Systems, June 2013, Article No.: 6, Pages 1–6, https://doi.org/10.1145/2484425.2484431

Published: 23 June 2013

ABSTRACT

As distributed systems become more widespread and more complex, the need for efficient, scalable tools to analyze them increases. Network provenance graphs offer a rich framework for this task, mapping dependencies between system states and allowing one to explain these states. In this paper, we investigate methods for more efficient substructure mining in the context of network provenance graphs. Specifically, we are interested in identifying frequent substructures that can be used as a feature set for modeling common execution patterns. Knowing these patterns will help network administrators detect nodes in the distributed system that are misbehaving. This paper therefore focuses on applying and scaling up substructure mining for network provenance graphs by incorporating a graph database (Neo4j) into the mining process and implementing optimizations that improve its efficiency. Our results show that the use of the Neo4j graph database, combined with our algorithmic optimizations, greatly improves the run time of our algorithm while not significantly affecting the quality of the substructures returned.
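
To make the idea of frequent substructures as a feature set more concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not use Neo4j. It assumes a hypothetical toy provenance graph with made-up vertex labels (`route`, `link`, `forged`) and looks only at the smallest possible substructure, a single labeled edge. Patterns that reach a chosen support threshold are treated as common execution patterns, and edges that match none of them are reported as candidates for closer inspection.

```python
from collections import Counter

# Hypothetical toy provenance graph (all ids and labels are assumptions):
# vertex id -> label, and directed edges meaning "source depends on target".
labels = {
    1: "route", 2: "link", 3: "route",
    4: "route", 5: "link", 6: "route",
    7: "route", 8: "forged",
}
edges = [(1, 2), (1, 3), (4, 5), (4, 6), (7, 8)]


def frequent_edge_patterns(labels, edges, min_support):
    """Count one-edge substructures by (source label, target label)
    and keep those that occur at least min_support times."""
    counts = Counter((labels[s], labels[t]) for s, t in edges)
    return {pattern: c for pattern, c in counts.items() if c >= min_support}


def unexplained_edges(labels, edges, frequent):
    """Return edges whose label pattern is not among the frequent ones."""
    return [(s, t) for s, t in edges if (labels[s], labels[t]) not in frequent]


frequent = frequent_edge_patterns(labels, edges, min_support=2)
suspicious = unexplained_edges(labels, edges, frequent)
print(frequent)
print(suspicious)
```

In this toy input the `route -> link` and `route -> route` patterns each occur twice and are kept as features, while the single `route -> forged` edge is left unexplained. Mining larger substructures requires subgraph matching, which is far more expensive and is the step the abstract proposes to speed up with a graph database.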

Published in
GRADES '13: First International Workshop on Graph Data Management Experiences and Systems
June 2013
101 pages
ISBN:9781450321884
DOI:10.1145/2484425
Conference Chairs: Peter Boncz (CWI), Thomas Neumann (TU Munich)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 29 of 61 submissions, 48%