Using advanced data structures to enable responsive security monitoring
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Stony Brook Univ., NY (United States)
- Rutgers Univ., New Brunswick, NJ (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Williams College, Williamstown, MA (United States)
Write-optimized data structures (WODS) offer the potential to keep up with cyberstream event rates and give sub-second query response for key items like IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment and not fill up the disk, WODS must efficiently expire older events. As the basis for our research into organizing security monitoring data, we implemented a tool, called Diventi, to index IP addresses in connection logs using RocksDB (a write-optimized LSM tree). In this work, we extended Diventi to automatically expire data as part of the data structures’ normal operations. We guarantee that Diventi always tracks the N most recent events and tracks no more than N + k events for a parameter k < N, while ensuring the index is opportunistically pruned. To test Diventi at scale in a controlled environment, we used anonymized traces of IP communications collected at SuperComputing 2019. We synthetically extended the 2.4 billion connection events to 100 billion events. We compared Diventi with Elasticsearch, a common log indexing tool. In our test environment, Elasticsearch reached an ingestion rate of at best 37,000 events/s, while Diventi sustained ingestion rates greater than 171,000 events/s. Diventi's query response times were as much as 100 times faster, typically answering queries in under 80 ms. Furthermore, we saw no noticeable degradation in Diventi from expiration. Diventi has been deployed for many months, during which it has performed well and supported new security analysis capabilities.
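The retention guarantee stated in the abstract (always keep the N most recent events, never hold more than N + k) can be illustrated with a small in-memory sketch. This is not Diventi's implementation: Diventi builds on RocksDB, whereas the class below uses plain Python containers as stand-ins, and every name in it (`BoundedEventIndex`, `insert`, `query`, `_expire_batch`) is hypothetical. It only shows one way that expiring in batches of k keeps the event count between N and N + k.

```python
from collections import OrderedDict, defaultdict, deque


class BoundedEventIndex:
    """Toy index: keeps the N newest events, never more than N + k.

    Illustrative stand-in only; names and structure are assumptions,
    not taken from Diventi or RocksDB.
    """

    def __init__(self, n, k):
        if not (0 < k < n):
            raise ValueError("requires 0 < k < n")
        self.n = n
        self.k = k
        self.events = OrderedDict()       # seq -> (ip, payload), oldest first
        self.by_ip = defaultdict(deque)   # ip -> seqs in arrival order
        self.next_seq = 0

    def __len__(self):
        return len(self.events)

    def insert(self, ip, payload):
        # Expire *before* inserting so the count never exceeds n + k.
        if len(self.events) >= self.n + self.k:
            self._expire_batch()
        seq = self.next_seq
        self.next_seq += 1
        self.events[seq] = (ip, payload)
        self.by_ip[ip].append(seq)

    def _expire_batch(self):
        # Drop the k oldest events in one pass, leaving exactly n.
        for _ in range(self.k):
            seq, (ip, _payload) = self.events.popitem(last=False)
            seqs = self.by_ip[ip]
            # Events expire in global arrival order, so the expired
            # seq is always the oldest entry for its IP.
            seqs.popleft()
            if not seqs:
                del self.by_ip[ip]

    def query(self, ip):
        # Payloads of all retained events for this IP, oldest first.
        return [self.events[s][1] for s in self.by_ip.get(ip, ())]
```

With `n=5` and `k=2`, for example, the index grows to seven events, drops the two oldest on the next insert, and repeats. Batching the deletions amortizes their cost over k inserts, which is the general idea behind folding expiration into normal operations; how Diventi achieves this inside an LSM tree is not shown here.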
- Research Organization:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program; National Science Foundation (NSF)
- Grant/Contract Number:
- NA0003525; CCF-2118832; CCF-2106827; CCF-1725543; CSR-1763680; CCF-1716252; CNS-1938709
- OSTI ID:
- 1883172
- Report Number(s):
- SAND2021-15479J; 702298
- Journal Information:
- Cluster Computing, Vol. 25, Issue 4; ISSN 1386-7857
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English