research-article

LittleTable: A Time-Series Database and Its Uses

Authors:
Sean Rhea, Cisco Meraki, San Francisco, CA, USA
Eric Wang, Cisco Meraki, San Francisco, CA, USA
Edmund Wong, Cisco Meraki, San Francisco, CA, USA
Ethan Atkins, Cisco Meraki, San Francisco, CA, USA
Nat Storer, Cisco Meraki, San Francisco, CA, USA

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data, May 2017, pages 125–138. https://doi.org/10.1145/3035918.3056102

Published: 9 May 2017

ABSTRACT

We present LittleTable, a relational database that Cisco Meraki has used since 2008 to store usage statistics, event logs, and other time-series data from our customers' devices.

LittleTable optimizes for time-series data by clustering tables in two dimensions. By partitioning rows by timestamp, it allows quick retrieval of recent measurements without imposing any penalty for retaining older history. By further sorting within each partition by a hierarchically delineated key, LittleTable allows developers to optimize each table for the specific patterns with which they intend to access it.

LittleTable further optimizes for time-series data by capitalizing on the reduced consistency and durability needs of our applications, three of which we present here. In particular, our applications are single-writer and append-only. At most one process inserts a given type of data collected from a given device, and applications never update rows written in the past, simplifying both lock management and crash recovery. Our most recently written data is also recoverable, as it can generally be re-read from the devices themselves, allowing LittleTable to safely lose some amount of recently written data in the event of a crash.
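
The single-writer, append-only, lossy-tail contract can be modeled in a few lines. This toy Python sketch is an illustration under stated assumptions, not LittleTable's crash-recovery code: `AppendOnlyStore`, `resync`, the `(table, device)` stream key, and the rule that timestamps within a stream must strictly increase are all hypothetical. Rows sit in a pending buffer until `flush`; `crash` discards that buffer with no log replay, and `resync` fills the gap by re-reading from the device everything newer than the last durable timestamp. The ownership check enforces in code what the abstract describes as a property of the applications themselves.

```python
from typing import Optional

Stream = tuple[str, str]    # (table name, device id); assumed one writer per stream
Entry = tuple[int, object]  # (timestamp, value)


class AppendOnlyStore:
    """Toy model of single-writer, append-only storage that may lose its tail."""

    def __init__(self) -> None:
        self.owner: dict[Stream, str] = {}
        self.durable: dict[Stream, list[Entry]] = {}
        self.pending: dict[Stream, list[Entry]] = {}

    def last_ts(self, stream: Stream, durable_only: bool = False) -> Optional[int]:
        rows = list(self.durable.get(stream, []))
        if not durable_only:
            rows += self.pending.get(stream, [])
        return rows[-1][0] if rows else None

    def insert(self, writer: str, stream: Stream, ts: int, value: object) -> None:
        # Single writer: the first process to write a stream owns it.
        if self.owner.setdefault(stream, writer) != writer:
            raise PermissionError(f"{stream} is owned by {self.owner[stream]}")
        # Append-only, assumed here to mean strictly increasing timestamps.
        last = self.last_ts(stream)
        if last is not None and ts <= last:
            raise ValueError(f"timestamp {ts} is not after {last}")
        self.pending.setdefault(stream, []).append((ts, value))

    def flush(self) -> None:
        # Make all buffered rows durable.
        for stream, rows in self.pending.items():
            self.durable.setdefault(stream, []).extend(rows)
        self.pending.clear()

    def crash(self) -> None:
        # Unflushed rows are simply lost; there is no write-ahead log to replay.
        self.pending.clear()


def resync(store: AppendOnlyStore, writer: str, stream: Stream,
           device_history: list[Entry]) -> int:
    """Re-insert rows the device still holds that are newer than the durable tail."""
    last = store.last_ts(stream, durable_only=True)
    missing = [(ts, v) for ts, v in device_history if last is None or ts > last]
    for ts, v in missing:
        store.insert(writer, stream, ts, v)
    return len(missing)
```

Because no other process writes the stream and no row is ever rewritten, the last durable timestamp alone tells the writer where to resume; in this model no locks or undo records are needed.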

As a result of these optimizations, LittleTable is fast and efficient, even on a single processor and spinning disk. Querying an uncached table of 128-byte rows, it returns the first matching row in 31 ms, and it returns 500,000 rows/second thereafter, approximately 50% of the throughput of the disk itself. Today Meraki stores 320 TB of data across several hundred LittleTable servers system-wide.
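
As a rough check on the throughput figures, the row rate converts to bytes per second as follows. This counts only the 128-byte row payload (decimal megabytes) and ignores any on-disk overhead, so the implied disk rate is an estimate derived from the numbers stated above, not a separately reported measurement.

```latex
\begin{aligned}
500{,}000\ \text{rows/s} \times 128\ \text{B/row} &= 64{,}000{,}000\ \text{B/s} \approx 64\ \text{MB/s},\\
\text{implied disk throughput} &\approx \frac{64\ \text{MB/s}}{0.5} = 128\ \text{MB/s}.
\end{aligned}
```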

Published in: SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data, May 2017, 1810 pages
ISBN: 9781450341974
DOI: 10.1145/3035918
General Chairs: Rada Chirkova (North Carolina State University, USA), Jun Yang (Duke University, USA)
Program Chair: Dan Suciu (University of Washington, USA)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: cloud computing, clustering, databases, internet of things, partitioning, time-series data
Overall acceptance rate: 785 of 4,003 submissions (20%)