ABSTRACT
With the explosion of social-media data, answering near real-time queries, particularly spatio-temporal ones, can be challenging. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits the natural partitioning that LSM-based indexes impose on their components (newer records reside in newer components), allowing us to filter out many components when answering a query. Our solution generalizes to any LSM-based index structure. It can be applied not only to temporal fields (e.g., based on recency) but also to any "time-correlated field", such as Universally Unique Identifiers (UUIDs) or user-provided integer IDs. We have implemented the solution in the AsterixDB system and evaluated it experimentally.
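The component-pruning idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not AsterixDB's implementation. It assumes each disk component keeps the minimum and maximum value of one time-correlated field. The names `Component`, `LSMIndex`, `may_overlap`, and `range_query` are hypothetical. Updates, deletions, and merges are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical record layout: (time-correlated filter value, payload).
Record = Tuple[int, str]


@dataclass
class Component:
    """An immutable disk component with an assumed min/max summary of its filter field."""
    records: List[Record]
    min_key: Optional[int] = field(init=False, default=None)
    max_key: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        if self.records:
            keys = [k for k, _ in self.records]
            self.min_key, self.max_key = min(keys), max(keys)

    def may_overlap(self, lo: int, hi: int) -> bool:
        # The component can hold matches only if [min_key, max_key] meets [lo, hi].
        if self.min_key is None or self.max_key is None:
            return False
        return self.min_key <= hi and self.max_key >= lo


class LSMIndex:
    """Toy LSM index: an in-memory component flushed to disk when full."""

    def __init__(self, memory_capacity: int) -> None:
        self.memory_capacity = memory_capacity
        self.memory: List[Record] = []
        self.disk: List[Component] = []  # oldest first, newest last

    def insert(self, key: int, payload: str) -> None:
        self.memory.append((key, payload))
        if len(self.memory) >= self.memory_capacity:
            self.flush()

    def flush(self) -> None:
        # Each flush seals the records ingested since the previous flush,
        # so a recency-correlated key yields a narrow [min, max] per component.
        if self.memory:
            self.disk.append(Component(sorted(self.memory)))
            self.memory = []

    def range_query(self, lo: int, hi: int) -> Tuple[List[Record], int]:
        """Return matching records and the number of disk components scanned."""
        hits = [r for r in self.memory if lo <= r[0] <= hi]
        scanned = 0
        for comp in self.disk:
            if not comp.may_overlap(lo, hi):
                continue  # pruned using only the min/max summary
            scanned += 1
            hits.extend(r for r in comp.records if lo <= r[0] <= hi)
        return sorted(hits), scanned


if __name__ == "__main__":
    idx = LSMIndex(memory_capacity=10)
    for t in range(100):  # t stands in for a timestamp or auto-increment id
        idx.insert(t, f"r{t}")
    rows, scanned = idx.range_query(95, 99)  # a "recent data" query
    print(scanned, [k for k, _ in rows])  # prints: 1 [95, 96, 97, 98, 99]
```

Of the ten disk components, only the newest one is read for the recent-data query. If the filter field were uncorrelated with insertion time (for example, a random hash), each component's [min, max] range would span most of the key domain and little would be pruned. This is why the technique targets time-correlated fields.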
LSM-Based Storage and Indexing: An Old Idea with Timely Benefits