DOI: 10.1145/3394885.3431518
research-article

LiteIndex: Memory-Efficient Schema-Agnostic Indexing for JSON documents in SQLite

Published: 29 January 2021

Abstract

SQLite with the JSON (JavaScript Object Notation) format is widely adopted for local data storage in mobile applications such as Twitter and Instagram. As more data is generated and stored, it becomes vitally important to index and search JSON records in SQLite efficiently. However, current methods in SQLite require either full-text search, which incurs high memory usage and long query latency, or expression-based indexing, which must be created manually by specifying search keys. Meanwhile, existing automatic JSON indexing techniques, which mainly target big data and cloud environments, depend on a colossal tree structure that cannot be applied on memory-constrained mobile devices.
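As an illustration of the two existing SQLite approaches mentioned above, the following minimal sketch uses Python's built-in sqlite3 module. It assumes the underlying SQLite build includes the FTS4 and JSON functions; the table names, index name, and sample records are hypothetical and not taken from the paper.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")

# Hypothetical sample records, stored as JSON text.
docs = [
    {"user": "alice", "text": "Indexing JSON records in SQLite on a phone"},
    {"user": "bob", "text": "Full text search can use a lot of memory"},
    {"user": "alice", "text": "Expression indexes need a manually chosen key"},
]
conn.executemany(
    "INSERT INTO tweets (id, body) VALUES (?, ?)",
    [(i, json.dumps(d)) for i, d in enumerate(docs, start=1)],
)

# Approach 1: full-text search. Every token of the text is indexed
# in a separate FTS4 virtual table.
conn.execute("CREATE VIRTUAL TABLE tweets_fts USING fts4(text)")
conn.executemany(
    "INSERT INTO tweets_fts (rowid, text) VALUES (?, ?)",
    [(i, d["text"]) for i, d in enumerate(docs, start=1)],
)
fts_hits = [
    row[0]
    for row in conn.execute(
        "SELECT rowid FROM tweets_fts WHERE text MATCH ? ORDER BY rowid",
        ("memory",),
    )
]

# Approach 2: index on an expression. The search key ($.user) has to be
# chosen by hand when the index is created.
conn.execute(
    "CREATE INDEX idx_tweets_user ON tweets (json_extract(body, '$.user'))"
)
expr_hits = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM tweets WHERE json_extract(body, '$.user') = ? ORDER BY id",
        ("alice",),
    )
]

print(fts_hits)   # [2]
print(expr_hits)  # [1, 3]
```

The contrast is the point of the sketch: the FTS table indexes every word without being told what matters, while the expression index only helps queries on the one key that was named in advance.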
In this paper, we propose a novel schema-agnostic indexing technique called LiteIndex that automatically indexes JSON records by extracting keywords from long text and maintaining user-preferred items within a given memory constraint. This is achieved through a memory-efficient index organization with lightweight keyword extraction from long text and a user-preference-aware, reinforcement-learning-based index pruning mechanism. LiteIndex has been implemented on an Android smartphone platform and evaluated with a dataset of tweets. Experimental results show that LiteIndex can reduce query latency by up to 18x while using less memory than SQLite with the FTS3/FTS4 extensions.
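The abstract does not describe LiteIndex's index layout, keyword extractor, or reinforcement-learning pruning policy, so the following is only a rough sketch of the general idea and not the authors' method. It assumes a simple tf-idf ranking for keyword extraction, a cap on the number of index entries as a stand-in for the memory constraint, and a plain lookup counter as a stand-in for the learned user-preference signal. All class, method, and parameter names are hypothetical.

```python
import json
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def flatten(doc, prefix="$"):
    """Yield (path, scalar) pairs for every leaf of a JSON value."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}.{key}")
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        yield prefix, doc


class TinyIndex:
    """Hypothetical bounded inverted index over (JSON path, term) pairs."""

    def __init__(self, max_entries, long_text_words=5, keywords_per_text=2):
        self.max_entries = max_entries          # stand-in for a memory budget
        self.long_text_words = long_text_words  # longer strings count as long text
        self.keywords_per_text = keywords_per_text
        self.postings = defaultdict(set)        # (path, term) -> row ids
        self.hits = Counter()                   # (path, term) -> lookup count
        self.doc_freq = Counter()               # word -> long texts containing it
        self.num_texts = 0

    def _terms(self, value):
        if not isinstance(value, str):
            return [json.dumps(value)]
        words = TOKEN.findall(value.lower())
        if len(words) <= self.long_text_words:
            return [value.lower()]              # short values are indexed whole
        # Long text: keep only the top-ranked words by an incremental tf-idf.
        self.num_texts += 1
        self.doc_freq.update(set(words))
        tf = Counter(words)

        def score(word):
            return tf[word] * math.log((1 + self.num_texts) / self.doc_freq[word])

        ranked = sorted(tf, key=lambda w: (-score(w), w))
        return ranked[: self.keywords_per_text]

    def add(self, row_id, doc):
        for path, value in flatten(doc):
            for term in self._terms(value):
                self.postings[(path, term)].add(row_id)
        self._prune()

    def lookup(self, path, term):
        key = (path, term.lower())
        if key not in self.postings:
            return set()                        # caller would fall back to a scan
        self.hits[key] += 1                     # record the user's interest
        return set(self.postings[key])

    def _prune(self):
        # Evict the least-looked-up entries (oldest first on ties).
        # This counter is NOT reinforcement learning; it only marks where
        # a learned pruning policy would plug in.
        while len(self.postings) > self.max_entries:
            victim = min(self.postings, key=lambda k: self.hits[k])
            del self.postings[victim]
            self.hits.pop(victim, None)


index = TinyIndex(max_entries=4)
index.add(1, {"user": {"name": "Alice"},
              "text": "sqlite json indexing on mobile devices with sqlite"})
index.lookup("$.user.name", "alice")            # this entry is now preferred
index.add(2, {"user": {"name": "Bob"},
              "text": "memory constrained phones need small memory indexes"})

print(len(index.postings))                      # 4
print(index.lookup("$.user.name", "alice"))     # {1}
print(index.lookup("$.text", "sqlite"))         # set()
```

In this toy run the entry that was looked up survives pruning, while keywords that were never queried are evicted once the entry budget is exceeded. No keys were declared up front, which is the sense in which the index is schema-agnostic.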

Cited By

  • (2022) Forensic Analysis of the Bumble Dating App for Android. Forensic Sciences 2:1 (201-221). DOI: 10.3390/forensicsci2010016. Online publication date: 27-Feb-2022

    Published In

    ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
    January 2021
    930 pages
    ISBN: 9781450379991
    DOI: 10.1145/3394885
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • The Research Grants Council of the Hong Kong Special Administrative Region, China
    • Direct Grant for Research, The Chinese University of Hong Kong

    Conference

    ASPDAC '21
    Acceptance Rates

    ASPDAC '21 Paper Acceptance Rate 111 of 368 submissions, 30%;
    Overall Acceptance Rate 466 of 1,454 submissions, 32%
