Handling query skew in large indexes: a view based approach

Huang, Weihuang; Yu, Jeffrey Xu; Shang, Zechao

doi:10.1007/s11704-016-5525-3

Research Article
Published: 05 August 2017

Frontiers of Computer Science, Volume 12, pages 146–162 (2018)

Weihuang Huang¹, Jeffrey Xu Yu¹ & Zechao Shang¹

Abstract

Indexing is one of the most important techniques for facilitating query processing over multi-dimensional datasets. A commonly used strategy is to keep a tree-structured index balanced. This strategy reduces the worst-case query processing cost and handles all queries equally well. In other words, it implicitly assumes that queries are issued uniformly, partly because in practice the query distribution is not known in advance and changes over time. The key issue we study in this work is whether it is best to rely entirely on a balanced tree-structured index, particularly as datasets grow ever larger in the big data era. When a dataset becomes very large, it is unreasonable to assume that the data in every subspace are equally important and are accessed uniformly by all queries at the index level. Given that query skew exists and may change, we study how to handle query skew and its changes at the index level without sacrificing support for arbitrary queries in a well-balanced tree index and without incurring high overhead. To tackle this issue, we propose index-views at the index level, where an index-view is a short-cut in a balanced tree-structured index to the objects in the subspaces that are accessed more frequently, and we propose a new index-view-centric framework that processes queries using index-views in a bottom-up manner. We study the index-view selection problem in both static and dynamic settings, and we confirm the effectiveness of our approach using large real and synthetic datasets.
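
The idea of an index-view as a short-cut into a balanced tree can be illustrated with a small sketch. This is not the authors' algorithm, which is not given in the abstract: it assumes a one-dimensional balanced binary tree over sorted, distinct integer keys instead of a multi-dimensional index, and the names `Node`, `build`, `deepest_cover`, `search_down` and `query_with_views` are hypothetical. A query starts at a view that overlaps its range, climbs parent pointers until the node's key span contains the query, and only then searches downward, which is one possible reading of "bottom-up" processing.

```python
from typing import Dict, List, Optional


class Node:
    """Node of a balanced binary tree over sorted, distinct keys (illustrative)."""

    def __init__(self, lo: int, hi: int, parent: Optional["Node"] = None):
        self.lo, self.hi = lo, hi          # smallest / largest key below this node
        self.keys: List[int] = []          # filled in leaves only
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent = parent


def build(keys: List[int], leaf_size: int = 2, parent: Optional[Node] = None) -> Node:
    """Build a balanced tree by splitting the sorted key list in half."""
    node = Node(keys[0], keys[-1], parent)
    if len(keys) <= leaf_size:
        node.keys = list(keys)
        return node
    mid = len(keys) // 2
    node.left = build(keys[:mid], leaf_size, node)
    node.right = build(keys[mid:], leaf_size, node)
    return node


def deepest_cover(node: Node, lo: int, hi: int) -> Node:
    """Return the deepest node whose key span contains [lo, hi]."""
    while node.left is not None:
        if node.left.lo <= lo and hi <= node.left.hi:
            node = node.left
        elif node.right.lo <= lo and hi <= node.right.hi:
            node = node.right
        else:
            break
    return node


def search_down(node: Node, lo: int, hi: int, stats: Dict[str, int]) -> List[int]:
    """Ordinary top-down range search; counts every node it touches."""
    stats["visited"] += 1
    if node.hi < lo or node.lo > hi:
        return []
    if node.left is None:
        return [k for k in node.keys if lo <= k <= hi]
    return search_down(node.left, lo, hi, stats) + search_down(node.right, lo, hi, stats)


def query_with_views(root: Node, views: List[Node], lo: int, hi: int,
                     stats: Dict[str, int]) -> List[int]:
    """Start at an overlapping view (assumed short-cut), climb until the node
    spans the query, then search downward. Falls back to the root if no view
    overlaps, so every query is still answered."""
    node = next((v for v in views if v.lo <= hi and lo <= v.hi), root)
    while node.parent is not None and not (node.lo <= lo and hi <= node.hi):
        stats["visited"] += 1              # count each upward hop
        node = node.parent
    return search_down(node, lo, hi, stats)


root = build(list(range(64)))
hot_view = deepest_cover(root, 40, 47)     # pretend keys 40..47 are queried often

top_down, via_view = {"visited": 0}, {"visited": 0}
print(search_down(root, 42, 45, top_down), top_down["visited"])
print(query_with_views(root, [hot_view], 42, 45, via_view), via_view["visited"])
```

In this toy setting both calls return the same keys, while the query routed through the view touches fewer nodes (7 instead of 13) because it skips the upper levels of the tree. Queries outside the hot region still work: they either climb from the view or start at the root.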

Acknowledgements

This work was supported by a grant from the Research Grants Council of the Hong Kong SAR, China (14209314).

Author information

Authors and Affiliations

Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Weihuang Huang, Jeffrey Xu Yu & Zechao Shang

Corresponding author

Correspondence to Weihuang Huang.

Additional information

Weihuang Huang received her BS and MS degrees from Tsinghua University, China in 2010 and 2013, respectively. She is currently a PhD student at the Department of Systems Engineering and Engineering Management at The Chinese University of Hong Kong, China. Her main research interest is indexing.

Jeffrey Xu Yu has held teaching positions at the Institute of Information Sciences and Electronics, University of Tsukuba, Japan, and at the Department of Computer Science, Australian National University, Australia. Currently, he is a professor in the Department of Systems Engineering and Engineering Management, the Chinese University of Hong Kong, China.

Zechao Shang received his PhD degree from The Chinese University of Hong Kong (CUHK), China in 2015. He is currently a postdoctoral fellow at the Department of Systems Engineering and Engineering Management, CUHK. His main research interest is large-scale graph data processing systems.

About this article

Cite this article

Huang, W., Yu, J.X. & Shang, Z. Handling query skew in large indexes: a view based approach. Front. Comput. Sci. 12, 146–162 (2018). https://doi.org/10.1007/s11704-016-5525-3

Received: 06 December 2015
Accepted: 24 June 2016
Published: 05 August 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11704-016-5525-3
