DOI: 10.1145/3299869.3300075
research-article

Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search?

Published: 25 June 2019

Abstract

In this paper, we focus on the problem of searching sorted, in-memory datasets. This is a key data operation, and Binary Search is the de facto algorithm that is used in practice. We consider an alternative, namely Interpolation Search, which can take advantage of hardware trends by using complex calculations to save memory accesses. Historically, Interpolation Search was found to underperform compared to other search algorithms in this setting, despite its superior asymptotic complexity. Also, Interpolation Search is known to perform poorly on non-uniform data. To address these issues, we introduce SIP (Slope reuse Interpolation), an optimized implementation of Interpolation Search, and TIP (Three point Interpolation), a new search algorithm that uses linear fractions to interpolate on non-uniform distributions. We evaluate these two algorithms against a similarly optimized Binary Search method using a variety of real and synthetic datasets. We show that SIP is up to 4 times faster on uniformly distributed data and TIP is 2-3 times faster on non-uniformly distributed data in some cases. We also design a meta-algorithm to switch between these different methods to automate picking the higher performing search algorithm, which depends on factors like data distribution.
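
To make the contrast with Binary Search concrete, the following is a minimal sketch of textbook Interpolation Search. It is not the SIP or TIP algorithm from the paper, whose details the abstract does not give. The function names and the assumption of integer keys in a Python list are illustrative choices only.

```python
from bisect import bisect_left


def interpolation_search(arr, target):
    """Return an index of target in the sorted list arr, or -1 if absent.

    Assumes integer keys (an illustrative assumption, not from the paper).
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if arr[lo] == target else -1
        # Guess the position by assuming keys grow linearly with the index.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


def binary_search(arr, target):
    """Baseline: standard Binary Search built on bisect_left."""
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1
```

On evenly spaced keys the linear guess tends to land on or near the target, so fewer elements are touched than with repeated halving. On skewed keys the guess can be far off, which is the weakness on non-uniform data that the abstract describes.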

      Published In

      SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
      June 2019
      2106 pages
      ISBN:9781450356435
      DOI:10.1145/3299869
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. binary search
      2. in-memory search
      3. interpolation search

      Qualifiers

      • Research-article

      Funding Sources

      • Gift donation by Huawei
      • Gift donation by Google
      • This work was supported in part by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsor

      Conference

      SIGMOD/PODS '19
      SIGMOD/PODS '19: International Conference on Management of Data
      June 30 - July 5, 2019
      Amsterdam, Netherlands

      Acceptance Rates

      SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 87
      • Downloads (Last 6 weeks): 4
      Reflects downloads up to 01 Mar 2025

      Citations

      Cited By

      • (2025) HLN-Tree: A memory-efficient B+-Tree with huge leaf nodes and locality predictors. ACM Transactions on Storage 21:2 (1-27). DOI: 10.1145/3707641. Online publication date: 6-Jan-2025
      • (2024) Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654948. Online publication date: 30-May-2024
      • (2024) G-Learned Index: Enabling Efficient Learned Index on GPU. IEEE Transactions on Parallel and Distributed Systems 35:6 (950-967). DOI: 10.1109/TPDS.2024.3381214. Online publication date: 2-Apr-2024
      • (2024) When Learned Indexes Meet Persistent Memory: The Analysis and the Optimization. IEEE Transactions on Knowledge and Data Engineering 36:12 (9517-9531). DOI: 10.1109/TKDE.2023.3342825. Online publication date: Dec-2024
      • (2024) NV-QALSH+: Locality-Sensitive Hashing Optimized for Non-volatile Memory. Web and Big Data (246-260). DOI: 10.1007/978-981-97-2421-5_17. Online publication date: 12-May-2024
      • (2023) Learned Sorted Table Search and Static Indexes in Small-Space Data Models. Data 8:3 (56). DOI: 10.3390/data8030056. Online publication date: 3-Mar-2023
      • (2023) View-dependent Adaptive HLOD: real-time interactive rendering of multi-resolution models. Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (1-10). DOI: 10.1145/3626495.3626507. Online publication date: 30-Nov-2023
      • (2023) Enabling Timely and Persistent Deletion in LSM-Engines. ACM Transactions on Database Systems 48:3 (1-40). DOI: 10.1145/3599724. Online publication date: 9-Aug-2023
      • (2023) When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588959. Online publication date: 30-May-2023
      • (2023) Comparative Analysis of Binary and Interpolation Search Algorithms on Integer Data Using C Programming Language. 2023 International Conference on Information Management and Technology (ICIMTech) (340-345). DOI: 10.1109/ICIMTech59029.2023.10277955. Online publication date: 24-Aug-2023
