New and improved search algorithms and precise analysis of their average-case complexity

https://doi.org/10.1016/j.future.2019.01.043

Highlights

  • We propose an improved ternary search (ITS) algorithm.

  • We also propose a new Binary–Quaternary Search (BQS) algorithm.

  • We discuss weak and correct implementations of the binary search (BS) algorithm.

  • We precisely calculate the average number of comparisons for the weak and correct implementations of the BS algorithm.

  • We precisely calculate the average number of comparisons for the ITS and BQS algorithms.

Abstract

In this paper, we consider the search problem over ordered sequences. It is well known that the Binary Search (BS) algorithm solves this problem very efficiently, namely with complexity Θ(log₂ n). Variants of the BS algorithm, such as the Ternary Search (TS) algorithm, do not improve this efficiency. The rapid increase in the amount of data has made the search problem more important than in the past, and with it the reduction of the average number of comparisons in cases where no asymptotic improvement is achieved. In this paper, we identify and analyze an implementation issue of BS. Depending on the placement of the conditional operators, we distinguish two implementations of BS that are widely used in the literature. We call them the weak and the correct implementation. We calculate the precise average-case number of comparisons for both implementations. Moreover, we transform the TS algorithm into an improved ternary search (ITS) algorithm. We also propose a new Binary–Quaternary Search (BQS) algorithm based on a novel dividing strategy. We prove that the average number of comparisons for both ITS and BQS is smaller than for the correct implementation of the BS algorithm. We also provide experimental results.

Introduction

Searching and sorting are classical problems of computer science. Owing to the rapid growth in the amount of data in recent years, these problems continue to attract the attention of researchers. In our previous work [1], we gave a short summary of recently published work on sorting algorithms [2], [3], [4], [5], [6], [7], [8], [9]. The study [10], conducted after our publication, proposes two novel sorting algorithms, called Brownian Motus insertion sort and Clustered Binary Insertion Sort. Both algorithms are based on the classical Insertion Sort. Marszałek [11] describes how to parallelize the sorting process in a modified merge sort for large data sets.

Besides these studies, Woźniak et al. [12] modify the Merge Sort algorithm for large-scale data sets. Marszałek [13] proposes a new recursive version of the fast sort algorithm for large data sets. Woźniak et al. [14] examine two versions of the quick sort algorithm for large data sets. Dymora et al. [15] calculate the rate of existence of long-term correlations in the processing dynamics of the quicksort algorithm based on the Hurst coefficient. Napoli et al. [16] propose applying the simplified firefly algorithm to search for key areas in 2D images. Woźniak and Marszałek [17] use the classic firefly algorithm to search for special areas in test images. Das and Khilar [18] propose a Randomized Searching Algorithm and compare its performance with the Binary Search and Linear Search algorithms. They show that the performance of the algorithm lies between that of Binary Search and Linear Search. Ambainis et al. [19] study the classic binary search problem with a delay between query and answer. They give matching upper and lower bounds on the number of queries for constant delays. Finocchi and Italiano [20] investigate the design and analysis of sorting and searching algorithms resilient to memory faults. Chadha et al. [21] propose a modification of the binary search algorithm that checks for the presence of the input element at each iteration. Rahim et al. [22] provide an experimental comparison of the linear, binary and interpolation search algorithms, searching data of different lengths with a pseudo-process approach. Kumar [23] proposes a new quadratic search algorithm based on the binary search algorithm and shows experimentally that it performs better than binary search.

Carmo et al. [24] consider the problem of searching for a given element in a partially ordered set. Bonasera et al. [25] propose an adaptive search algorithm over ordered sets. The hybrid search algorithm on ordered data sets proposed by Mohammed et al. [26] is similar to the adaptive search algorithm. Bender et al. [27] develop a library sort algorithm based on insertion sort and the binary search (BS) algorithm.

It is well known that the BS algorithm is one of the most widely used algorithms in computer applications because it performs well for different data types and key distributions. It works on the divide-and-conquer principle [28]. The algorithm is used to solve a variety of problems. For instance, Gao et al. [29] propose a scheduling algorithm for ridesharing using a binary search strategy. Hatamlou [30] presents a binary search algorithm for data clustering. BS is a simple and understandable algorithm, although its implementation has some pitfalls. Donald Knuth emphasized: “Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky” [31]. Most of the implementation issues of binary search have been described in the literature. Pattis [32] notes five implementation errors. The study [33] presents a program that computes the semi-sum of two integers; this approach solves the overflow problem that occurs in binary search on very large arrays. Bentley discusses some errors in the implementation of binary search in the section titled “The challenge of binary search” [34].
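The overflow issue mentioned above can be illustrated with a minimal C++ sketch. It is not the program from [33]; the function name `overflow_safe_mid` is hypothetical, and the sketch assumes non-negative indices with `lo <= hi`. The point is only that the midpoint can be computed without ever forming the sum `lo + hi`.

```cpp
#include <cassert>

// Hypothetical helper (not from the paper or from [33]).
// Returns the midpoint of [lo, hi] without computing lo + hi,
// which could exceed the range of int for very large indices.
int overflow_safe_mid(int lo, int hi) {
    assert(0 <= lo && lo <= hi);  // assumed precondition
    return lo + (hi - lo) / 2;    // hi - lo cannot overflow under the precondition
}
```

Since C++20, `std::midpoint` from `<numeric>` offers a standard alternative for the same purpose.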

In this paper, we discuss two different implementations of the BS algorithm, which we call the weak and the correct implementation. We calculate the average number of comparisons for both implementations precisely. We discuss the TS algorithm, which is known to be slower than BS, and then present an improved ternary search (ITS) algorithm that is faster than the correct implementation of the BS algorithm. We prove this by calculating the average number of comparisons for the ITS algorithm precisely. Moreover, we propose a new search algorithm called the Binary–Quaternary Search (BQS) algorithm. We calculate the average number of comparisons for the BQS algorithm and show that it is better than the correct implementation of the BS algorithm. In theory, BQS requires slightly more comparisons on average than the presented ITS algorithm.

The rest of the paper is organized as follows. In Section 2 we discuss the weak and correct implementations of the BS algorithm and calculate the average number of comparisons for each. In Section 3 we discuss the TS algorithm. In Section 4 we propose the ITS algorithm and calculate its average number of comparisons. In Section 5 we develop a new search algorithm, BQS, and determine its average number of comparisons precisely. In Section 6 we compare the implementations of the ITS and BQS algorithms. In Section 7 we present experimental results and a comparison of these search algorithms. Finally, we summarize our results in Section 8.

Section snippets

Binary search and its two different implementations

In this section, we discuss the weak and correct implementations of the BS algorithm. We also calculate the average number of comparisons for both implementations. We take the correct implementation from the book [28]. The weak implementation appears in many works; see, for example, [35], [36], [37]. Table 1 contains the correct and the weak implementations used in this study. The difference between these two implementations occurs when the first “if” statement is made to search for the desired
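To make the effect of the ordering of the conditional tests concrete, here is a minimal C++ sketch that counts key comparisons for two orderings of the tests inside the loop. It is not the paper's Table 1: the excerpt is cut off before it specifies the two implementations, so the names `bs_equality_first`, `bs_equality_last`, `SearchResult` and `total_comparisons` are hypothetical, and the sketch does not claim which ordering the authors call weak or correct.

```cpp
#include <cassert>
#include <vector>

struct SearchResult {
    int index;         // position of the key, or -1 if absent
    long comparisons;  // number of key comparisons performed
};

// Ordering A (assumption): test for equality first, then for "less than".
SearchResult bs_equality_first(const std::vector<int>& a, int key) {
    int left = 0, right = static_cast<int>(a.size()) - 1;
    long count = 0;
    while (left <= right) {
        const int mid = left + (right - left) / 2;
        ++count;
        if (key == a[mid]) return {mid, count};
        ++count;
        if (key < a[mid]) right = mid - 1;
        else left = mid + 1;
    }
    return {-1, count};
}

// Ordering B (assumption): test the two inequalities first, equality last.
SearchResult bs_equality_last(const std::vector<int>& a, int key) {
    int left = 0, right = static_cast<int>(a.size()) - 1;
    long count = 0;
    while (left <= right) {
        const int mid = left + (right - left) / 2;
        ++count;
        if (key < a[mid]) { right = mid - 1; continue; }
        ++count;
        if (key > a[mid]) { left = mid + 1; continue; }
        return {mid, count};  // neither smaller nor larger: found
    }
    return {-1, count};
}

// Sums the comparisons of one successful search for every key of {0, ..., n-1}.
long total_comparisons(int n, SearchResult (*search)(const std::vector<int>&, int)) {
    assert(n >= 0);
    std::vector<int> a(n);
    for (int i = 0; i < n; ++i) a[i] = i;
    long total = 0;
    for (int key = 0; key < n; ++key) total += search(a, key).comparisons;
    return total;
}
```

Calling `total_comparisons(1023, bs_equality_first)` and `total_comparisons(1023, bs_equality_last)` gives the two totals for an array of 1023 keys; dividing by 1023 gives the average per successful search in this sketch.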

The ternary search algorithm

The ternary search is presented as an alternative to the binary search. This algorithm needs fewer iterations than binary search; however, it performs more comparisons per iteration. In this section we explain this circumstance in detail.
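The trade-off between iterations and comparisons per iteration can be seen in a textbook-style ternary search. The C++ sketch below is an assumption for illustration, not Algorithm 1 from [39]; the names `ternary_search` and `TernaryStats` are hypothetical. Each iteration discards about two thirds of the range but may spend up to four key comparisons doing so.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TernaryStats {
    int index;         // position of the key, or -1 if absent
    long iterations;   // loop iterations executed
    long comparisons;  // key comparisons performed
};

// Textbook-style ternary search over an ascending array (illustrative sketch).
TernaryStats ternary_search(const std::vector<int>& a, int key) {
    assert(std::is_sorted(a.begin(), a.end()));  // precondition, checked in debug builds
    int left = 0, right = static_cast<int>(a.size()) - 1;
    long iterations = 0, comparisons = 0;
    while (left <= right) {
        ++iterations;
        const int third = (right - left) / 3;
        const int m1 = left + third;   // first cut point
        const int m2 = right - third;  // second cut point
        ++comparisons;
        if (key == a[m1]) return {m1, iterations, comparisons};
        ++comparisons;
        if (key == a[m2]) return {m2, iterations, comparisons};
        ++comparisons;
        if (key < a[m1]) { right = m1 - 1; continue; }  // left third
        ++comparisons;
        if (key > a[m2]) { left = m2 + 1; continue; }   // right third
        left = m1 + 1;                                  // middle third
        right = m2 - 1;
    }
    return {-1, iterations, comparisons};
}
```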

In the literature, there are several studies on ternary search, such as the analysis in [41] and the following pseudo-code (Algorithm 1), which is presented in [39] as a ternary search. In this regard, there is a similar

Proposed improved ternary search (ITS) algorithm

The following pseudo-code (Algorithm 2) is the improved ternary search. This algorithm divides the length of the given array by three. Then it calculates the left cut index (Lci) and the right cut index (Rci). This approximately divides the array into three equal parts. If the required key X is less than the key located at Lci, the left third of the array will contain X. Correspondingly, if X is greater than the key located at Rci, the right third of the array will be
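The dividing step described above can be sketched in C++ as follows. This is not the authors' Algorithm 2, which the excerpt does not reproduce. The sketch assumes that the middle third is kept when X lies between the keys at Lci and Rci, that equality is only tested after the range has shrunk to a few elements, and that the final range is scanned linearly; the name `its_sketch` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch of a ternary narrowing loop with at most two key
// comparisons per iteration. Returns an index of x in a, or -1 if absent.
int its_sketch(const std::vector<double>& a, double x) {
    assert(std::is_sorted(a.begin(), a.end()));  // precondition, checked in debug builds
    int left = 0;
    int right = static_cast<int>(a.size()) - 1;
    while (right - left >= 3) {
        const int third = (right - left) / 3;  // at least 1 inside the loop
        const int lci = left + third;          // left cut index
        const int rci = right - third;         // right cut index
        if (x < a[lci]) {
            right = lci - 1;                   // X can only be in the left third
        } else if (x > a[rci]) {
            left = rci + 1;                    // X can only be in the right third
        } else {
            left = lci;                        // assumed: keep the middle third
            right = rci;
        }
    }
    // Assumed final step: at most three candidates remain.
    for (int i = left; i <= right; ++i) {
        if (a[i] == x) return i;
    }
    return -1;
}
```

Deferring the equality test to the end is one way to keep the per-iteration cost at one or two comparisons; whether the paper's ITS does exactly this cannot be confirmed from the excerpt.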

The proposed binary–quaternary search algorithm

The proposed Binary–Quaternary Search (BQS) is similar to ITS in its implementation. The main difference is that BQS divides the length of the given array by four instead of by three as in ITS. Consequently, the behavior of the algorithm changes. Fig. 4 shows the behavior of the dividing technique in BQS.

When the required key X resides in the left quarter (X < array[Lci]), BQS sets right = Lci, which excludes 75% of the length of the array for the next iteration. Likewise, when X resides in the
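A minimal C++ sketch of this dividing strategy is given below. It is not the authors' BQS listing. The left-quarter update `right = Lci` follows the text; the symmetric update `left = Rci` for the right quarter, the choice to keep the middle half otherwise, and the final linear scan are assumptions, and the name `bqs_sketch` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch: each iteration keeps either an outer quarter
// (discarding about 75% of the range) or the middle half (discarding about 50%).
// Returns an index of x in a, or -1 if absent.
int bqs_sketch(const std::vector<double>& a, double x) {
    assert(std::is_sorted(a.begin(), a.end()));  // precondition, checked in debug builds
    int left = 0;
    int right = static_cast<int>(a.size()) - 1;
    while (right - left >= 4) {
        const int quarter = (right - left) / 4;  // at least 1 inside the loop
        const int lci = left + quarter;          // left cut index
        const int rci = right - quarter;         // right cut index
        if (x < a[lci]) {
            right = lci;                         // left quarter, as in the text
        } else if (x > a[rci]) {
            left = rci;                          // right quarter (assumed by symmetry)
        } else {
            left = lci;                          // middle half (assumed)
            right = rci;
        }
    }
    // Assumed final step: at most four candidates remain.
    for (int i = left; i <= right; ++i) {
        if (a[i] == x) return i;
    }
    return -1;
}
```

The alternation between a quaternary-style step (outer quarters) and a binary-style step (middle half) is one reading of the name Binary–Quaternary.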

Implementation of ITS and BQS algorithms

The compiler used in the experimental work was configured to optimize the source code by default. Most compilers replace a division operation with a multiplication where possible, since multiplication consumes less CPU time than division. Furthermore, compilers turn division or multiplication into shift operations when possible, because a shift is much faster than division or multiplication.
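The two rewrites mentioned above can be written out by hand for unsigned 32-bit values. The following C++ sketch only illustrates the arithmetic identities; it does not show what any particular compiler emits, and the function names are hypothetical. Division by four is a plain right shift, while division by three has no single-shift form and is instead expressed as a multiplication by a precomputed constant followed by a shift.

```cpp
#include <cassert>
#include <cstdint>

// n / 4 for unsigned n equals a right shift by two bits.
constexpr std::uint32_t quarter_by_shift(std::uint32_t n) {
    return n >> 2;
}

// n / 3 for unsigned 32-bit n via multiplication by ceil(2^33 / 3) = 0xAAAAAAAB
// in 64-bit arithmetic, followed by a right shift by 33 bits.
constexpr std::uint32_t third_by_multiply(std::uint32_t n) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(n) * 0xAAAAAAABull) >> 33);
}

// Checks both identities for every n below limit (active in debug builds).
void check_identities(std::uint32_t limit) {
    for (std::uint32_t n = 0; n < limit; ++n) {
        assert(quarter_by_shift(n) == n / 4);
        assert(third_by_multiply(n) == n / 3);
    }
}
```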

The C++ line in the ternary search “Third =

Experimental results and comparisons

The experimental environment of this study uses the same software and hardware configuration as in [26]. The experimental tests were run on empirical data generated randomly using a C++ library [42]. Two types of generated data are used: a numeric array of 8-byte numbers (double) and a text array with a key length of 100 characters. The cost of a comparison obviously affects the performance of the algorithms under test. This cost is influenced by data type and hardware

Conclusion

We examined the binary search algorithm in terms of comparisons. For BS we identified two implementations, weak and correct. Our study showed that the correct implementation is faster than the weak implementation of BS.

We presented a new, efficient improved ternary search (ITS) algorithm. ITS was analyzed and compared theoretically and experimentally with the correct binary search. The comparison showed that the improved algorithm is faster than the correct binary search.

Acknowledgments

We are grateful to the handling editor and anonymous reviewers for their careful reading of the paper and their valuable comments and suggestions.

Şahin Emrah Amrahov received B.Sc. and Ph.D. degrees in applied mathematics in 1984 and 1989, respectively, from Lomonosov Moscow State University, Russia. He works as a Professor in the Computer Engineering Department, Ankara University, Ankara, Turkey. His research interests include mathematical modeling, algorithms, artificial intelligence, fuzzy sets and systems, optimal control, theory of stability and numerical methods in differential equations.

References (42)

  • Grabowski F. et al., Dynamic behavior of simple insertion sort algorithm, Fund. Inform. (2006)
  • Nebel M.E. et al., Analysis of pivot sampling in dual-pivot Quicksort: a holistic analysis of Yaroslavskiy’s partitioning scheme, Algorithmica (2016)
  • Neininger R., Refined Quicksort asymptotics, Random Structures Algorithms (2015)
  • Wild S. et al., Average case and distributional analysis of dual-pivot quicksort, ACM Trans. Algorithms (2015)
  • Marszałek Z., Parallelization of modified merge sort algorithm, Symmetry (2017)
  • Woźniak M. et al.
  • Marszałek Z.
  • Woźniak M. et al.
  • Dymora P. et al., Long-range dependencies in quick-sort algorithm, Prz. Elektrotech. (2014)
  • Napoli C. et al., Simplified firefly algorithm for 2D image key-points search
  • Woźniak M. et al.
Adnan Saher Mohammed received a B.Sc. degree in computer engineering technology in 1999 from the College of Technology, Mosul, Iraq. In 2012 he obtained an M.Sc. degree in communication and computer network engineering from UNITEN University, Kuala Lumpur, Malaysia. He received his Ph.D. degree in computer engineering from the Graduate School of Natural Sciences, Ankara Yıldırım Beyazıt University, Ankara, Turkey. His research interests include computer networks and computer algorithms.

Fatih Vehbi Çelebi obtained his B.Sc. degree in electrical and electronics engineering in 1988, his M.Sc. degree in electrical and electronics engineering in 1996, and his Ph.D. degree in electronics engineering in 2002 from Middle East Technical University, Gaziantep University and Erciyes University, respectively. He is currently head of the Computer Engineering Department and vice president of Ankara Yıldırım Beyazıt University, Ankara, Turkey. His research interests include semiconductor lasers, automatic control, algorithms and artificial intelligence.
