New and improved search algorithms and precise analysis of their average-case complexity

doi:10.1016/j.future.2019.01.043

Future Generation Computer Systems

Volume 95, June 2019, Pages 743-753

https://doi.org/10.1016/j.future.2019.01.043

Highlights

• We propose an improved ternary search (ITS) algorithm.
• We also propose a new Binary–Quaternary Search (BQS) algorithm.
• We discuss weak and correct implementations of the binary search (BS) algorithm.
• We precisely calculate the average number of comparisons for the weak and correct implementations of the BS algorithm.
• We precisely calculate the average number of comparisons for the ITS and BQS algorithms.

Abstract

In this paper, we consider the search problem over ordered sequences. It is well known that the Binary Search (BS) algorithm solves this problem very efficiently, namely with complexity $\Theta(\log_2 n)$. Developments of the BS algorithm, such as the Ternary Search (TS) algorithm, do not improve the efficiency. The rapid increase in the amount of data has made the search problem more important than in the past, and this in turn has made it important to reduce the average number of comparisons in cases where no asymptotic improvement is achieved. In this paper, we identify and analyze an implementation issue of BS. Depending on the location of the conditional operators, we distinguish two different implementations of BS that are widely used in the literature. We call these the weak and correct implementations. We calculate the precise number of comparisons in the average case for both implementations. Moreover, we transform the TS algorithm into an improved ternary search (ITS) algorithm. We also propose a new Binary–Quaternary Search (BQS) algorithm using a novel dividing strategy. We prove that the average number of comparisons for both presented algorithms, ITS and BQS, is less than for the correct implementation of the BS algorithm. We also provide experimental results.

Introduction

Searching and sorting are classical problems of computer science. Owing to the rapid growth in the amount of data in recent years, these problems continue to attract the attention of researchers. In our previous work [1], we gave a short summary of recently published work on sorting algorithms [2], [3], [4], [5], [6], [7], [8], [9]. The study [10], conducted after our publication, proposes two novel sorting algorithms called Brownian Motus Insertion Sort and Clustered Binary Insertion Sort. Both algorithms are based on the classical Insertion Sort. Marszałek [11] describes how to parallelize the sorting processes in a modified merge sort method for large data sets.

Besides these studies, Woźniak et al. [12] modify the Merge Sort algorithm for large-scale data sets. Marszałek [13] proposes a new recursive version of the fast sort algorithm for large data sets. Woźniak et al. [14] examine two versions of the quick sort algorithm for large data sets. Dymora et al. [15] calculate the rate of existence of long-term correlations in the processing dynamics of the quicksort algorithm based on the Hurst coefficient. Napoli et al. [16] propose applying a simplified firefly algorithm to search for key areas in 2D images. Woźniak and Marszałek [17] use the classic firefly algorithm to search for special areas in test images. Das and Khilar [18] propose a Randomized Searching Algorithm and compare its performance with the Binary Search and Linear Search algorithms. They show that its performance lies between that of Binary Search and Linear Search. Ambainis et al. [19] study the classic binary search problem with a delay between query and answer. For constant delays, they give matching upper and lower bounds on the number of queries. Finocchi and Italiano [20] investigate the design and analysis of sorting and searching algorithms resilient to memory faults. Chadha et al. [21] propose a modification to the binary search algorithm that checks for the presence of the input element at each iteration. Rahim et al. [22] experimentally compare the linear, binary and interpolation search algorithms by searching data of different lengths with a pseudo process approach. Kumar [23] proposes a new quadratic search algorithm based on the binary search algorithm and shows experimentally that it performs better than binary search.

Carmo et al. [24] consider the problem of searching for a given element in a partially ordered set. Bonasera et al. [25] propose an adaptive search algorithm over ordered sets. The hybrid search algorithm on ordered data sets proposed by Mohammed et al. [26] is similar to the adaptive search algorithm. Bender et al. [27] develop a library sort algorithm based on insertion sort and the binary search (BS) algorithm.

It is well known that the BS algorithm is a widely used algorithm in computer applications because it performs well for different data types and key distributions. It works on the divide-and-conquer principle [28]. The algorithm is used in solving several problems. For instance, Gao et al. [29] propose a scheduling algorithm for ridesharing using a binary search strategy. Hatamlou [30] presents a binary search algorithm for data clustering. BS is a simple and understandable algorithm, although its implementation can be tricky. Donald Knuth emphasized: “Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky” [31]. Most of the implementation issues in binary search have been described in the literature. Pattis [32] notes five implementation errors. The study [33] presents a program to compute the semi-sum of two integers, an approach that solves the overflow problem that occurs in binary search over very large arrays. Bentley discusses some errors in the implementation of binary search in the section titled “The challenge of binary search” [34].
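The overflow issue mentioned above can be illustrated with a minimal sketch. It uses the common idiom of adding half the span to the left index, which is not necessarily the semi-sum method of [33]; the function name `safe_mid` and the 32-bit index type are assumptions made only for this illustration.

```cpp
#include <cstdint>

// Midpoint of the index range [left, right].
// Assumption for this sketch: 0 <= left <= right.
// Writing (left + right) / 2 would form left + right first, which can
// exceed the largest 32-bit signed value when both indices are large.
// Adding half of the span to left never leaves the range [left, right].
std::int32_t safe_mid(std::int32_t left, std::int32_t right) {
    return left + (right - left) / 2;
}
```

For example, with `left = 2000000000` and `right = 2100000000` the sum of the two indices does not fit in a 32-bit signed integer, while `safe_mid` stays within range.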

In this paper, we discuss two different implementations of the BS algorithm, which we call the weak and correct implementations. We precisely calculate the average number of comparisons for both implementations. We discuss the TS algorithm, which is known to be slower than BS, and then present an improved ternary search (ITS) algorithm that is faster than the correct implementation of the BS algorithm. We prove this by precisely calculating the average number of comparisons for the ITS algorithm. Moreover, we offer a new searching algorithm called the Binary–Quaternary Search (BQS) algorithm. We calculate the average number of comparisons for the BQS algorithm and show that it is better than the correct implementation of the BS algorithm. Theoretically, BQS requires slightly more comparisons on average than the presented ITS algorithm.

The rest of the paper is organized as follows. In Section 2 we discuss the weak and correct implementations of the BS algorithm and calculate the average number of comparisons for each. In Section 3 we discuss the TS algorithm. In Section 4 we propose the ITS algorithm and calculate its average number of comparisons. In Section 5 we develop a new searching algorithm, BQS, and precisely determine its average number of comparisons. In Section 6 we compare the implementations of the ITS and BQS algorithms. In Section 7 we present experimental results and a comparison of these searching algorithms. Finally, we summarize our results in Section 8.

Section snippets

Binary search and its two different implementations

In this section, we discuss the weak and correct implementations of the BS algorithm. We also calculate the average number of comparisons for both implementations. We take the correct implementation from the book [28]. The weak implementation appears in many works; see, for example, [35], [36], [37]. Table 1 contains the correct and the weak implementations used in this study. The difference between these two implementations occurs when the first “if” statement is made to search for the desired
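Table 1 is not reproduced in this snippet, so the following is only a sketch of how the placement of the conditional tests can change the comparison count. It assumes that the distinction is whether the equality test or an ordering test comes first; which of the two the paper labels "weak" is not confirmed by the text above. The struct `SearchResult` and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Index of the key (or -1) together with the number of key comparisons.
struct SearchResult {
    std::ptrdiff_t index;
    long comparisons;
};

// Variant A (assumed): the equality test is made first, so every
// iteration that misses the key pays for two comparisons.
SearchResult binary_search_equality_first(const std::vector<int>& a, int key) {
    std::ptrdiff_t left = 0;
    std::ptrdiff_t right = static_cast<std::ptrdiff_t>(a.size()) - 1;
    long cmp = 0;
    while (left <= right) {
        const std::ptrdiff_t mid = left + (right - left) / 2;
        ++cmp;
        if (a[mid] == key) return {mid, cmp};
        ++cmp;
        if (key < a[mid]) right = mid - 1;
        else left = mid + 1;
    }
    return {-1, cmp};
}

// Variant B (assumed): an ordering test is made first, so a step to the
// right costs one comparison and equality is reached by falling through.
SearchResult binary_search_ordering_first(const std::vector<int>& a, int key) {
    std::ptrdiff_t left = 0;
    std::ptrdiff_t right = static_cast<std::ptrdiff_t>(a.size()) - 1;
    long cmp = 0;
    while (left <= right) {
        const std::ptrdiff_t mid = left + (right - left) / 2;
        ++cmp;
        if (key > a[mid]) { left = mid + 1; continue; }
        ++cmp;
        if (key < a[mid]) right = mid - 1;
        else return {mid, cmp};
    }
    return {-1, cmp};
}
```

On the sorted array {1, 3, 5, 7, 9, 11, 13}, searching for 13 takes five comparisons with variant A and four with variant B, whereas searching for 1 takes five and six respectively. Individual keys can therefore go either way, which is why the comparison between implementations is stated in terms of the average.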

The ternary search algorithm

The ternary search is presented as an alternative to the binary search. This algorithm requires fewer iterations than binary search; however, it performs more comparisons per iteration. In this section we explain this in detail.
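The trade-off described above can be made concrete with a rough estimate. The symbols $c_2$ and $c_3$ are introduced here only for illustration; they denote the average number of key comparisons per iteration in binary and ternary search, and the iteration counts are approximated by $\log_2 n$ and $\log_3 n$. This is a back-of-the-envelope argument, not the precise analysis of the paper.

$$
c_3 \log_3 n \;=\; \frac{c_3}{\log_2 3}\,\log_2 n \;<\; c_2 \log_2 n
\quad\Longleftrightarrow\quad
\frac{c_3}{c_2} \;<\; \log_2 3 \approx 1.585 .
$$

Under this estimate, a ternary search pays off only if one of its iterations costs less than about 1.585 times an iteration of binary search. If an iteration costs twice as much ($c_3 = 2c_2$), the total is about $2 / 1.585 \approx 1.26$ times that of binary search.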

In the literature, there are several studies on ternary search, such as the analysis in [41] and the following pseudo-code (Algorithm 1), which is presented in [39] as a ternary search. In regard, there is a similar

Proposed improved ternary search (ITS) algorithm

The following pseudo-code (Algorithm 2) is the improved ternary search. This algorithm divides the length of the given array by three. Then it calculates the left cut index (Lci) and the right cut index (Rci). This divides the array into approximately three equal parts. If the required key $X$ is less than the key located at Lci, the left third of the array will contain $X$. Correspondingly, if $X$ is greater than the key located at Rci, the right third of the array will be
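Algorithm 2 itself is not included in this snippet. The sketch below only follows the description above (two cut indices, three roughly equal parts, one ordering test per cut) and should not be read as the paper's ITS. The function name `three_way_search`, the handling of the middle part, and the final scan over the last few elements are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Three-way split search over a sorted vector.
// Returns an index holding key, or -1 if key is absent.
std::ptrdiff_t three_way_search(const std::vector<int>& a, int key) {
    std::ptrdiff_t left = 0;
    std::ptrdiff_t right = static_cast<std::ptrdiff_t>(a.size()) - 1;
    // Split only while at least four elements remain, so that third >= 1
    // and every branch strictly shrinks the range.
    while (right - left >= 3) {
        const std::ptrdiff_t third = (right - left) / 3;
        const std::ptrdiff_t lci = left + third;   // left cut index
        const std::ptrdiff_t rci = right - third;  // right cut index
        if (key < a[lci]) {
            right = lci - 1;   // key can only lie in the left part
        } else if (key > a[rci]) {
            left = rci + 1;    // key can only lie in the right part
        } else {
            left = lci;        // here a[lci] <= key <= a[rci]
            right = rci;
        }
    }
    // At most three candidates remain; test them directly (assumed finish).
    for (std::ptrdiff_t i = left; i <= right; ++i) {
        if (a[i] == key) return i;
    }
    return -1;
}
```

Note that no equality test appears inside the loop: each iteration spends at most two ordering comparisons, and equality is only checked once the range is small.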

The proposed binary–quaternary search algorithm

The proposed Binary–Quaternary Search (BQS) is similar to ITS in its implementation. The main difference is that BQS divides the length of the given array by four instead of three as in ITS. Consequently, the behavior of the algorithm changes. Fig. 4 shows the behavior of the dividing technique in BQS.

When the required key $X$ resides in the left quarter ($X \leq \text{array}[\text{Lci}]$), BQS sets right $=$ Lci, which excludes 75% of the length of the array from the next iteration. Likewise, when $X$ resides in the
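The quarter-based division can be sketched as follows. Only the left-quarter rule (`right = Lci` when $X \leq \text{array}[\text{Lci}]$) is taken from the text above. The right-quarter rule is assumed by symmetry, and the middle branch, which keeps the central half, is an assumption suggested by the name of the algorithm. The function name `quarter_cut_search` and the single equality test after the loop are likewise hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Quarter-cut search over a sorted vector.
// Returns an index holding key, or -1 if key is absent.
std::ptrdiff_t quarter_cut_search(const std::vector<int>& a, int key) {
    std::ptrdiff_t left = 0;
    std::ptrdiff_t right = static_cast<std::ptrdiff_t>(a.size()) - 1;
    while (left < right) {
        const std::ptrdiff_t quarter = (right - left) / 4;
        const std::ptrdiff_t lci = left + quarter;   // left cut index
        const std::ptrdiff_t rci = right - quarter;  // right cut index
        if (key <= a[lci]) {
            right = lci;       // keep the left quarter (from the text)
        } else if (key >= a[rci]) {
            left = rci;        // keep the right quarter (assumed by symmetry)
        } else {
            left = lci + 1;    // keep the middle half (assumed)
            right = rci - 1;
        }
    }
    // One equality test once the range has collapsed to a single element.
    if (left == right && a[left] == key) return left;
    return -1;
}
```

When fewer than four positions separate `left` and `right`, `quarter` is zero and the cut indices coincide with the range ends; each branch still shrinks the range, so the loop terminates.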

Implementation of ITS and BQS algorithms

The compiler used in the experimental work was configured to optimize the source code by default. Most compilers replace a division by a constant with a multiplication, since the CPU executes a multiplication in less time than a division. Furthermore, compilers turn a division or multiplication into a shift operation when possible, because the shift operation is much faster than division and multiplication.
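The equivalence that makes the shift optimization possible can be shown with a small sketch. It only demonstrates that the two expressions agree for unsigned values; it says nothing about the instructions a particular compiler emits. Both function names are hypothetical.

```cpp
#include <cstddef>

// Dividing an unsigned span by four ...
std::size_t quarter_by_division(std::size_t span) {
    return span / 4;
}

// ... gives the same result as shifting it right by two bits, because
// 4 is a power of two. A divisor of three has no such single-shift form.
std::size_t quarter_by_shift(std::size_t span) {
    return span >> 2;
}
```

Both functions return 25 for a span of 103, for instance.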

The C++ line in the ternary search “Third $=$

Experimental results and comparisons

The experimental environment of this study uses the same software and hardware configuration as in [26]. The experiments were carried out on empirical data generated randomly using a C++ library [42]. Two types of generated data are used: a numeric array of 8-byte numbers (double) and a text array with a key length of 100 characters. The cost of a comparison obviously affects the performance of the algorithms under test. This cost is influenced by data type and hardware

Conclusion

We examined the binary search algorithm in terms of comparisons. For BS we identified two implementations: weak and correct implementations. Our study explained that the correct implementation is faster than the weak implementation of BS.

We presented a new efficient improved ternary search algorithm (ITS). ITS has been analyzed and compared theoretically and experimentally with correct binary search. Comparison results showed that the improved algorithm is faster than the correct binary search.

Acknowledgments

We are grateful to the handling editor and anonymous reviewers for their careful reading of the paper and their valuable comments and suggestions.

Şahin Emrah Amrahov received B.Sc. and Ph.D. degrees in applied mathematics in 1984 and 1989, respectively, from Lomonosov Moscow State University, Russia. He works as a Professor in the Computer Engineering Department, Ankara University, Ankara, Turkey. His research interests include mathematical modeling, algorithms, artificial intelligence, fuzzy sets and systems, optimal control, theory of stability and numerical methods in differential equations.

References (42)

Mohammed A.S. et al. Bidirectional conditional insertion sort algorithm; an efficient progress on the classical insertion sort. Future Gener. Comput. Syst. (2017)
Fredman M.L. An intuitive and simple bounding argument for Quicksort. Inform. Process. Lett. (2014)
Hadjicostas P. et al. Recursive merge sort with erroneous comparisons. Discrete Appl. Math. (2011)
Goel S. et al. Brownian motus and clustered binary insertion sort methods: an efficient progress over traditional methods. Future Gener. Comput. Syst. (2018)
Carmo R. et al. Searching in random partially ordered sets. Theoret. Comput. Sci. (2004)
Bonasera B. et al. Adaptive search over sorted sets. J. Discrete Algorithms (2015)
Hatamlou A. In search of optimal centroids on data clustering using a binary search algorithm. Pattern Recognit. Lett. (2012)
Ruggieri S. On computing the semi-sum of two integers. Inf. Process. Lett. (2003)
Abu Dalhoum A. et al. Enhancing QuickSort algorithm using a dynamic pivot selection technique. Wulfenia (2012)
Fuchs M. A note on the quicksort asymptotics. Random Structures Algorithms (2015)
Grabowski F. et al. Dynamic behavior of simple insertion sort algorithm. Fund. Inform. (2006)
Nebel M.E. et al. Analysis of pivot sampling in dual-pivot Quicksort: a holistic analysis of Yaroslavskiy’s partitioning scheme. Algorithmica (2016)
Neininger R. Refined Quicksort asymptotics. Random Structures Algorithms (2015)
Wild S. et al. Average case and distributional analysis of dual-pivot quicksort. ACM Trans. Algorithms (2015)
Marszałek Z. Parallelization of modified merge sort algorithm. Symmetry (2017)
Woźniak M. et al.
Marszałek Z.
Woźniak M. et al.
Dymora P. et al. Long-range dependencies in quick-sort algorithm. Prz. Elektrotech. (2014)
Napoli C. et al. Simplified firefly algorithm for 2D image key-points search
Woźniak M. et al.


Adnan Saher Mohammed received his B.Sc. degree in computer engineering technology in 1999 from the College of Technology, Mosul, Iraq. In 2012 he obtained his M.Sc. degree in communication and computer network engineering from UNITEN University, Kuala Lumpur, Malaysia. He received his Ph.D. degree in computer engineering from the Graduate School of Natural Sciences, Ankara Yıldırım Beyazıt University, Ankara, Turkey. His research interests include computer networks and computer algorithms.

Fatih Vehbi Çelebi obtained his B.Sc. degree in electrical and electronics engineering in 1988, M.Sc. degree in electrical and electronics engineering in 1996, and Ph.D. degree in electronics engineering in 2002 from Middle East Technical University, Gaziantep University and Erciyes University, respectively. He is currently head of the Computer Engineering Department and vice president of Ankara Yıldırım Beyazıt University, Ankara, Turkey. His research interests include semiconductor lasers, automatic control, algorithms and artificial intelligence.
