Sets of binary sequences with small total Hamming distances

doi:10.1016/j.ipl.2018.10.005

Information Processing Letters

Volume 142, February 2019, Pages 27-29

https://doi.org/10.1016/j.ipl.2018.10.005

Highlights

• Sets of binary sequences that are close in terms of Hamming distance.
• Minimum total Hamming distance of sets of binary sequences.
• Explicit sets of n sequences, for each positive integer n, that are as close as possible.

Abstract

The sum of the Hamming distances between pairs of binary sequences in a set is considered. It is shown that this sum is at least 8 and 48 for sets of four and eight sequences, respectively, and is at least ${(n - 1)}^{2}$ for sets of n sequences where n is not equal to 4 or 8. Sets meeting this minimum are explicitly specified.

Introduction

To save power and increase the speed of data processing, it is preferable to represent data by sequences that are close together. To quantify closeness, one may count the pairs of sequences at distance 1 from each other [1]. Other criteria for closeness, such as the diameter, connectivity, and neighborhood of a hypercube representing the data, have also been considered [2], [3], [4]. In this paper, we consider a different criterion, namely the total Hamming distance between all pairs of sequences, and we specify sets of binary sequences that are as close as possible under this criterion. Although data is represented in practice by finite sequences, we consider infinite sequences in order to simplify notation and avoid restricting the lengths of sequences. In fact, to construct a set of n binary sequences with minimum total Hamming distance, it suffices to consider sequences of length at most $n - 1$.

Let $s = (s_{1} s_{2} \dots)$ and $s^{'} = (s_{1}^{'} s_{2}^{'} \dots)$ be binary sequences. The Hamming distance, $d(s, s^{'})$, between s and $s^{'}$ is the number of positions i for which $s_{i} \neq s_{i}^{'}$ [3]. Henceforth we refer to the Hamming distance simply as distance. Since s and $s^{'}$ are binary sequences, $d(s, s^{'}) = \sum_{i} | s_{i} - s_{i}^{'} |$.

Let $S$ be a finite set of binary sequences. We define the total distance of $S$, $d(S)$, to be the sum of the distances between all unordered pairs of distinct sequences in $S$, i.e., $d(S) = \sum_{\{s, s^{'}\} \subseteq S} d(s, s^{'}).$
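The definition of total distance can be illustrated with a minimal Python sketch. It assumes that each sequence is truncated to a finite tuple of 0's and 1's of common length (the trailing 0's are dropped); the function names `hamming` and `total_distance` are illustrative and do not come from the paper.

```python
from itertools import combinations

def hamming(s, t):
    """Number of positions where the equal-length 0/1 tuples s and t differ."""
    return sum(a != b for a, b in zip(s, t))

def total_distance(seqs):
    """Sum of Hamming distances over all unordered pairs of sequences in seqs."""
    return sum(hamming(s, t) for s, t in combinations(seqs, 2))

# Four sequences truncated to their first two positions.
example = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(total_distance(example))  # 8
```

The six pairs in this example contribute 1, 1, 2, 2, 1, and 1, which gives the total of 8.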

Given a positive integer n, we are interested in finding the minimum possible total distance, denoted by $d_{\min}(n)$, among all sets of n binary sequences, as well as sets that achieve this minimum. For each $n \geq 1$, we define a set of n binary sequences, $S_{n}^{\circ}$, which plays an important role in our investigation. This set consists of the all-0's sequence together with the $n - 1$ sequences that each have a single 1, located in one of the first $n - 1$ positions.
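As a rough illustration, the set $S_{n}^{\circ}$ can be built and its total distance computed as follows. This is a sketch under the assumption that sequences are truncated to length $n - 1$; the helper names `s_circ` and `total_distance` are hypothetical. The all-0's sequence is at distance 1 from each of the other $n - 1$ sequences, and any two single-1 sequences are at distance 2, so the total is $(n - 1) + (n - 1)(n - 2) = {(n - 1)}^{2}$.

```python
from itertools import combinations

def s_circ(n):
    """All-0's sequence plus the n - 1 single-1 sequences, truncated to length n - 1."""
    length = n - 1
    zero = tuple(0 for _ in range(length))
    units = [tuple(1 if j == i else 0 for j in range(length)) for i in range(length)]
    return [zero] + units

def total_distance(seqs):
    """Sum of Hamming distances over all unordered pairs of sequences in seqs."""
    return sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(seqs, 2)
    )

# Compare the computed total distance with (n - 1)^2 for a few small n.
for n in range(1, 7):
    print(n, total_distance(s_circ(n)), (n - 1) ** 2)
```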

The main result, stated in the next section, gives an expression for $d_{\min}(n)$ and sets that achieve this minimum total distance. The proof is provided in Section 3.

Result

Our main result is the following theorem.

Theorem 1

For $n = 4$, $d_{\min}(4) = 8$, which is achieved by the set composed of the four sequences $(000 \dots)$, $(100 \dots)$, $(010 \dots)$, and $(110 \dots)$, where the dots stand for 0's. For $n = 8$, $d_{\min}(8) = 48$, which is achieved by the set composed of the eight sequences $(0000 \dots)$, $(1000 \dots)$, $(0100 \dots)$, $(0010 \dots)$, $(1100 \dots)$, $(1010 \dots)$, $(0110 \dots)$, and $(1110 \dots)$. For all other values of $n \geq 1$, $d_{\min}(n) = {(n - 1)}^{2}$, which is achieved by $S_{n}^{\circ}$.
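For very small n, the values in Theorem 1 can be checked by exhaustive search. The sketch below is only a sanity check, not the paper's proof: it relies on the remark in the Introduction that sequences of length at most $n - 1$ suffice, and the names `brute_force_min` and `theorem_value` are illustrative. The search is limited to $n \leq 5$ because the number of candidate sets grows quickly.

```python
from itertools import combinations, product

def total_distance(seqs):
    """Sum of Hamming distances over all unordered pairs of sequences in seqs."""
    return sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(seqs, 2)
    )

def brute_force_min(n):
    """Minimum total distance over all sets of n distinct 0/1 tuples of length n - 1."""
    candidates = list(product((0, 1), repeat=n - 1))
    return min(total_distance(subset) for subset in combinations(candidates, n))

def theorem_value(n):
    """Value of d_min(n) as stated in Theorem 1."""
    return {4: 8, 8: 48}.get(n, (n - 1) ** 2)

# Exhaustive check for n = 1, ..., 5.
for n in range(1, 6):
    print(n, brute_force_min(n), theorem_value(n))
```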

Proof of Theorem 1

Interestingly, as shown next, the total distance of a set $S$ of binary sequences can be determined easily from the number of sequences and their componentwise sum over the real numbers. Define the set sum of $S$ to be $\sigma(S) = \sum_{s \in S} s$.

Lemma 1

Let $S$ be a set of n binary sequences with set sum $\sigma(S) = (\sigma_{1} \sigma_{2} \dots)$. Then, the total distance of $S$ is given by $d(S) = \sum_{i} \sigma_{i} (n - \sigma_{i}).$
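Lemma 1 can be compared numerically with the pairwise definition of total distance. The following sketch assumes sequences truncated to a common finite length; the function names are hypothetical. It uses the eight sequences from Theorem 1 for $n = 8$, truncated to three positions, where each position has $\sigma_{i} = 4$ and contributes $4 \cdot (8 - 4) = 16$.

```python
from itertools import combinations, product

def total_distance_pairwise(seqs):
    """Total distance computed directly from the definition (all unordered pairs)."""
    return sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(seqs, 2)
    )

def total_distance_lemma(seqs):
    """Total distance computed from the set sum, as in Lemma 1."""
    n = len(seqs)
    # sigma_i is the number of sequences with a 1 in position i.
    set_sum = [sum(column) for column in zip(*seqs)]
    return sum(sigma * (n - sigma) for sigma in set_sum)

# The eight sequences of Theorem 1 for n = 8, truncated to three positions.
cube = list(product((0, 1), repeat=3))
print(total_distance_pairwise(cube), total_distance_lemma(cube))  # 48 48
```

The second function needs only one pass over the positions, whereas the first visits every pair of sequences.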

Proof

From the definitions of the distance between two sequences and the total distance of a set, we have $\begin{matrix} d(S) & = & \sum_{\{(s_{1} s_{2} \dots), (s_{1}^{'} s_{2}^{'} \dots)\} \subseteq S} \sum_{i} | s_{i} - s_{i}^{'} | \\ & = & \sum_{i} \sum_{\{(s_{1} s_{2} \dots), (s_{1}^{'} s_{2}^{'} \dots)\} \subseteq S} | s_{i} - s_{i}^{'} | \end{matrix}$

References (4)

K.A.S. Abdel-Ghaffar, Maximum number of edges joining vertices on a cube, Inf. Process. Lett. (2003)

X. Tan et al., A note about some properties of BC graphs, Inf. Process. Lett. (2008)

There are more references available in the full text version of this article.
