1 Introduction

Dating back to the 1980s, quantum computing has been shown to be more computationally powerful than classical computing for certain kinds of problems. In the past decades, quantum computing has been brought into the field of machine learning to achieve computational advantages. This gave birth to a new interdisciplinary research field, quantum machine learning, a cross-field of computer science and quantum physics that studies how to learn from training data and make predictions on new data in quantum settings [1,2,3,4,5, 16, 17]. Since its inception, quantum machine learning has become a hot topic attracting worldwide attention, and a number of efficient quantum algorithms have been proposed for various machine learning tasks [6,7,8,9].

Quantum mechanics is well known to produce atypical patterns in data. Classical machine learning methods such as deep neural networks frequently have the feature that they can both recognize statistical patterns in data and produce data that possess the same statistical patterns: they recognize the patterns that they produce. This observation suggests the following hope. If small quantum information processors can produce statistical patterns that are computationally difficult for a classical computer to produce, then perhaps they can also recognize patterns that are equally difficult to recognize classically. The realization of this hope depends on whether efficient quantum algorithms can be found for machine learning. A quantum algorithm is a set of instructions, performed on a quantum computer, for solving a problem such as determining whether two graphs are isomorphic. Quantum machine learning software makes use of quantum algorithms as part of a larger implementation. By analysing the steps that quantum algorithms prescribe, it becomes clear that they have the potential to outperform classical algorithms for specific problems (that is, to reduce the number of steps required). This potential is known as quantum speedup. The notion of a quantum speedup depends on whether one takes a formal computer science perspective, which demands mathematical proofs, or a perspective based on what can be done with realistic, finite-size devices, which requires solid statistical evidence of a scaling advantage over some finite range of problem sizes [18,19,20,21,22,23,24,25]. For quantum machine learning, the best possible performance of classical algorithms is not always known. This is similar to the case of Shor's polynomial-time quantum algorithm for integer factorization: no sub-exponential-time classical algorithm has been found, but the possibility is not provably ruled out. Determining a scaling advantage of quantum over classical machine learning would rely on the existence of a quantum computer and is called a 'benchmarking' problem. Such advantages could include improved classification accuracy and sampling of classically inaccessible systems. Accordingly, quantum speedups in machine learning are currently characterized using idealized measures from complexity theory: query complexity and gate complexity. Query complexity measures the number of queries to the information source made by the classical or quantum algorithm; a quantum speedup results if the number of queries needed to solve a problem is lower for the quantum algorithm than for the classical algorithm. Gate complexity counts the number of elementary quantum operations (gates) required to obtain the desired result [26,27,28,29,30,31].

The most fundamental examples of supervised machine learning algorithms are linear support vector machines (SVMs) and perceptrons. The task of these methods is to find an optimal separating hyperplane between two classes of data, such that, with high probability, all training examples of one class lie on only one side of the hyperplane. The most robust classifier is obtained when the margin between the hyperplane and the data is maximized. The SVM model can be solved in time \( {\mathcal{O}}(\log (1/\varepsilon )\,poly(N,M)) \) [10], where N is the dimension of the feature space, M is the number of training vectors and \( \varepsilon \) is the accuracy. However, classical computing becomes impractical once the data set grows to millions of examples.
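As a concrete classical point of reference, the following minimal sketch (assuming NumPy and scikit-learn are available; the synthetic data set and all parameter choices are purely illustrative) trains a soft-margin linear SVM and reports the resulting classifier:

```python
# Minimal classical baseline: a soft-margin linear SVM on synthetic data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
M, N = 200, 4                        # M training vectors of dimension N
X = rng.normal(size=(M, N))
w_true = rng.normal(size=N)          # hidden separating direction (synthetic)
y = np.where(X @ w_true > 0, 1, -1)  # labels y_j = +/-1

clf = SVC(kernel="linear", C=1.0)    # maximum-margin separating hyperplane
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("number of support vectors:", len(clf.support_))
```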

Similar to its classical counterpart, the quantum SVM is a paradigmatic example of quantum machine learning [8]. The first quantum SVM algorithm was proposed in the early 2000s; it uses a variant of Grover's search for function minimization to find the s support vectors out of N vectors, and consequently takes \( \sqrt {N/s} \) iterations [11]. Recently, Rebentrost et al. showed that a quantum SVM can be implemented in \( {\mathcal{O}}(poly(1/\varepsilon )\log (MN)) \) time for training and classification. In this paper, we present an improved quantum SVM algorithm with \( {\mathcal{O}}(\varepsilon^{ - 1} \log (1/\varepsilon )\log (MN)) \) running time. Specifically, we handle the dimensions M and N through a quantum matrix-inversion algorithm, and reduce the dependence on the precision \( 1/\varepsilon \) by a Fourier-based matrix-inversion technique [12]. In cases where a low-rank approximation is appropriate, our quantum SVM operates on the full training set in logarithmic runtime.

2 Review of SVM

The fundamental task of the SVM is to classify a vector into one of two classes, given M training data points of the form \( \{ \left( {{\mathbf{x}}_{{\mathbf{j}}} ,y_{j} } \right):{\mathbf{x}}_{{\mathbf{j}}} \in R^{N} ,y_{j} = \pm 1, j = 1, \ldots ,M\} \), where the label \( y_{j} = 1 \,{\text{or}}\, - 1 \) indicates the class to which \( {\mathbf{x}}_{{\mathbf{j}}} \) belongs. For the classification, the SVM finds a maximum-margin hyperplane with normal vector \( {\mathbf{w}} \) that divides the data set into two classes. Equivalently, the SVM seeks two parallel hyperplanes, separated by the maximum possible distance \( 2/\left\| {\mathbf{w}} \right\| \), with the examples of class \( y_{j} = 1 \) on one side and those of class \( y_{j} = - 1 \) on the other. Which side a point \( {\mathbf{x}}_{{\mathbf{j}}} \) falls on is enforced by the constraints \( {\mathbf{wx}}_{{\mathbf{j}}} + b \ge 1 \) for \( y_{j} = 1 \) and \( {\mathbf{wx}}_{{\mathbf{j}}} + b \le - 1 \) for \( y_{j} = - 1 \). Therefore, finding the maximum-margin hyperplane consists of minimizing \( \left\| {\mathbf{w}} \right\|^{2} /2 \) subject to the inequality constraints \( y_{j} ({\mathbf{wx}}_{{\mathbf{j}}} + b) \ge 1 \) for all indices j. This is the primal formulation of the problem. The dual formulation maximizes, over the Karush-Kuhn-Tucker multipliers \( \alpha = \left( {\alpha_{1} , \ldots ,\alpha_{M} } \right)^{T} \), the function:

$$ L\left( \alpha \right) = \mathop \sum \limits_{j = 1}^{M} y_{j} \alpha_{j} - \frac{1}{2}\mathop \sum \limits_{j,k = 1}^{M} \alpha_{j} K_{j,k} \alpha_{k} $$

The constraints can be expressed as

$$ \sum\nolimits_{j = 1}^{M} {\alpha_{j} = 0} \,\,{\text{and}}\,\,y_{j} \alpha_{j} \ge 0. $$

The hyperplane parameters are recovered as \( {\mathbf{w}} = \sum\nolimits_{j} {\alpha_{j} {\mathbf{x}}_{{\mathbf{j}}} } \) and \( b = y_{j} - {\mathbf{wx}}_{{\mathbf{j}}} \) (evaluated at any support vector \( {\mathbf{x}}_{{\mathbf{j}}} \)). We then introduce the kernel matrix, a central quantity for supervised machine learning problems, \( K_{j,k} = k({\mathbf{x}}_{{\mathbf{j}}} ,{\mathbf{x}}_{{\mathbf{k}}} ) \). In this paper we show how to prepare the quantum kernel matrix, whose kernel function is either the inner product \( k\left( {{\mathbf{x}}_{{\mathbf{j}}} ,{\mathbf{x}}_{{\mathbf{k}}} } \right) = {\mathbf{x}}_{{\mathbf{j}}} \cdot {\mathbf{x}}_{{\mathbf{k}}} \) or the Gaussian kernel \( k\left( {{\mathbf{x}}_{{\mathbf{j}}} ,{\mathbf{x}}_{{\mathbf{k}}} } \right) = \exp (\left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|^{2} + \left\| {{\mathbf{x}}_{{\mathbf{k}}} } \right\|^{2} ) \). Solving the resulting optimization problem takes approximately \( {\mathcal{O}}(M^{3} ) \) operations; since each kernel entry takes \( {\mathcal{O}}(N) \) overhead, the classical support vector machine algorithm runs in time \( {\mathcal{O}}(\log (1/\varepsilon )\,M^{2} (M + N)) \) with accuracy ε. The classification result can be computed as

$$ y\left( {\mathbf{x}} \right) = sign(\mathop \sum \limits_{j = 1}^{M} \alpha_{j} k\left( {{\mathbf{x}}_{{\mathbf{j}}} ,{\mathbf{x}}} \right) + b) $$
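To make the quantities of this section concrete, the following NumPy sketch builds the inner-product kernel matrix \( K_{j,k} = {\mathbf{x}}_{{\mathbf{j}}} \cdot {\mathbf{x}}_{{\mathbf{k}}} \), evaluates the dual objective \( L(\alpha ) \) and the decision function above; the multipliers alpha and the offset b are placeholders standing in for the output of an SVM solver, and the data are synthetic:

```python
# Kernel matrix, dual objective and decision function of the SVM, in plain NumPy.
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 3
X = rng.normal(size=(M, N))                  # training vectors x_1, ..., x_M
y = np.array([1, 1, 1, -1, -1, -1])          # class labels

K = X @ X.T                                  # K_{j,k} = x_j . x_k
alpha = rng.normal(size=M)                   # placeholder KKT multipliers
b = 0.5                                      # placeholder offset

dual = y @ alpha - 0.5 * alpha @ K @ alpha   # the dual objective L(alpha)

def classify(x_new):
    """y(x) = sign( sum_j alpha_j k(x_j, x) + b )."""
    return int(np.sign(alpha @ (X @ x_new) + b))

print(dual, classify(rng.normal(size=N)))
```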

3 Quantum Kernel Matrix

3.1 Construct Quantum Inner-Product Kernel Matrix

In the quantum setting, assume that there exists a mechanism for encoding the classical data into quantum states:

$$ \left| {{\mathbf{x}}_{{\mathbf{j}}} } \right\rangle = \frac{1}{{\left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|}}\sum\limits_{k = 1}^{N} {\left( {{\mathbf{x}}_{{\mathbf{j}}} } \right)_{k} \left| k \right\rangle } $$

Here the notation \( \left( {{\mathbf{x}}_{{\mathbf{j}}} } \right)_{k} \) denotes the k-th component of the vector \( {\mathbf{x}}_{{\mathbf{j}}} \). For the quantum mechanical preparation, we utilize the QRAM mechanism to obtain the quantum state

$$ \left| \chi \right\rangle = \frac{1}{{\sqrt {N_{\chi } } }}\sum\limits_{i = 1}^{M} {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|\left| i \right\rangle \left| {{\mathbf{x}}_{{\mathbf{i}}} } \right\rangle } $$

where the normalization factor is \( N_{\chi } = \sum\nolimits_{i = 1}^{M} {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|^{2} } \), and the preparation takes \( {\mathcal{O}}(\log (MN)) \) running time. Discarding the second register, we obtain the quantum inner-product kernel matrix

$$ \frac{K}{{\mathrm{tr}(K)}} = \mathrm{tr}_{2} \left( {\left| \chi \right\rangle \left\langle \chi \right|} \right) = \frac{1}{{N_{\chi } }}\sum\limits_{i,j = 1}^{M} {\left\langle {{\mathbf{x}}_{{\mathbf{j}}} } | {{\mathbf{x}}_{{\mathbf{i}}} } \right\rangle \left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|\left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|\left| i \right\rangle \left\langle j \right|} $$
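As a quick numerical sanity check (a classical simulation, not a quantum circuit), the sketch below builds the amplitude-encoded state \( \left| \chi \right\rangle \) as an ordinary vector, traces out the data register, and verifies that the reduced density matrix equals \( K/\mathrm{tr}(K) \) for the inner-product kernel:

```python
# Classical check: tracing out the data register of |chi> gives K / tr(K).
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 3
X = rng.normal(size=(M, N))
norms = np.linalg.norm(X, axis=1)

# |chi> = (1/sqrt(N_chi)) sum_i ||x_i|| |i>|x_i>,  with |x_i> = x_i / ||x_i||
N_chi = np.sum(norms**2)
chi = np.zeros(M * N)
for i in range(M):
    chi[i * N:(i + 1) * N] = X[i]          # ||x_i|| * (x_i / ||x_i||) = x_i
chi /= np.sqrt(N_chi)

rho = np.outer(chi, chi).reshape(M, N, M, N)   # |chi><chi| as a 4-index tensor
reduced = np.trace(rho, axis1=1, axis2=3)      # partial trace over the data register

K = X @ X.T
print(np.allclose(reduced, K / np.trace(K)))   # True
```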

3.2 Construct Quantum Gaussian Kernel Matrix

Suppose the data set \( \{ {\mathbf{x}}_{{\mathbf{1}}} , \ldots ,{\mathbf{x}}_{{\mathbf{M}}} \} \) is stored in a specially designed binary-tree data structure [13]; thus we can assume there exists a pair of oracles \( U_{1} ,U_{2} \) such that

$$ U_{1} \left| j \right\rangle \left| 0 \right\rangle \left| 0 \right\rangle = \left| j \right\rangle \left| {{\mathbf{x}}_{{\mathbf{j}}} } \right\rangle \left| 0 \right\rangle $$
$$ U_{2} \left| j \right\rangle \left| {{\mathbf{x}}_{{\mathbf{j}}} } \right\rangle \left| 0 \right\rangle = \left| j \right\rangle \left| {{\mathbf{x}}_{{\mathbf{j}}} } \right\rangle \left| {\left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|} \right\rangle $$

Applying the oracle \( U_{1}^{\dag } U_{2} U_{1} \) to the state \( \left| {\chi^{\prime } } \right\rangle = \left| \chi \right\rangle \left| 0 \right\rangle \), the system becomes

$$ \frac{1}{{\sqrt {N_{\chi } } }}\sum\limits_{i = 1}^{M} {\left| i \right\rangle \left| {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|} \right\rangle } $$

Adding an ancilla qubit and performing a controlled rotation, we obtain

$$ \left| G \right\rangle = \frac{1}{{\sqrt C }}\sum\limits_{i = 1}^{M} {\exp \left( {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|^{2} } \right)\left| i \right\rangle \left| {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|} \right\rangle } $$

Uncomputing the second register by invoking \( U_{2}^{\dag } \), the system finally becomes the desired state

$$ \frac{K}{{\mathrm{tr}(K)}} = \left| G \right\rangle \left\langle G \right| = \frac{1}{C}\sum\limits_{i,j = 1}^{M} {\exp \left( {\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|^{2} + \left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|^{2} } \right)\left| i \right\rangle \left\langle j \right|} $$
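Taking the construction above literally, the sketch below checks classically that the outer product \( \left| G \right\rangle \left\langle G \right| \) has entries \( \exp (\left\| {{\mathbf{x}}_{{\mathbf{i}}} } \right\|^{2} + \left\| {{\mathbf{x}}_{{\mathbf{j}}} } \right\|^{2} )/C \) once \( C \) is chosen to normalize \( \left| G \right\rangle \); the rescaling of the synthetic data is only to keep the exponentials numerically modest:

```python
# Classical check of Sect. 3.2 as written: entries of |G><G| match exp(.)/C.
import numpy as np

rng = np.random.default_rng(3)
M = 4
X = rng.normal(size=(M, 3)) * 0.3        # small vectors so exp() stays modest
sq = np.sum(X**2, axis=1)                # ||x_i||^2

amps = np.exp(sq)
C = np.sum(amps**2)                      # normalization so that <G|G> = 1
G = amps / np.sqrt(C)

lhs = np.outer(G, G)
rhs = np.exp(sq[:, None] + sq[None, :]) / C
print(np.allclose(lhs, rhs))             # True
```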

4 Quantum Least-Squares SVM

The main idea of this work is to adopt the least-squares reformulation of the SVM, which circumvents the quadratic programming and obtains the parameters from the solution of a linear equation system. The principal simplification is to introduce slack variables \( e_{j} \) and to replace the inequality constraints with equality constraints:

$$ y_{j} \left( {{\mathbf{wx}}_{{\mathbf{j}}} + b} \right) \ge 1 \to {\mathbf{wx}}_{{\mathbf{j}}} + b = y_{j} - y_{j} e_{j} . $$

Besides the constraints, the implied Lagrangian function contains a penalty term \( \frac{\gamma }{2}\sum\nolimits_{i = 1}^{M} {e_{i}^{2} } \), where the user-specified parameter \( \gamma \) determines the relative weight of the training error and the SVM objective. Taking partial derivatives of the Lagrangian and eliminating the variables \( {\mathbf{w}} \) and \( e_{j} \) yields the least-squares formulation of the problem:

$$ F\left( {\begin{array}{*{20}c} b \\ \alpha \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} 0 & {{\mathbf{1}}^{T} } \\ {\mathbf{1}} & {K + \gamma^{ - 1} {\mathbf{1}}} \\ \end{array} } \right)\left( {\begin{array}{*{20}c} b \\ \alpha \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} 0 \\ \varvec{y} \\ \end{array} } \right) $$

Here \( K \) is the kernel matrix constructed above, \( y = \left( {y_{1} , \cdots ,y_{M} } \right)^{T} \) denotes the vector of class labels and \( {\mathbf{1}} = \left( {1, \cdots ,1} \right)^{T} \). The matrix \( F \) is \( \left( {M + 1} \right) \times \left( {M + 1} \right) \) dimensional. Thus the quantum SVM parameters \( \left( {b,\alpha } \right) \) are determined by

$$ \left( {b,\alpha^{T} } \right) = F^{ - 1} \left( {0,y^{T} } \right)^{T} . $$
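For reference, the linear system above can be solved directly on small instances. The NumPy sketch below (which interprets the \( {\mathbf{1}} \) in \( K + \gamma^{ - 1} {\mathbf{1}} \) as the identity matrix, as is standard for least-squares SVMs, and uses synthetic data) produces the parameters \( (b,\alpha ) \) that the quantum algorithm targets:

```python
# Classical reference for the least-squares SVM system: F (b, alpha)^T = (0, y)^T.
import numpy as np

rng = np.random.default_rng(4)
M, N, gamma = 8, 3, 10.0
X = rng.normal(size=(M, N))
y = np.where(rng.normal(size=M) > 0, 1.0, -1.0)

K = X @ X.T                                  # kernel matrix
F = np.zeros((M + 1, M + 1))
F[0, 1:] = 1.0                               # first row  (0, 1^T)
F[1:, 0] = 1.0                               # first column
F[1:, 1:] = K + np.eye(M) / gamma            # K + gamma^{-1} * identity

b_alpha = np.linalg.solve(F, np.concatenate(([0.0], y)))
b, alpha = b_alpha[0], b_alpha[1:]

def classify(x_new):
    """y(x) = sign( sum_k alpha_k x_k . x + b )."""
    return int(np.sign(alpha @ (X @ x_new) + b))

print(classify(rng.normal(size=N)))
```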

To invert the matrix \( F \), we now describe the Fourier approach, which is based on an approximation of \( 1/F \) as a linear combination of the unitaries \( e^{{ - iFt_{i} }} ,t_{i} \in R \). These unitaries can be implemented by Hamiltonian simulation methods [7, 14, 15]. Our quantum algorithm relies on the following Fourier expansion of the function \( 1/x \) on the domain \( D_{\kappa } : = \{ x \in R:1/\kappa \le \left| x \right| \le 1\} \):

Theorem 1:

Let the function \( h\left( x \right) \) be defined as

$$ h\left( x \right) = \frac{i}{{\sqrt {2\pi } }}\sum\limits_{j = 0}^{J - 1} {\varDelta_{y} } \sum\limits_{k = - K}^{K} {\varDelta_{z} z_{k} e^{{ - z_{k}^{2} /2}} e^{{ - ixy_{j} z_{k} }} } , $$

where \( y_{j} : = j\varDelta_{y} ,z_{k} : = k\varDelta_{z} \), for some fixed \( J = \varTheta \left( {\frac{\kappa }{\varepsilon }\log \left( {\kappa /\varepsilon } \right)} \right) \), \( K = \varTheta \left( {\kappa \log \left( {\kappa /\varepsilon } \right)} \right) \), \( \varDelta_{y} = \varTheta \left( {\varepsilon /\sqrt {\log \left( {\kappa /\varepsilon } \right)} } \right) \) and \( \varDelta_{z} = \varTheta \left( {\left( {\kappa \sqrt {\log \left( {\kappa /\varepsilon } \right)} } \right)^{ - 1} } \right) \). Then \( h\left( x \right) \) is \( \varepsilon \)-close to \( 1/x \) on the domain \( D_{\kappa } \).

Based on the above theorem, the matrix \( F^{ - 1} \) can be expressed as a linear combination of unitaries:

$$ \frac{1}{F} \approx \frac{i}{{\sqrt {2\pi } }}\sum\limits_{j = 0}^{J - 1} {\varDelta_{y} } \sum\limits_{k = - K}^{K} {\varDelta_{z} z_{k} e^{{ - z_{k}^{2} /2}} e^{{ - iFy_{j} z_{k} }} } . $$
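The quality of this expansion can be probed numerically. The sketch below evaluates the truncated double sum \( h(x) \) of Theorem 1 for scalar \( x \) and compares it with \( 1/x \) on \( D_{\kappa } \); the constants hidden in the \( \varTheta (\cdot ) \) bounds are chosen by hand here (a factor of 2), so the achieved error is only roughly of order \( \varepsilon \):

```python
# Numerical probe of Theorem 1: the truncated Fourier sum h(x) approximates 1/x.
import numpy as np

kappa, eps = 5.0, 1e-2
L = np.log(kappa / eps)
J = int(np.ceil(2 * (kappa / eps) * L))      # J       = Theta((kappa/eps) log(kappa/eps))
K = int(np.ceil(2 * kappa * L))              # K       = Theta(kappa log(kappa/eps))
dy = eps / np.sqrt(L)                        # Delta_y = Theta(eps / sqrt(log(kappa/eps)))
dz = 1.0 / (kappa * np.sqrt(L))              # Delta_z = Theta(1 / (kappa sqrt(log(kappa/eps))))

y_j = np.arange(J) * dy
z_k = np.arange(-K, K + 1) * dz
weights = dz * z_k * np.exp(-z_k**2 / 2.0)

def h(x):
    # h(x) = (i/sqrt(2 pi)) sum_j Delta_y sum_k Delta_z z_k e^{-z_k^2/2} e^{-i x y_j z_k}
    phases = np.exp(-1j * x * np.outer(y_j, z_k))
    return (1j / np.sqrt(2 * np.pi)) * dy * np.sum(phases @ weights)

for x in (1.0, 0.5, 1.0 / kappa):
    print(f"x={x:5.2f}  h(x)={h(x).real:8.4f}  1/x={1.0/x:8.4f}")
```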

To implement this expansion, we need the following theorem.

Theorem 2:

Let \( A \) be a Hermitian operator with eigenvalues in a domain \( D\, \subseteq \,R \). Suppose the function \( f:D \to R \) satisfies \( \left| {f\left( x \right)} \right| \ge 1 \) for all \( x \in D \), and that \( f \) is \( \varepsilon \)-close to \( \sum\nolimits_{i} {\alpha_{i} T_{i} } \) on \( D \) for some \( \varepsilon \in \left( {0,1/2} \right) \), coefficients \( \alpha_{i} > 0 \), and functions \( T_{i} :D \to C \). Let \( U_{i} \) be a set of unitaries such that

$$ U_{i} \left| 0 \right\rangle \left| \varphi \right\rangle = \left| 0 \right\rangle T_{i} \left( A \right)\left| \varphi \right\rangle + \left| {\varphi^{ \bot } } \right\rangle $$

for all states \( \left| \varphi \right\rangle \), where \( \left( {\left| 0 \right\rangle \left\langle 0 \right| \otimes 1} \right)\left| {\varphi^{ \bot } } \right\rangle = 0 \). Given an algorithm for creating a quantum state \( \left| b \right\rangle \), there is a quantum algorithm that prepares a quantum state \( 4\varepsilon \)-close to \( f\left( A \right)\left| b \right\rangle /\left\| {f\left( A \right)\left| b \right\rangle } \right\| \), succeeding with constant probability, which makes an expected \( {\rm O}\left( {\alpha /\left\| {f\left( A \right)\left| b \right\rangle } \right\|} \right) \) uses of the algorithm for \( \left| b \right\rangle \) and of the unitaries \( U \) and \( V \), where

$$ U = \mathop \sum \limits_{i} \left| i \right\rangle \left\langle i \right| \otimes U_{i} , $$
$$ V\left| 0 \right\rangle = \frac{1}{\sqrt \alpha }\mathop \sum \limits_{i} \sqrt {\alpha_{i} } \left| i \right\rangle ,\alpha = \mathop \sum \limits_{i} \alpha_{i} . $$

According to Theorem 2, we introduce a state-preparation unitary and a select operator corresponding to \( 1/F \) under the Fourier expansion. Suppose there exists a unitary \( V \) that maps the initial state \( \left| 0 \right\rangle \) to the quantum state

$$ V\left| 0 \right\rangle = \frac{1}{{\left( {2\pi } \right)^{{\frac{1}{4}}} \sqrt \alpha }}\sum\limits_{j = 0}^{J - 1} {\sqrt {\varDelta_{y} } } \sum\limits_{k = - K}^{K} {\sqrt {\varDelta_{z} \left| {z_{k} } \right|} e^{{ - z_{k}^{2} /4}} \left| {j,k} \right\rangle } , $$

where the parameter \( \alpha \) is the \( L_{1} \) norm of the coefficients of this linear combination

$$ \alpha = \frac{1}{{\sqrt {2\pi } }}\sum\limits_{j = 0}^{J - 1} {\varDelta_{y} } \sum\limits_{k = - K}^{K} {\varDelta_{z} \left| {z_{k} } \right|e^{{ - z_{k}^{2} /2}} } = \varTheta \left( {\varDelta_{y} J} \right). $$

Theorem 2 also requires the unitary

$$ U = i\mathop \sum \limits_{j = 0}^{J - 1} \mathop \sum \limits_{k = - K}^{K} \left| {j,k} \right\rangle \left\langle {j,k} \right| \otimes sign\left( k \right)e^{{ - iFy_{j} z_{k} }} . $$

The term \( e^{{ - iFy_{j} z_{k} }} \) can be implemented using the qubitization method [14] with running time

$$ {\rm O}\left( {y_{j} z_{k} + log\left( {1/\varepsilon } \right)} \right) = \varTheta \left( {\kappa \,log\left( {\kappa /\varepsilon } \right)} \right) + log\left( {1/\varepsilon } \right). $$

Thus we can decompose \( 1/F \) in the eigenbasis of \( F \):

$$ F^{ - 1} = \mathop \sum \limits_{j = 1}^{M + 1} \frac{1}{{\lambda_{j} }}\left| {\lambda_{j} } \right\rangle \left\langle {\lambda_{j} } \right|. $$

Applying \( F^{ - 1} \) to the label state \( \left| y \right\rangle \), we obtain the SVM parameters:

$$ \left| {b,\alpha } \right\rangle = \sum\limits_{j = 1}^{M + 1} {\frac{{\left\langle {\lambda_{j} } | y \right\rangle }}{{\lambda_{j} }}\left| {\lambda_{j} } \right\rangle } . $$

Expressed in the computational basis of the training set, the expansion coefficients of the new state are the desired SVM parameters:

$$ \left| {b,\alpha } \right\rangle = \frac{1}{\sqrt C }\left( {b\left| 0 \right\rangle + \mathop \sum \limits_{k = 1}^{M} \alpha_{k} \left| k \right\rangle } \right), $$

where \( C = b^{2} + \sum\nolimits_{k = 1}^{M} {\alpha_{k}^{2} } . \)
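Classically, the eigen-expansion above is easy to verify on a small instance: the sketch below applies \( F^{ - 1} \) to \( (0,y^{T} )^{T} \) through the spectral decomposition of \( F \), checks the result against a direct solve, and normalizes it as in the state \( \left| {b,\alpha } \right\rangle \) (synthetic data, NumPy only):

```python
# Classical check of the eigen-expansion of F^{-1} applied to (0, y)^T.
import numpy as np

rng = np.random.default_rng(5)
M, gamma = 6, 10.0
X = rng.normal(size=(M, 3))
y = np.where(rng.normal(size=M) > 0, 1.0, -1.0)

F = np.zeros((M + 1, M + 1))
F[0, 1:] = F[1:, 0] = 1.0
F[1:, 1:] = X @ X.T + np.eye(M) / gamma

vals, vecs = np.linalg.eigh(F)               # F = sum_j lambda_j |l_j><l_j|
rhs = np.concatenate(([0.0], y))
sol = vecs @ ((vecs.T @ rhs) / vals)         # sum_j <l_j|y> / lambda_j |l_j>

print(np.allclose(sol, np.linalg.solve(F, rhs)))   # True
print(sol / np.linalg.norm(sol))                    # normalized, as in |b, alpha>
```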

5 Classification

Having trained the quantum SVM model, we would now like to classify a query state \( \left| {\mathbf{x}} \right\rangle \). From the computed state \( \left| {b,\alpha } \right\rangle \), we add a register and use the QRAM mechanism to encode the training data in the entangled state:

$$ \left| u \right\rangle = \frac{1}{{\sqrt {N_{u} } }}\left( {b\left| 0 \right\rangle \left| 0 \right\rangle + \sum\limits_{k = 1}^{M} {\alpha_{k} \left\| {{\mathbf{x}}_{{\mathbf{k}}} } \right\|\left| k \right\rangle \left| {{\mathbf{x}}_{{\mathbf{k}}} } \right\rangle } } \right) $$

with the normalization factor \( N_{u} = b^{2} + \sum\nolimits_{k = 1}^{M} {\alpha_{k}^{2} \left\| {{\mathbf{x}}_{{\mathbf{k}}} } \right\|^{2} } \). Then, after constructing the query state \( \left| {\mathbf{x}} \right\rangle \)

$$ \left| {\mathbf{x}} \right\rangle = \frac{1}{{\sqrt {N_{x} } }}\left( {\left| 0 \right\rangle \left| 0 \right\rangle + \sum\limits_{k = 1}^{M} {\left\| {\mathbf{x}} \right\|\left| k \right\rangle \left| {\mathbf{x}} \right\rangle } } \right) $$

by utilizing the QRAM again, we can finally apply a swap test to perform the classification. Construct the ancillary state \( \left| \psi \right\rangle = \frac{1}{{\sqrt 2 }}\left( {\left| 0 \right\rangle \left| u \right\rangle + \left| 1 \right\rangle \left| {\mathbf{x}} \right\rangle } \right) \) and measure the ancilla in the state \( \left| \phi \right\rangle = \frac{1}{{\sqrt 2 }}(\left| 0 \right\rangle - \left| 1 \right\rangle ) \). The measurement succeeds with probability \( P = \frac{1}{2}(1 - \left\langle {u} | {\mathbf{x}} \right\rangle ) \). Thus if \( P < 1/2 \), the new vector \( {\mathbf{x}} \) is classified as \( + 1 \), and otherwise as \( - 1 \).
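The classification step can likewise be emulated classically. The sketch below assembles ordinary vectors with the same block structure as \( \left| u \right\rangle \) and the query state, computes the swap-test quantity \( P = \frac{1}{2}(1 - \left\langle {u} | {\mathbf{x}} \right\rangle ) \) and thresholds it at \( 1/2 \); all data and parameters are synthetic and illustrative:

```python
# Classical emulation of the swap-test classification step.
import numpy as np

rng = np.random.default_rng(6)
M, N, gamma = 6, 3, 10.0
X = rng.normal(size=(M, N))
y = np.where(rng.normal(size=M) > 0, 1.0, -1.0)

# Least-squares SVM parameters (b, alpha) from the linear system of Sect. 4.
F = np.zeros((M + 1, M + 1))
F[0, 1:] = F[1:, 0] = 1.0
F[1:, 1:] = X @ X.T + np.eye(M) / gamma
b_alpha = np.linalg.solve(F, np.concatenate(([0.0], y)))
b, alpha = b_alpha[0], b_alpha[1:]

def embed(coeffs, vectors):
    """Normalized vector: slot 0 holds coeffs[0]; block k holds coeffs[k+1]*vectors[k]."""
    v = np.zeros(1 + len(vectors) * N)
    v[0] = coeffs[0]
    for k, x in enumerate(vectors):
        v[1 + k * N:1 + (k + 1) * N] = coeffs[k + 1] * x
    return v / np.linalg.norm(v)

x_query = rng.normal(size=N)
u = embed(np.concatenate(([b], alpha)), X)                    # ~ b|0>|0> + sum_k alpha_k ||x_k|| |k>|x_k>
q = embed(np.concatenate(([1.0], np.ones(M))), [x_query] * M) # ~ |0>|0> + sum_k ||x|| |k>|x>

P = 0.5 * (1.0 - np.dot(u, q))        # swap-test acceptance probability
print("class:", "+1" if P < 0.5 else "-1")
```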

6 Complexity Analysis

We now show that the quantum matrix inversion essentially performs the operator \( e^{ - iFt} \), and we analyze the running time of our algorithm. The matrix \( F \) contains the kernel matrix \( K \) and an additional row and column owing to the offset \( b \). From the literature [14], we know that the gate complexity of simulating the Hamiltonian \( F \) for time \( t \) with error \( \varepsilon \) is \( {\mathcal{O}}(t + \log (1/\varepsilon )) \). Noting that the evolution times are \( t = y_{j} z_{k} \), \( F \) can be efficiently simulated in time \( \varTheta (\kappa \log (\kappa /\varepsilon ) + \log (1/\varepsilon )) \). The maximum absolute eigenvalue of \( F/\mathrm{tr}(F) \) is \( \le 1 \), while the minimum absolute eigenvalue can be as small as \( O(1/M) \); therefore the condition number \( \kappa \) is \( O(M) \) in this case, and resolving such small eigenvalues would be prohibitively expensive. Instead, only eigenvalues in the range \( \varepsilon \le \left| \lambda \right| \le 1 \) are taken into account, so that the effective condition number satisfies \( \kappa = {\mathcal{O}}(1/\varepsilon ) \). Taking into account the preparation of the kernel matrix in \( {\mathcal{O}}(\log (MN)) \), the running time is thus

$$ {\mathcal{O}}(\varepsilon^{ - 1} \log (1/\varepsilon )\log (MN)). $$

Compared with Rebentrost's algorithm, we achieve a significantly improved dependence on the precision \( \varepsilon \).

7 Conclusion

In this work, we have shown that the support vector machine can be implemented on a quantum computer with complexity logarithmic in both the feature size and the number of training examples. In addition, the presented algorithm improves the dependence on the precision ε. When the training-data kernel matrix is simulated with an optimal Hamiltonian simulation method, the speedup of the quantum algorithm is maximized. Furthermore, our algorithm avoids the overhead caused by phase estimation and controlled rotation. In summary, the quantum SVM is an important machine learning algorithm that can be implemented efficiently; it also offers advantages for data privacy and could become an important component of quantum neural networks.