A quantum algorithm of K-means toward practical use

Quantum Information Processing

Abstract

Clustering problems in image recognition recur in unsupervised machine learning. The K-means algorithm is a simple and popular algorithm for solving the clustering problem. However, parts of it are time-consuming, particularly for large data sizes; one example is the calculation of the distance between the cluster center and the data in the centroid calculation. In the present paper, we investigate whether quantum computation can speed up the K-means algorithm for large data sizes. We describe a quantum-enhanced K-means algorithm from which centroid calculations are removed. For the mean and distance calculations of vector data, we propose a quantum subroutine based on quantum entanglement, together with a potential speedup that makes the speed of the proposed subroutine comparable to that of its classical counterpart. The proposed K-means algorithm is evaluated on three datasets: synthetic, Iris, and image datasets. The numerical experiments show that the clustering performance of the proposed algorithm is comparable to that of the classical K-means algorithm.

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The Qiskit version was as follows: 'qiskit-terra: 0.15.2', 'qiskit-aer: 0.6.1', 'qiskit-ignis: 0.4.0', 'qiskit-ibmq-provider: 0.9.0', 'qiskit-aqua: 0.7.5', and 'qiskit: 0.21.0'. The Python version was 3.8.5.

  2. The Scikit-learn version was 0.23.2.

  3. https://www.pakutaso.com/nature/flower/ (last accessed Sep. 2021)

  4. Box plots are interpreted as follows. The center line in the box denotes the median of the data. The top and bottom edges of the box denote the third and first quartiles, respectively. The upper (lower) horizontal line denotes the maximum (minimum) data point within the range from (first quartile \(-\) 1.5 \( \times \) (third quartile \(-\) first quartile)) to (third quartile + 1.5 \( \times \) (third quartile \(-\) first quartile)). The circles denote points above the upper horizontal line or below the lower one and indicate outliers.

  5. For the t test and the Wilcoxon signed rank test, the null hypothesis states that \( mean_{1} = mean_{2} \).

  6. We modified the code shown on the website https://www.sejuku.net/blog/64365.
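The box-plot convention described in Note 4 can be made concrete with a short sketch. The function name `boxplot_summary` and the sample data are hypothetical, and the quartile convention (linear interpolation between order statistics) is an assumption, since Note 4 does not state which one was used.

```python
import statistics


def boxplot_summary(data):
    """Return the quantities a box plot draws, following Note 4.

    Assumption: quartiles use linear interpolation (the "inclusive"
    method of statistics.quantiles); the note does not specify this.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # third quartile minus first quartile
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = sorted(v for v in data if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,  # drawn as circles
    }


# Hypothetical sample: nine regular values and one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Here the value 30 lies above the upper fence, so it would be drawn as a circle, and the upper whisker stops at 9.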

References

  1. Aïmeur, E., Brassard, G., Gambs, S.: Machine learning in a quantum world. In: Lamontagne, L., Marchand, M. (eds.) Advances in artificial intelligence, pp. 431–442. Springer, Berlin (2006)

  2. Aleksandrowicz, G., Alexander, T., Barkoutsos, P., Bello, L., Ben-Haim, Y., Bucher, D., Cabrera-Hernández, F.J., Carballo-Franquis, J., Chen, A., Chen, C.F., Chow, J.M., Córcoles-Gonzales, A.D., Cross, A.J., Cross, A., Cruz-Benito, J., Culver, C., González, S.D.L.P., Torre, E.D.L., Ding, D., Dumitrescu, E., Duran, I., Eendebak, P., Everitt, M., Sertage, I.F., Frisch, A., Fuhrer, A., Gambetta, J., Gago, B.G., Gomez-Mosquera, J., Greenberg, D., Hamamura, I., Havlicek, V., Hellmers, J., Herok, Ł., Horii, H., Hu, S., Imamichi, T., Itoko, T., Javadi-Abhari, A., Kanazawa, N., Karazeev, A., Krsulich, K., Liu, P., Luh, Y., Maeng, Y., Marques, M., Martin-Fernández, F.J., McClure, D.T., McKay, D., Meesala, S., Mezzacapo, A., Moll, N., Rodríguez, D.M., Nannicini, G., Nation, P., Ollitrault, P., O’Riordan, L.J., Paik, H., Pérez, J., Phan, A., Pistoia, M., Prutyanov, V., Reuter, M., Rice, J., Davila, A.R., Rudy, R.H.P., Ryu, M., Sathaye, N., Schnabel, C., Schoute, E., Setia, K., Shi, Y., Silva, A., Siraichi, Y., Sivarajah, S., Smolin, J.A., Soeken, M., Takahashi, H., Tavernelli, I., Taylor, C., Taylour, P., Trabing, K., Treinish, M., Turner, W., Vogt-Lee, D., Vuillot, C., Wildstrom, J.A., Wilson, J., Winston, E., Wood, C., Wood, S., Worner, S., Akhalwaya, I.Y., Zoufal, C.: Qiskit: An Open-source Framework for Quantum Computing (2019). https://doi.org/10.5281/zenodo.2562111

  3. Arunachalam, S., de Wolf, R.: Optimal quantum sample complexity of learning algorithms. J. Mach. Learn. Res. 19(71), 1–36 (2018)

  4. Baritompa, W.P., Bulger, D.W., Wood, G.R.: Grover’s quantum algorithm applied to global optimization. SIAM J. Opt. 15(4), 1170–1184 (2005). https://doi.org/10.1137/040605072

  5. Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum circuits as machine learning models. Quant. Sci. Technol. 4(4), 043001 (2019). https://doi.org/10.1088/2058-9565/ab4eb5

  6. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature 549(7671), 195–202 (2017). https://doi.org/10.1038/nature23474

  7. Biau, G., Devroye, L., Lugosi, G.: On the performance of clustering in Hilbert spaces. IEEE Trans. Inf. Theory 54(2), 781–790 (2008). https://doi.org/10.1109/TIT.2007.913516

  8. Bishop, C.M.: Pattern recognition and machine learning. Springer, Berlin (2006)

  9. Brassard, G., Dupuis, F., Gambs, S., Tapp, A.: An optimal quantum algorithm to approximate the mean and its application for approximating the median of a set of points over an arbitrary distance. arXiv:1106.4267 [quant-ph] (2011)

  10. Dürr, C., Høyer, P.: A quantum algorithm for finding the minimum. arXiv:quant-ph/9607014 (1996)

  11. Goel, A., Tung, C., Lu, Y.H., Thiruvathukal, G.K.: A survey of methods for low-power deep learning and computer vision. In: 2020 IEEE 6th world forum on Internet of Things (WF-IoT), pp. 1–6 (2020). https://doi.org/10.1109/WF-IoT48130.2020.9221198

  12. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the twenty-eighth annual ACM symposium on theory of computing, STOC ’96, pp. 212–219. Association for computing machinery, New York, NY, USA (1996). https://doi.org/10.1145/237814.237866

  13. Kerenidis, I., Landman, J., Luongo, A., Prakash, A.: q-means: A quantum algorithm for unsupervised machine learning. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in neural information processing systems, vol. 32. Curran Associates, Inc. (2019)

  14. Khan, S.U., Awan, A.J., Vall-llosera, G.: K-means clustering on noisy intermediate scale quantum computers. arXiv preprint arXiv:1909.12183 (2019)

  15. Kopczyk, D.: Quantum machine learning for data scientists. arXiv preprint arXiv:1804.10068 (2018)

  16. Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411 (2013)

  17. Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum principal component analysis. Nat. Phys. 10(9), 631–633 (2014). https://doi.org/10.1038/nphys3029

  18. Nielsen, M.A., Chuang, I.L.: Quantum computation and quantum information, 10th anniversary edn. Cambridge University Press, USA (2011)

  19. Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014). https://doi.org/10.1103/PhysRevLett.113.130503

  20. Rosenberg, A., Hirschberg, J.: V-measure: A conditional entropy-based external cluster evaluation measure. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp. 410–420. Association for computational linguistics, Prague, Czech Republic (2007)

  21. Schuld, M., Sinayskiy, I., Petruccione, F.: An introduction to quantum machine learning. Contemp. Phys. 56(2), 172–185 (2015). https://doi.org/10.1080/00107514.2014.964942

  22. Shor, P.: Algorithms for quantum computation: discrete logarithms and factoring. In: Proceedings 35th annual symposium on foundations of computer science, pp. 124–134 (1994). https://doi.org/10.1109/SFCS.1994.365700

  23. Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984). https://doi.org/10.1145/1968.1972

  24. Wiebe, N., Kapoor, A., Svore, K.M.: Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quant. Inf. Comput. 15(3–4), 316–356 (2015)

Acknowledgements

The author would like to thank the anonymous reviewers for their valuable comments and suggestions on the manuscript.

Author information

Corresponding author

Correspondence to Hiroshi Ohno.

Ethics declarations

Conflict of interest

The author declares that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

An implementation of the classical K-means algorithm

A classical K-means algorithm can be implemented in Python as follows (see Note 6):

figure a
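The listing itself is not reproduced in this text. As a stand-in, the following is a minimal sketch of the standard K-means iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its members). It is not the author's code: the function names, the explicit `initial_centroids` argument, and the toy data are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign_labels(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            updated.append(tuple(old))  # keep the centroid of an empty cluster
            continue
        dim = len(old)
        updated.append(
            tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        )
    return updated


def kmeans(points, initial_centroids, max_iter=100):
    """Iterate assignment and update steps until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign_labels(points, centroids)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


# Hypothetical toy data: two well-separated groups in the plane.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, [(0, 0), (10, 10)])
```

Passing the initial centroids explicitly keeps the sketch deterministic. The distance evaluations in `assign_labels` and the mean computation in `update_centroids` are the steps the abstract identifies as costly for large data sizes.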

An implementation of Q-MIN

A quantum subroutine qsub_min for Q-MIN for two qubits is implemented in Python and Qiskit [2] as follows:

figure b
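The circuit listing is not reproduced in this text. Purely for orientation, the sketch below simulates classically a single Grover iteration on two qubits (four basis states), with an oracle that marks every index whose value lies below a threshold. That Q-MIN follows this Grover-type minimum-finding pattern (cf. refs. [10, 12]) is an assumption, not something stated here; the sketch does not use Qiskit, and the function names and values are hypothetical.

```python
import math


def grover_iteration(amplitudes, marked):
    """Apply one Grover iteration to a list of real amplitudes.

    Oracle: flip the sign of the marked indices.
    Diffusion: reflect every amplitude about the mean amplitude.
    """
    flipped = [-a if i in marked else a for i, a in enumerate(amplitudes)]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]


def probabilities_below_threshold(values, threshold):
    """Measurement probabilities after one iteration from the uniform state."""
    n = len(values)
    marked = {i for i, v in enumerate(values) if v < threshold}
    amplitudes = [1 / math.sqrt(n)] * n  # uniform superposition over n states
    amplitudes = grover_iteration(amplitudes, marked)
    return [a * a for a in amplitudes]


# Hypothetical distances for four candidates (two index qubits).
distances = [0.9, 0.2, 0.7, 0.5]
probs = probabilities_below_threshold(distances, threshold=0.5)
```

With four states and exactly one marked index, a single iteration concentrates all probability on that index, which is why the two-qubit case is a convenient test. With two of the four indices marked, one iteration leaves the distribution uniform, so a threshold-based search would then need a different iteration count.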

Detailed derivation of Eq. 8

The state \( \vert \psi _{3} \rangle \) is derived as follows:

$$\begin{aligned}
\vert \psi _{3} \rangle
&= \cos {(t)} (I \cos {(t)} + i H^{\otimes m} \sin {(t)}) \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle \\
&\quad - i \sin {(t)} \frac{1}{\sqrt{M+1}} (I \cos {(t)} + i H^{\otimes m} \sin {(t)}) \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle \\
&= \cos ^{2}{(t)} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle + i \cos {(t)} \sin {(t)} H^{\otimes m} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle \\
&\quad - i \frac{\sin {(t)} \cos {(t)}}{\sqrt{M+1}} \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle + \frac{\sin ^{2}{(t)}}{\sqrt{M+1}} H^{\otimes m} \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle \\
&= \cos ^{2}{(t)} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle + i \frac{\cos {(t)} \sin {(t)}}{\sqrt{M+1}} \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{0} \rangle \\
&\quad - i \frac{\sin {(t)} \cos {(t)}}{\sqrt{M+1}} \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle + \frac{\sin ^{2}{(t)}}{\sqrt{M+1}} \sum _{j = 0}^{M} \frac{1}{\sqrt{M+1}} \sum _{k = 0}^{M} (-1)^{j \cdot k} \vert k \rangle \vert \varvec{x}_{j} \rangle \\
&= \cos ^{2}{(t)} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle + i \frac{\cos {(t)} \sin {(t)}}{\sqrt{M+1}} \sum _{j = 0}^{M} \vert j \rangle \vert \varvec{x}_{0} \rangle - i \frac{\sin {(t)} \cos {(t)}}{\sqrt{M+1}} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle \\
&\quad - i \frac{\sin {(t)} \cos {(t)}}{\sqrt{M+1}} \sum _{j = 1}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle + \frac{\sin ^{2}{(t)}}{M+1} \sum _{j = 0}^{M} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{j} \rangle \\
&\quad + \frac{\sin ^{2}{(t)}}{M+1} \sum _{j = 0}^{M} \sum _{k = 1}^{M} (-1)^{j \cdot k} \vert k \rangle \vert \varvec{x}_{j} \rangle \\
&= \cos ^{2}{(t)} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle + i \frac{\cos {(t)} \sin {(t)}}{\sqrt{M+1}} \sum _{j = 1}^{M} \vert j \rangle \vert \varvec{x}_{0} \rangle \\
&\quad - i \frac{\sin {(t)} \cos {(t)}}{\sqrt{M+1}} \sum _{j = 1}^{M} \vert j \rangle \vert \varvec{x}_{j} \rangle + \frac{\sin ^{2}{(t)}}{M+1} \sum _{j = 0}^{M} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{j} \rangle \\
&\quad + \frac{\sin ^{2}{(t)}}{M+1} \sum _{j = 0}^{M} \sum _{k = 1}^{M} (-1)^{j \cdot k} \vert k \rangle \vert \varvec{x}_{j} \rangle \\
&= \left( \cos ^{2}{(t)} + \frac{\sin ^{2}{(t)}}{M + 1} \right) \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{0} \rangle + \frac{\sin ^{2}{(t)}}{M + 1} \sum _{j = 1}^{M} \vert 0 \rangle ^{\otimes m} \vert \varvec{x}_{j} \rangle \\
&\quad + i \frac{\cos {(t)} \sin {(t)}}{\sqrt{M + 1}} \sum _{j = 1}^{M} \vert j \rangle (\vert \varvec{x}_{0} \rangle - \vert \varvec{x}_{j} \rangle ) + \frac{\sin ^{2}{(t)}}{M + 1} \sum _{j = 0}^{M} \sum _{k = 1}^{M} (-1)^{j \cdot k} \vert k \rangle \vert \varvec{x}_{j} \rangle .
\end{aligned}$$
(13)
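The first and last lines of Eq. 13 can be compared numerically as a sanity check. The sketch below assumes \( M + 1 = 2^{m} \), so that \( H^{\otimes m} \) acts on the whole index register, and it stores the state as one data vector per index value. The function names, the sample vectors, and the value of \( t \) are hypothetical, and the data vectors are plain real lists rather than encoded quantum states.

```python
import math


def sign(j, k):
    """Return (-1)^(j . k), with j . k the bitwise inner product."""
    return -1 if bin(j & k).count("1") % 2 else 1


def hadamard_on_index(state):
    """Apply H^{(x)m} to the index register of state[index][component]."""
    n, d = len(state), len(state[0])
    return [
        [sum(sign(j, k) * state[j][a] for j in range(n)) / math.sqrt(n)
         for a in range(d)]
        for k in range(n)
    ]


def psi3_operator_form(x, t):
    """First line of Eq. 13: (I cos t + i H sin t) applied to the input state."""
    n, d = len(x), len(x[0])
    c, s = math.cos(t), math.sin(t)
    # cos(t)|0>|x_0> - i sin(t)/sqrt(M+1) sum_j |j>|x_j>
    psi2 = [
        [(c * x[0][a] if j == 0 else 0.0) - 1j * s * x[j][a] / math.sqrt(n)
         for a in range(d)]
        for j in range(n)
    ]
    h = hadamard_on_index(psi2)
    return [[c * psi2[k][a] + 1j * s * h[k][a] for a in range(d)]
            for k in range(n)]


def psi3_closed_form(x, t):
    """Last line of Eq. 13, evaluated term by term."""
    n, d = len(x), len(x[0])
    c, s = math.cos(t), math.sin(t)
    out = []
    for k in range(n):
        row = []
        for a in range(d):
            if k == 0:
                amp = (c * c + s * s / n) * x[0][a]
                amp += s * s / n * sum(x[j][a] for j in range(1, n))
            else:
                amp = 1j * c * s / math.sqrt(n) * (x[0][a] - x[k][a])
                amp += s * s / n * sum(sign(j, k) * x[j][a] for j in range(n))
            row.append(amp)
        out.append(row)
    return out


def max_abs_difference(u, v):
    """Largest componentwise distance between two states."""
    return max(abs(p - q) for ru, rv in zip(u, v) for p, q in zip(ru, rv))


# Hypothetical example with m = 2 (M = 3) and d = 2.
vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.8, -0.6]]
gap = max_abs_difference(psi3_operator_form(vectors, 0.3),
                         psi3_closed_form(vectors, 0.3))
```

In the closed form, the \( \vert 0 \rangle ^{\otimes m} \) component carries the sum of all \( \varvec{x}_{j} \) scaled by \( \sin ^{2}{(t)}/(M+1) \), and each \( \vert j \rangle \) component with \( j \ge 1 \) carries the difference \( \varvec{x}_{0} - \varvec{x}_{j} \); these are the terms relevant to the mean and distance calculations mentioned in the abstract.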

Implementation of the proposed algorithm

The quantum subroutine qsub_md of the proposed algorithm is implemented in Python and Qiskit for \( m = 2 \) and \( d = 2 \) as follows:

figure c

Fig. 10 Clustering results for the synthetic dataset

Fig. 11 Clustering results for the Iris dataset

Fig. 12 Clustering results of C-KM for the image dataset

Fig. 13 Clustering results of Q-KM for the image dataset

Fig. 14 Clustering results of Q-KM* for the image dataset

Implementation of the proposed quantum-enhanced K-means algorithm

Using the quantum subroutines (qsub_md and qsub_min), the proposed algorithm is implemented as follows:

figure d

Other results for synthetic, Iris, and image datasets

Cite this article

Ohno, H. A quantum algorithm of K-means toward practical use. Quantum Inf Process 21, 146 (2022). https://doi.org/10.1007/s11128-022-03485-x

  • DOI: https://doi.org/10.1007/s11128-022-03485-x
