Low-time-complexity document clustering using memristive dot product engine

  • Research Paper
Science China Information Sciences

Abstract

Document clustering is widely used in data analysis. Its main bottleneck is the large number of similarity measurements, which are time-consuming on the von Neumann architecture. Memristive in-memory computing offers a new way to address this problem. In this article, using a memristive dot product engine, we demonstrate a cosine-similarity-accelerated document clustering method for the first time. The memristor-based clustering method lowers the time complexity from O(N · d) for the conventional algorithm to O(N) by executing each similarity measurement in one step. Focusing on unit-length vectors, we propose an in-situ normalization scheme for the vectors stored in the crossbar array, which provides an efficient hardware training scheme and reduces the number of normalization steps during clustering. Using the BBCSport dataset as a benchmark, we further discuss the impact of non-ideal factors in the memristors, including the number of available quantized states, the inevitable programming noise, and device failure. Simulation results indicate that 6-bit quantized states and 5% programming noise are acceptable for document clustering tasks. In addition, clustering performs better when failed cells are in the high resistance state.
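The similarity step described in the abstract can be illustrated with a minimal software sketch. It is not the authors' simulator: the function names (`program_crossbar`, `crossbar_similarity`, `assign`), the noise model (Gaussian, proportional to the programmed level), the mapping of a failed high-resistance cell to a weight of 0, and the restriction to non-negative weights are all assumptions made for illustration. Each stored column holds one unit-length centroid, so the dot product with a unit-length document vector equals their cosine similarity.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def quantize(value, bits):
    """Snap a weight in [0, 1] to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped * levels) / levels


def program_crossbar(centroids, bits=6, noise=0.05, fail_rate=0.0, rng=None):
    """Return one stored column per centroid, with assumed non-idealities."""
    rng = rng or random.Random(0)
    columns = []
    for centroid in centroids:
        column = []
        for weight in normalize(centroid):
            if rng.random() < fail_rate:
                # Assumption: a failed cell stuck at high resistance reads as 0.
                column.append(0.0)
                continue
            # Assumption: Gaussian noise proportional to the programmed level.
            column.append(quantize(weight, bits) * (1.0 + rng.gauss(0.0, noise)))
        columns.append(column)
    return columns


def crossbar_similarity(columns, doc):
    """Dot product of the unit-length document with every stored column.

    Software loops over all d entries of each column; a crossbar would produce
    these sums as column currents in a single analog read.
    """
    unit_doc = normalize(doc)
    return [sum(w * x for w, x in zip(col, unit_doc)) for col in columns]


def assign(columns, doc):
    """Index of the stored centroid with the highest similarity."""
    sims = crossbar_similarity(columns, doc)
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    columns = program_crossbar([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(assign(columns, [0.9, 0.1, 0.0]))
```

Varying `bits`, `noise`, and `fail_rate` in this sketch gives a rough feel for how the three non-ideal factors named in the abstract perturb the similarity scores; it does not reproduce the reported BBCSport results.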

Acknowledgements

This work was supported by the National Key Research and Development Plan of MOST of China (Grant No. 2019YFB2205100), the National Natural Science Foundation of China (Grant Nos. 61874164, 92064012, 61841404), the Hubei Key Laboratory for Advanced Memories, the Hubei Engineering Research Center on Microelectronics, and the Chua Memristor Institute.

Author information

Corresponding author

Correspondence to Yi Li.

About this article

Cite this article

Zhou, H., Li, Y. & Miao, X. Low-time-complexity document clustering using memristive dot product engine. Sci. China Inf. Sci. 65, 122410 (2022). https://doi.org/10.1007/s11432-021-3316-x

  • DOI: https://doi.org/10.1007/s11432-021-3316-x
