Abstract
Document clustering is widely used in data analysis. Its main bottleneck on von Neumann architectures is the massive number of similarity measurement operations, which consume a great deal of time. Memristive in-memory computing offers a new way to address this problem. In this article, we use a memristive dot product engine to demonstrate, for the first time, a document clustering method accelerated by in-memory cosine similarity computation. The memristor-based clustering method lowers the time complexity from O(N · d) for the conventional algorithm to O(N) by executing each similarity measurement in one step. Focusing on unit-length vectors, we propose an in-situ normalization scheme for the vectors stored in the crossbar array, which provides an efficient hardware training scheme and reduces the normalization steps during clustering. Using the BBCSport dataset as a benchmark, we further discuss the impact of non-ideal factors in memristors, including the number of available quantized states, the inevitable programming noise, and device failure. Simulation results indicate that 6-bit quantized states and 5% programming noise are acceptable for document clustering tasks. In addition, a high resistance state for failed cells is recommended for better clustering performance.
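The cluster-assignment step described above can be illustrated with a small software sketch. This is not the authors' implementation: the function names, the normalized conductance range of [0, 1], the non-negative input vectors, and the Gaussian noise model are all assumptions made for illustration. For unit-length vectors, cosine similarity reduces to a dot product, which is the operation a crossbar array would evaluate in a single analog read; the inner sum below is only a software stand-in for that read.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def quantize(w, bits):
    """Round a weight in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return round(w * levels) / levels

def program_crossbar(centroids, bits=6, noise=0.05, rng=None):
    """Model the stored centroids: unit length, quantized, then perturbed.

    Assumption: programming noise is Gaussian with a standard deviation of
    `noise` times the full conductance range, and values are clipped to [0, 1].
    """
    rng = rng or random.Random(0)
    stored = []
    for c in centroids:
        row = [quantize(x, bits) + rng.gauss(0.0, noise) for x in normalize(c)]
        stored.append([min(1.0, max(0.0, g)) for g in row])
    return stored

def assign(doc, stored):
    """Return the index of the stored centroid with the largest dot product."""
    x = normalize(doc)
    scores = [sum(a * b for a, b in zip(x, row)) for row in stored]
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical centroids and two hypothetical term-frequency vectors.
stored = program_crossbar([[1, 0, 0], [0, 1, 0]], bits=6, noise=0.0)
labels = [assign(doc, stored) for doc in ([3, 1, 0], [0, 2, 1])]
```

Raising `noise` or lowering `bits` in `program_crossbar` gives a rough way to explore how quantization and programming noise perturb the assignments, in the spirit of the non-ideality study mentioned above.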
Acknowledgements
This work was supported by the National Key Research and Development Plan of MOST of China (Grant No. 2019YFB2205100), the National Natural Science Foundation of China (Grant Nos. 61874164, 92064012, 61841404), the Hubei Key Laboratory for Advanced Memories, the Hubei Engineering Research Center on Microelectronics, and the Chua Memristor Institute.
About this article
Cite this article
Zhou, H., Li, Y. & Miao, X. Low-time-complexity document clustering using memristive dot product engine. Sci. China Inf. Sci. 65, 122410 (2022). https://doi.org/10.1007/s11432-021-3316-x
DOI: https://doi.org/10.1007/s11432-021-3316-x