Deep matrix factorization with knowledge transfer for lifelong clustering and semi-supervised clustering
Introduction
Cluster analysis partitions a set of data objects so that objects in the same group (called a cluster) are more similar to each other than to those in other groups. In recent years, data clustering has been studied extensively from novel perspectives, such as clustering with noisy data [13], incomplete data clustering [17], multi-view clustering [36], and multi-task clustering [48], [49]. Among these, multi-task clustering transfers knowledge learned across different tasks to help cluster a target task, and groups the data of multiple tasks at the same time. For example, Zhang et al. [47] proposed a multi-task clustering method that transfers instance knowledge across tasks to help cluster each target task. Zhang et al. [48] also proposed a model that jointly mines the task relationships among all tasks to help the data clustering of every task. However, multi-task clustering methods mainly consider a fixed task set and do not learn incrementally. Given a sequence of tasks, when a new task is fed into a multi-task clustering model, the model must be rebuilt on all arrived tasks repeatedly, which is costly and time-consuming as the tasks are possibly endless. In this paper, we focus on the problem of clustering the data of a sequence of tasks. This problem setting is similar to the current lifelong learning setting [5], which addresses a set of clustering tasks consecutively. Lifelong learning is a continual learning process in which the learner learns a sequence of tasks; after learning each task, the learner retains knowledge and later uses it to help learn future tasks [2]. Most existing works focus on lifelong classification problems; in this work, we address a lifelong clustering problem.
Rao et al. [25] proposed an unsupervised continual learning model that learns a task-specific representation on top of a larger set of shared parameters, aiming to solve catastrophic forgetting in neural networks. Note that the data in [25] actually come from the same task, as a single dataset is partitioned into multiple subsets and each subset is treated as a task; in contrast, our tasks come from different datasets (see the experiment section). Sun et al. [29] proposed a lifelong spectral clustering method to investigate spectral clustering in a lifelong learning setting. However, this method only extracts shallow information from the data using spectral graph theory and cannot mine the deep intrinsic information among data samples. Our lifelong clustering method is a deep model based on deep matrix factorization, which effectively extracts deep information [18], [3]. Meanwhile, these methods work in an unsupervised manner without any label information. In practice, a small amount of prior label information can make clustering more robust and boost its performance remarkably [11], [10], [16]. Learning with partial label supervision is known as semi-supervised learning [34]. To our knowledge, there is no prior work on lifelong semi-supervised clustering. In this work, we bring semi-supervised learning into our lifelong clustering framework and further propose a lifelong semi-supervised clustering method. In brief, the framework of our lifelong clustering is shown in Fig. 1.
Regarding clustering in a lifelong learning setting, there are at least two key challenges to address. (1) How to mine shared knowledge among multiple tasks? Similar to multi-task learning, we need to extract effective and sufficient information across tasks and learn the correlations between tasks. (2) How to learn and transfer knowledge from previous tasks to the target/new task? The number of tasks in lifelong learning is possibly never-ending, and unlike multi-task learning, the task set is not fixed. Thus, it is essential to store the knowledge learned from previous tasks and to transfer it to each new task to boost clustering performance on that task.
In this paper, we design a deep multi-layer architecture to learn deep-layer representations across a sequence of tasks and meanwhile utilize graph co-clustering to transfer knowledge from one task to another. We then propose a Deep Matrix factorization method with Knowledge transfer (denoted DMK) for lifelong clustering, based on deep matrix factorization and graph co-clustering. Given a set of prior label information in each task, we further propose a Semi-supervised DMK (denoted SDMK) for lifelong semi-supervised clustering. The proposed DMK and SDMK will be detailed shortly. In summary, this paper makes the following contributions:
- It studies a new clustering paradigm called lifelong clustering, which aims to retain and accumulate the knowledge learned from each previous task and to use that knowledge to help future clustering tasks.
- It proposes a deep multi-layer model (i.e., the Deep Matrix factorization method with Knowledge transfer) for lifelong clustering. The proposed model learns a basis feature library shared by all previous tasks and meanwhile learns a task-specific representation for each new/target task. It also exploits graph co-clustering to learn a common feature embedding library that exchanges and transfers latent information between sequential tasks.
- It further proposes a lifelong semi-supervised clustering method that exploits a small amount of prior label information, using a label knowledge library to transfer semi-supervised knowledge among tasks.
- An alternating iterative optimization algorithm is presented to optimize the proposed lifelong clustering and semi-supervised clustering methods, respectively, in the lifelong setting. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.
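The lifelong setting described above — carry a shared basis library across tasks and learn a task-specific representation per task, without revisiting old raw data — can be sketched as follows. This is a minimal single-layer illustration under our own assumptions, not the authors' DMK algorithm: it warm-starts a basis `B` from the previous tasks and uses standard NMF multiplicative updates; the names `factorize_task`, `B`, and `H` are ours.

```python
import numpy as np

def factorize_task(X, B, n_components, n_iter=100, seed=0):
    """One task: X (features x samples) ~ B @ H, warm-starting the
    shared basis library B from previous tasks (None for the first task)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    if B is None:
        B = rng.random((d, n_components))
    H = rng.random((n_components, n))
    for _ in range(n_iter):
        # standard NMF multiplicative updates (not the paper's exact rules)
        H *= (B.T @ X) / (B.T @ B @ H + 1e-10)
        B *= (X @ H.T) / (B @ H @ H.T + 1e-10)
    return B, H

# lifelong loop: the library B is carried across tasks; the raw data of
# finished tasks is never accessed again
rng = np.random.default_rng(1)
tasks = [rng.random((20, 30)) for _ in range(3)]  # toy stand-ins for datasets
B = None
for X in tasks:
    B, H = factorize_task(X, B, n_components=5)
    labels = H.argmax(axis=0)  # cluster assignment for the current task
```

The key point the sketch makes concrete is that only `B` (the accumulated knowledge) persists between iterations of the outer loop, so memory does not grow with the number of tasks.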
Related works
In this section, we discuss the works related to our proposed method, most notably lifelong learning, multi-task clustering, and matrix factorization.
Deep matrix factorization method with knowledge transfer for lifelong clustering
This section presents the proposed Deep Matrix factorization method with Knowledge transfer for lifelong clustering (i.e., DMK). We first introduce some preliminaries and then detail the proposed DMK model.
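As background for the model, deep matrix factorization is commonly built by greedy layer-wise factorization: the data matrix is decomposed as X ≈ Z1 Z2 … Hm, where each layer's representation is factorized in turn and the deepest representation Hm is used for clustering. The sketch below illustrates that scheme with plain NMF at each layer; the layer sizes and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0):
    """Plain NMF via multiplicative updates: X ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-10)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-10)
    return W, H

def deep_factorize(X, layer_sizes):
    """Greedy layer-wise pretraining: X ~ Z1 @ Z2 @ ... @ Hm.
    Each layer factorizes the previous layer's representation."""
    Zs, H = [], X
    for k in layer_sizes:
        Z, H = nmf(H, k)
        Zs.append(Z)
    return Zs, H

rng = np.random.default_rng(2)
X = rng.random((50, 40))                      # toy nonnegative data
Zs, H = deep_factorize(X, layer_sizes=[25, 10])
# H (10 x 40) is the deepest representation, the one used for clustering
```

Stacking factorizations this way lets each successive layer capture increasingly abstract structure, which is what distinguishes deep matrix factorization from the shallow single-layer case.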
Semi-supervised lifelong clustering based on DMK
This section presents our semi-supervised DMK (i.e., SDMK), a lifelong semi-supervised clustering method that utilizes a small amount of prior label information from the data. The key novelty of SDMK over DMK is that it learns a label knowledge library to retain and transfer semi-supervised information among sequential tasks. Similar to the optimization of DMK, an optimization algorithm is proposed to solve SDMK in a lifelong manner.
Let the data be split into a labeled part and an unlabeled part, where the subscripts l and u indicate the labeled and unlabeled samples, respectively.
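One common way to encode such a labeled/unlabeled split in matrix-factorization-based semi-supervised clustering is a label constraint matrix built from a one-hot indicator over the labeled block (as in constrained-NMF-style methods). The construction below is a generic sketch of that idea, not necessarily the paper's notation; the toy sizes and the names `C` and `A` are our own.

```python
import numpy as np

# hypothetical toy setup: n samples, the first n_l are labeled
n, n_l, n_clusters = 10, 4, 3
y_l = np.array([0, 2, 1, 0])            # labels of the labeled part

# one-hot indicator C over the labeled samples
C = np.zeros((n_l, n_clusters))
C[np.arange(n_l), y_l] = 1.0

# block constraint matrix A = [[C, 0], [0, I]]: writing the cluster
# representation as F = A @ Z forces labeled samples sharing a label
# to share one row of Z, while unlabeled samples remain unconstrained
A = np.block([
    [C, np.zeros((n_l, n - n_l))],
    [np.zeros((n - n_l, n_clusters)), np.eye(n - n_l)],
])
```

The effect is that the supervision is baked into the factorization's structure rather than added as a penalty, so even a handful of labels constrains the solution.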
Optimization of DMK
This section introduces how to optimize the proposed DMK in a lifelong clustering setting. When the m-th future/new clustering task arrives and flows into our DMK, we update each variable alternately while fixing the other variables. Note that we can update the learned libraries and B without accessing the raw data of previous tasks again, which reduces the time complexity compared with traditional algorithms. Our update rules are shown as:
Update. We minimize the objective
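The alternating scheme — fix all variables but one, update that one, and cycle — can be illustrated on the plain two-factor objective ‖X − BH‖²_F with Lee–Seung multiplicative updates, which never increase the objective. This is a generic sketch of alternating optimization, not DMK's actual update rules.

```python
import numpy as np

def objective(X, B, H):
    """Frobenius reconstruction error ||X - B @ H||_F^2."""
    return np.linalg.norm(X - B @ H, "fro") ** 2

rng = np.random.default_rng(3)
X = rng.random((15, 20))        # toy nonnegative data
B = rng.random((15, 4))
H = rng.random((4, 20))

errs = [objective(X, B, H)]
for _ in range(50):
    # update H with B fixed, then B with H fixed
    H *= (B.T @ X) / (B.T @ B @ H + 1e-10)
    B *= (X @ H.T) / (B @ H @ H.T + 1e-10)
    errs.append(objective(X, B, H))
# each alternating step leaves the objective no larger than before
```

Because each sub-update solves (or improves) its own subproblem with the rest fixed, the overall objective is monotonically non-increasing, which is what guarantees convergence of this kind of algorithm.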
Experiments
This section evaluates the lifelong clustering performance of the proposed DMK and SDMK. We compare DMK (and SDMK) with several unsupervised (and semi-supervised) clustering baseline models on four real-world datasets.
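Clustering results are typically scored with metrics such as accuracy (ACC) and NMI, which require matching predicted cluster ids to ground-truth labels. Below is a small illustrative helper using exhaustive permutation matching; this is an assumption for the sketch (real evaluations usually use the Hungarian algorithm for larger cluster counts), and the toy labels are ours.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy over all permutations of cluster ids
    (fine for the small cluster counts used here)."""
    ks = np.unique(y_pred)
    best = 0.0
    for perm in permutations(np.unique(y_true), len(ks)):
        mapping = dict(zip(ks, perm))
        acc = np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
        best = max(best, acc)
    return best

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])   # same partition, permuted ids
acc = clustering_accuracy(y_true, y_pred)  # -> 1.0
```

The permutation step matters because cluster ids are arbitrary: a perfect partition with relabeled ids would otherwise score zero.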
Conclusions
This paper presented a Deep Matrix factorization method with Knowledge transfer (DMK) for lifelong clustering. The proposed DMK is a deep multi-layer framework that extracts deep features, in contrast to shallow matrix-factorization-based models. DMK learns a unified representation across tasks and exploits graph co-clustering to embed task-specific features and to transfer knowledge across tasks. In addition, it accumulates the previously learned knowledge without accessing the previous tasks' raw data.
CRediT authorship contribution statement
Yiling Zhang: Conceptualization, Methodology, Software, Investigation, Visualization, Writing - original draft. Hao Wang: Supervision, Writing - review & editing. Yan Yang: Supervision, Writing - review & editing, Funding acquisition. Wei Zhou: Data curation, Supervision, Writing - review & editing. Tianrui Li: Writing - review & editing. Xiaocao Ouyang: Writing - review & editing. Hongyang Chen: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Science Foundation of China (No. 61976247). Hao Wang would like to acknowledge a grant from the China Postdoctoral Science Foundation (No. 2020M681960) and a grant sponsored by Zhejiang Lab (No. 2020KB0AA02).
References (50)
- et al., Autonomous cross-domain knowledge transfer in lifelong policy gradient reinforcement learning
- et al., Lifelong learning in artificial neural networks, Communications of the ACM (2019)
- et al., Implicit regularization in deep matrix factorization, Advances in Neural Information Processing Systems (2019)
- et al., Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization, Information Sciences (2020)
- et al., Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (2018)
- et al., Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010)
- et al., Convex multi-task feature learning, Machine Learning (2008)
- et al., Learning the shared subspace for multi-task clustering and transductive transfer classification
- et al., Multimodal face-pose estimation with multitask manifold deep learning, IEEE Transactions on Industrial Informatics (2019)
- et al., Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization, IEEE Transactions on Neural Networks and Learning Systems (2020)
- Semi-supervised adaptive symmetric non-negative matrix factorization, IEEE Transactions on Cybernetics
- Robust graph learning from noisy data, IEEE Transactions on Cybernetics
- Symmetric nonnegative matrix factorization for graph clustering
- Lifelong multi-task multi-view learning using latent spaces
- Robust structured nonnegative matrix factorization for image representation, IEEE Transactions on Neural Networks and Learning Systems
- Co-regularized nonnegative matrix factorization for evolving community detection in dynamic networks, Information Sciences
- Optimization algorithms exploiting unitary constraints, IEEE Transactions on Signal Processing
- Never-ending learning, Communications of the ACM
- Efficient and robust feature selection via joint l2,1-norms minimization
- Continual lifelong learning with neural networks: A review, Neural Networks
- Robust distribution-based nonnegative matrix factorizations for dimensionality reduction, Information Sciences
- Continual unsupervised representation learning, Advances in Neural Information Processing Systems