Abstract
Deep multi-view clustering based on autoencoders has recently attracted significant attention because it improves feature learning and optimizes clustering outcomes simultaneously. However, existing autoencoder-based methods tend either to overemphasize view-specific information and thereby neglect the information shared across views, or to focus unduly on shared information and thereby dilute the complementary information of individual views. Following the principle that commonality resides within individuality, this paper proposes a staged training approach with two phases: pre-training and fine-tuning. The pre-training phase focuses on learning view-specific information, while the fine-tuning phase doubly enhances the commonality across views while preserving these view-specific details. Specifically, in the pre-training stage we extract the specific information of each view with an autoencoder. In the fine-tuning stage, we first enhance the commonality among the independent view-specific representations through a transformer layer, and then further strengthen this commonality through contrastive learning on the semantic labels of each view, so as to obtain more accurate clustering results.
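As a rough illustration of the second enhancement step, the sketch below computes a contrastive loss over the columns of two views' soft cluster-assignment matrices, so that the same cluster in different views forms a positive pair and all other clusters act as negatives. It is not the authors' implementation: the abstract does not state the loss, so the function names, the use of cosine similarity, the temperature value, and the choice of negatives are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def columns(matrix):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*matrix)]


def label_contrastive_loss(q_a, q_b, temperature=0.5):
    """Hypothetical label-level contrastive loss between two views.

    q_a, q_b: N x K soft cluster assignments (rows are samples).
    Column j of q_a and column j of q_b form the positive pair;
    every other column of either view is treated as a negative.
    """
    cols_a, cols_b = columns(q_a), columns(q_b)
    k = len(cols_a)
    total = 0.0
    for j in range(k):
        pos = math.exp(cosine(cols_a[j], cols_b[j]) / temperature)
        neg = 0.0
        for i in range(k):
            if i != j:
                # Negatives: other clusters in the other view and in the same view.
                neg += math.exp(cosine(cols_a[j], cols_b[i]) / temperature)
                neg += math.exp(cosine(cols_a[j], cols_a[i]) / temperature)
        total += -math.log(pos / (pos + neg))
    return total / k


# Toy data (made up): four samples, two clusters, two views.
q_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
q_b_aligned = [[0.85, 0.15], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
# Same assignments with the cluster indices swapped in the second view.
q_b_swapped = [[b, a] for a, b in q_b_aligned]

loss_aligned = label_contrastive_loss(q_a, q_b_aligned)
loss_swapped = label_contrastive_loss(q_a, q_b_swapped)
```

On this toy data the loss is lower when the two views agree on which samples belong to which cluster than when their cluster indices are swapped, which is the behaviour a label-level contrastive objective is meant to reward.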





Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (CN) [62276164, 61602296], the ‘Science and Technology Innovation Action Plan’ Natural Science Foundation of Shanghai [22ZR1427000], and the Shanghai Oriental Talent Program-Youth Program. The authors thank these organizations for their support.
Funding
Natural Science Foundation of Shanghai (22ZR1427000), National Natural Science Foundation of China (CN) (62276164, 61602296).
Author information
Contributions
Yang Zhiyuan played a central role in this study: he proposed the core ideas, wrote the code required for the experiments, and set up the experimental environment. Zhu Changming and Li Zishi reviewed the manuscript, checking the accuracy and integrity of the research.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Communicated by Yongdong Zhang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, Z., Zhu, C. & Li, Z. Deep contrastive multi-view clustering with doubly enhanced commonality. Multimedia Systems 30, 196 (2024). https://doi.org/10.1007/s00530-024-01400-1
DOI: https://doi.org/10.1007/s00530-024-01400-1