research-article

An Optimal Edge-weighted Graph Semantic Correlation Framework for Multi-view Feature Representation Learning

Published: 25 April 2024

Abstract

In this article, we present an optimal edge-weighted graph semantic correlation (EWGSC) framework for multi-view feature representation learning. Unlike most existing multi-view representation methods, the EWGSC framework jointly exploits local structural information and global correlation in multi-view feature spaces, yielding a new, high-quality multi-view feature representation. Specifically, a novel edge-weighted graph model is first developed to preserve local structural information in each of the multi-view feature spaces. The explored structural information is then integrated with a semantic correlation algorithm, labeled multiple canonical correlation analysis (LMCCA), to form a platform for jointly exploiting local and global relations across multi-view feature spaces. We then theoretically verify the relation between the upper limit on the number of projected dimensions and the optimal solution to the multi-view feature representation problem. To validate the effectiveness and generality of the proposed framework, we conduct experiments on five datasets of different scales, covering visual data (University of California Irvine (UCI) iris database, Olivetti Research Lab (ORL) face database, and Caltech 256 database), text-image data (Wiki database), and video data (Ryerson Multimedia Lab (RML) audio-visual emotion database). The experimental results show that the proposed framework outperforms state-of-the-art algorithms on multi-view feature representation.
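
The abstract names two ingredients: a graph that preserves local structure within each view, and a canonical-correlation step that captures global relations across views. The sketch below illustrates generic versions of both and should not be read as the EWGSC or LMCCA algorithms themselves, whose edge weights, label usage, and coupling are not specified on this page. It assumes a k-nearest-neighbor graph with heat-kernel weights (as in locality preserving projections) and plain two-view CCA. The function names and the parameters `k`, `sigma`, and `reg` are illustrative choices.

```python
import numpy as np


def knn_heat_kernel_graph(X, k=5, sigma=1.0):
    """Symmetric kNN affinity matrix with heat-kernel edge weights.

    X has shape (n_samples, n_features). This is a generic locality graph,
    an assumption for illustration, not the paper's edge-weighted graph.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])
        nbrs = order[order != i][:k]  # k nearest samples, excluding i itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def two_view_cca(X, Y, dim, reg=1e-6):
    """Plain two-view CCA via whitening and SVD (no labels, no graph term).

    Returns projection matrices Wx, Wy and the top `dim` canonical
    correlations. `reg` is a small ridge term added for numerical stability.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)  # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)  # Cyy = Ly Ly^T
    # Cross-covariance of the whitened views: Lx^{-1} Cxy Ly^{-T}.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    dim = min(dim, s.shape[0])  # at most min(p, q) directions exist
    Wx = np.linalg.solve(Lx.T, U[:, :dim])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :dim])
    return Wx, Wy, s[:dim]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 2))  # shared latent factors (synthetic data)
    X = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
    Y = z @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))
    L = graph_laplacian(knn_heat_kernel_graph(X, k=5))
    Wx, Wy, corr = two_view_cca(X, Y, dim=2)
    print(L.shape, corr)
```

In plain CCA the number of useful projected dimensions is bounded by the smaller of the two view dimensionalities, which is why `dim` is capped inside `two_view_cca`. The bound analyzed in the article concerns its own formulation and may differ.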


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 7
July 2024
973 pages
EISSN: 1551-6865
DOI: 10.1145/3613662
  • Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 April 2024
Online AM: 27 February 2024
Accepted: 11 February 2024
Revised: 10 February 2024
Received: 05 February 2023
Published in TOMM Volume 20, Issue 7

Author Tags

  1. Multi-view feature representation
  2. graph model
  3. semantic correlation
  4. data visualization
  5. face recognition
  6. emotion recognition
  7. text-image recognition
  8. object recognition

Qualifiers

  • Research-article
