Abstract
Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time-consuming, whereas weakly labeled web images are easy to obtain from the Internet. Unfortunately, their noisy labels severely degrade performance when such images are used directly to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically suppresses the distraction of incorrectly labeled samples by reducing their attention scores during training, while the special-class activation map module guides the network to focus on the significant regions of correctly labeled samples in a weakly supervised manner. Beyond feature learning, a regularization term is applied to the classifier to minimize the distance between samples of the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well- and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
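To make the classifier regularization concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it pulls each feature toward a learnable centroid of its own class and pushes different class centroids at least a margin apart. The class name, feature dimension, margin, and the per-sample attention scores used to down-weight suspected mislabeled samples are all illustrative assumptions.

    # Sketch only: centroid-based regularizer in the spirit of the abstract.
    import torch
    import torch.nn as nn

    class CentroidRegularizer(nn.Module):
        """Intra-class compactness + inter-class centroid separation."""

        def __init__(self, num_classes: int, feat_dim: int, margin: float = 1.0):
            super().__init__()
            # Learnable class centroids, updated by backprop with the network.
            self.centroids = nn.Parameter(torch.randn(num_classes, feat_dim))
            self.margin = margin

        def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
            # Intra-class term: squared distance of each sample to its own centroid.
            intra = (feats - self.centroids[labels]).pow(2).sum(dim=1).mean()
            # Inter-class term: hinge keeping every pair of centroids
            # at least `margin` apart.
            dists = torch.cdist(self.centroids, self.centroids, p=2)
            off_diag = dists[~torch.eye(len(dists), dtype=torch.bool)]
            inter = torch.clamp(self.margin - off_diag, min=0).mean()
            return intra + inter

    # Usage: feats come from the backbone; hypothetical attention scores
    # (as produced by the attention module) down-weight the classification
    # loss of samples suspected to be mislabeled.
    feats = torch.randn(8, 128)            # backbone features (batch of 8)
    labels = torch.randint(0, 2, (8,))     # binary sentiment labels
    attn = torch.rand(8)                   # attention scores in [0, 1] (assumed)
    logits = nn.Linear(128, 2)(feats)
    ce = nn.functional.cross_entropy(logits, labels, reduction="none")
    loss = (attn * ce).mean() + 0.1 * CentroidRegularizer(2, 128)(feats, labels)
    loss.backward()

The 0.1 weight on the regularizer is an arbitrary placeholder; in practice such a trade-off coefficient would be tuned on validation data.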
Additional information
Project supported by the Key Project of the National Natural Science Foundation of China (No. U1836220), the National Natural Science Foundation of China (No. 61672267), the Qing Lan Talent Program of Jiangsu Province, China, the Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, China, the Finnish Cultural Foundation, the Jiangsu Specially-Appointed Professor Program, China (No. 3051107219003), the Jiangsu Joint Research Project of Sino-Foreign Cooperative Education Platform, China, and the Talent Startup Project of Nanjing Institute of Technology, China (No. YKJ201982)
Contributors
Luo-yang XUE designed the research and drafted the manuscript. Qi-rong MAO and Xiao-hua HUANG helped organize the manuscript. Jie CHEN participated in the experiments. Luo-yang XUE and Qi-rong MAO revised and finalized the paper.
Compliance with ethics guidelines
Luo-yang XUE, Qi-rong MAO, Xiao-hua HUANG, and Jie CHEN declare that they have no conflict of interest.
Cite this article
Xue, Ly., Mao, Qr., Huang, Xh. et al. NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images. Front Inform Technol Electron Eng 21, 1321–1333 (2020). https://doi.org/10.1631/FITEE.1900618