NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images

Frontiers of Information Technology & Electronic Engineering

Abstract

Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time-consuming. Weakly labeled web images, by contrast, are easy to obtain from the Internet, but their noisy labels seriously degrade performance when the images are used directly to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically suppresses the distraction caused by samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to stimulate the network to focus on the significant regions of samples with correct labels in a weakly supervised manner. Beyond feature learning, regularization is applied to the classifier to minimize the distance between samples within the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well-labeled and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
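
The two ideas in the abstract that lend themselves to a small numerical illustration are the attention-based down-weighting of suspect samples and the classifier regularization over class centroids. The NumPy sketch below is only an illustration under stated assumptions: the abstract does not give the paper's actual formulas, so the softmax normalisation of the attention scores, the squared-distance pull term, the hinge-style push term, and the names `attention_weighted_loss`, `centroid_regularizer`, and `margin` are all hypothetical choices, not the authors' method.

```python
import numpy as np


def attention_weighted_loss(per_sample_loss, attention_logits):
    """Weight each sample's loss by a softmax-normalised attention score.

    Assumption: samples with low attention logits (e.g. suspected label
    noise) contribute less to the batch loss.
    """
    logits = attention_logits - attention_logits.max()  # numerical stability
    scores = np.exp(logits) / np.exp(logits).sum()
    return float((scores * per_sample_loss).sum()), scores


def centroid_regularizer(features, labels, margin=1.0):
    """Intra-class compactness plus a hinge on inter-centroid separation.

    Assumption: `margin` is a hypothetical hyperparameter; centroid pairs
    farther apart than `margin` incur no penalty.
    """
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Pull term: mean squared distance from each sample to its class centroid.
    idx = np.searchsorted(classes, labels)
    pull = np.mean(np.sum((features - centroids[idx]) ** 2, axis=1))
    # Push term: penalise centroid pairs that are closer than `margin`.
    push = 0.0
    n = len(classes)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(centroids[i] - centroids[j])
            push += max(0.0, margin - dist) ** 2
    return float(pull + push)


if __name__ == "__main__":
    feats = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    labs = np.array([0, 0, 1, 1])
    print(centroid_regularizer(feats, labs))
    print(attention_weighted_loss(np.array([1.0, 3.0]), np.array([2.0, -2.0])))
```

In this sketch, lowering a sample's attention logit shrinks its share of the batch loss, which mirrors the abstract's description of reducing attention scores for mislabeled samples; the regularizer is zero only when every sample sits on its class centroid and all centroids are at least `margin` apart.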


Author information

Corresponding author

Correspondence to Qi-rong Mao.

Additional information

Project supported by the Key Project of the National Natural Science Foundation of China (No. U1836220), the National Natural Science Foundation of China (No. 61672267), the Qing Lan Talent Program of Jiangsu Province, China, the Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, China, the Finnish Cultural Foundation, the Jiangsu Specially-Appointed Professor Program, China (No. 3051107219003), the Jiangsu Joint Research Project of Sino-Foreign Cooperative Education Platform, China, and the Talent Startup Project of Nanjing Institute of Technology, China (No. YKJ201982).

Contributors

Luo-yang XUE designed the research and drafted the manuscript. Qi-rong MAO and Xiao-hua HUANG helped organize the manuscript. Jie CHEN participated in the experiments. Luo-yang XUE and Qi-rong MAO revised and finalized the paper.

Compliance with ethics guidelines

Luo-yang XUE, Qi-rong MAO, Xiao-hua HUANG, and Jie CHEN declare that they have no conflict of interest.

About this article

Cite this article

Xue, Ly., Mao, Qr., Huang, Xh. et al. NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images. Front Inform Technol Electron Eng 21, 1321–1333 (2020). https://doi.org/10.1631/FITEE.1900618

  • DOI: https://doi.org/10.1631/FITEE.1900618