Abstract
In image classification, images collected in the wild usually contain multiple objects rather than a single dominant one. Moreover, the image label is not explicitly associated with any object region, i.e., the image is weakly annotated. In this paper, we propose a novel deep convolutional network for image classification under weak supervision. The proposed method, MIDCN, formulates the problem as Multiple Instance Learning (MIL), where each image is a bag containing multiple instances (objects). Unlike previous deep MIL methods, which predict the label of each bag (i.e., image) by simply applying a pooling or voting strategy over its instance (i.e., region) predictions, MIDCN predicts the label of a bag directly from bag features learned by measuring the similarities between instance features and a set of learned informative prototypes. Specifically, the prototypes are obtained by a newly proposed Global Contrast Pooling (GCP) layer, which leverages instances not only from the current bag but also from the other bags. The learned bag features therefore also contain global information from all the training bags, which makes them more robust to noise. We conducted extensive experiments on two real-world image datasets, a natural image dataset (PASCAL VOC 07) and a pathological lung cancer image dataset, and the results show that the proposed MIDCN consistently outperforms the state-of-the-art methods.
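The contrast drawn in the abstract, between pooling instance predictions and building a bag feature from instance-to-prototype similarities, can be illustrated with a small sketch. This is not the authors' GCP layer: the abstract does not specify the similarity measure, the aggregation over instances, or how prototypes are learned, so the cosine similarity, the max aggregation, the fixed toy prototypes, and the function names `bag_feature` and `pooled_instance_score` below are all assumptions made only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def bag_feature(instances: List[Vector], prototypes: List[Vector]) -> List[float]:
    """Embed a bag as one value per prototype: the highest similarity that
    any instance in the bag reaches against that prototype.
    (Cosine and max are assumptions of this sketch, not taken from the paper.)"""
    return [max(cosine(x, p) for x in instances) for p in prototypes]


def pooled_instance_score(instance_scores: List[float]) -> float:
    """Baseline for contrast: the bag score is the max over instance scores."""
    return max(instance_scores)


# Toy data: two hypothetical prototypes and one bag of three region features.
# In the paper the prototypes are learned; here they are fixed by hand.
prototypes = [(1.0, 0.0), (0.0, 1.0)]
bag = [(2.0, 0.0), (1.0, 1.0), (0.0, 0.0)]

feature = bag_feature(bag, prototypes)  # one entry per prototype
baseline = pooled_instance_score([0.2, 0.9, 0.1])
```

In this sketch the bag is represented by a fixed-length vector with one entry per prototype, which a downstream classifier could consume directly, whereas the baseline reduces the bag to a single pooled instance score.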
Acknowledgment
This work was supported in part by the National Key Research and Development Program of China (2017YFB0702601), the National Natural Science Foundation of China (Grant Nos. 61673203, 61806092), Jiangsu Natural Science Foundation (BK20180326), and the Fundamental Research Funds for the Central Universities (14380056).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
He, K., Huo, J., Shi, Y., Gao, Y., Shen, D. (2019). MIDCN: A Multiple Instance Deep Convolutional Network for Image Classification. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science(), vol 11670. Springer, Cham. https://doi.org/10.1007/978-3-030-29908-8_19
DOI: https://doi.org/10.1007/978-3-030-29908-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29907-1
Online ISBN: 978-3-030-29908-8
eBook Packages: Computer Science, Computer Science (R0)