DOI: 10.1145/2964284.2964324
research-article

Cross-batch Reference Learning for Deep Classification and Retrieval

Authors:
Huei-Fang Yang, Academia Sinica, Taipei, Taiwan ROC
Kevin Lin, Academia Sinica, Taipei, Taiwan ROC
Chu-Song Chen, Academia Sinica, Taipei, Taiwan ROC
Published: 01 October 2016

Abstract

Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on datasets from other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as when they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batch communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed at the top and the rear of the retrieval list, respectively. The learned feature space is not only discriminative across classes; samples that are relevant to each other or of the same class are also enforced to lie close together. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier to optimize. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.
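
To make the cross-batch retrieval criterion concrete, the following is a minimal sketch of how MAP could be measured when every sample of one batch is used as a query against the samples of another batch. It is an illustration only, not the loss proposed in the paper: the function names, the use of cosine similarity, and the toy data are all assumptions. Because the ranking step relies on sorting, this exact MAP is not differentiable, which is the reason the abstract describes optimizing a differentiable approximated lower bound instead.

```python
import numpy as np


def average_precision(scores, relevant):
    """AP of one ranked list: mean of precision@k over the ranks k of relevant items."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # best match first
    rel = np.asarray(relevant, dtype=float)[order]
    if rel.sum() == 0:
        return 0.0  # no relevant item in the list
    hits = np.cumsum(rel)                               # relevant items seen up to rank k
    precision_at_k = hits / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())


def cross_batch_map(query_feats, query_labels, ref_feats, ref_labels):
    """MAP when each sample of one batch queries the samples of another batch.

    Assumption: similarity is cosine similarity and relevance means "same label".
    """
    q = np.asarray(query_feats, dtype=float)
    r = np.asarray(ref_feats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    sims = q @ r.T                                      # shape: (n_query, n_ref)
    query_labels = np.asarray(query_labels)
    ref_labels = np.asarray(ref_labels)
    aps = [average_precision(sims[i], ref_labels == query_labels[i])
           for i in range(len(q))]
    return float(np.mean(aps))


# Toy example: two query features, three reference features from another batch.
batch_a = np.array([[1.0, 0.0], [0.0, 1.0]])
labels_a = np.array([0, 1])
batch_b = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])
labels_b = np.array([0, 1, 0])
toy_map = cross_batch_map(batch_a, labels_a, batch_b, labels_b)
```

In this toy case every relevant reference sample is ranked ahead of the irrelevant ones, so the cross-batch MAP reaches its maximum; a feature space in which same-class samples are scattered would score lower.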

    Published In

    MM '16: Proceedings of the 24th ACM international conference on Multimedia
    October 2016
    1542 pages
    ISBN: 9781450336031
    DOI: 10.1145/2964284
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CNN
    2. image classification
    3. image retrieval

    Qualifiers

    • Research-article

    Conference

    MM '16: ACM Multimedia Conference
    October 15 - 19, 2016
    Amsterdam, The Netherlands

    Acceptance Rates

    MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months): 6
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 16 Jan 2025

    Cited By

    • (2023) Deep Learning for Instance Retrieval: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:6 (7270-7292). DOI: 10.1109/TPAMI.2022.3218591. Online publication date: 1-Jun-2023
    • (2023) Fine-Grained Retrieval Method of Textile Image. IEEE Access 11 (70525-70533). DOI: 10.1109/ACCESS.2023.3287630. Online publication date: 2023
    • (2022) Continual Learning for Visual Search with Backward Consistent Feature Embedding. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (16681-16690). DOI: 10.1109/CVPR52688.2022.01620. Online publication date: Jun-2022
    • (2022) Ranking-Based Deep Hashing Network for Image Retrieval. IEEE Access 10 (125334-125352). DOI: 10.1109/ACCESS.2022.3224578. Online publication date: 2022
    • (2021) Deep Category-Level and Regularized Hashing With Global Semantic Similarity Learning. IEEE Transactions on Cybernetics 51:12 (6240-6252). DOI: 10.1109/TCYB.2020.2964993. Online publication date: Dec-2021
    • (2020) Cross-Batch Reference Learning for Deep Retrieval. IEEE Transactions on Neural Networks and Learning Systems 31:9 (3145-3158). DOI: 10.1109/TNNLS.2019.2936876. Online publication date: Sep-2020
    • (2020) Discrete Deep Hashing With Ranking Optimization for Image Retrieval. IEEE Transactions on Neural Networks and Learning Systems 31:6 (2052-2063). DOI: 10.1109/TNNLS.2019.2927868. Online publication date: Jun-2020
    • (2020) Content based Fine-Grained Image Retrieval using Convolutional Neural Network. 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN) (1120-1125). DOI: 10.1109/SPIN48934.2020.9071334. Online publication date: Feb-2020
    • (2020) Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval. Computer Vision – ECCV 2020 (677-694). DOI: 10.1007/978-3-030-58545-7_39. Online publication date: 5-Nov-2020
    • (2019) Siamese Dilated Inception Hashing With Intra-Group Correlation Enhancement for Image Retrieval. IEEE Transactions on Neural Networks and Learning Systems (1-15). DOI: 10.1109/TNNLS.2019.2935118. Online publication date: 2019
