Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 916-923
Bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking

https://doi.org/10.1016/j.neucom.2017.09.042

Highlights

  • Propose a semi-supervised multi-hashing method that uses bagging to relieve the disadvantage of boosting-based multi-hashing methods.

  • Then, use boosting to train the individual hash functions in each hash table.

  • This hybrid method takes advantage of both bagging and boosting to maximize their benefits.

  • Propose a semi-supervised weighting scheme for query-adaptive re-ranking to improve the retrieval performance of multi-hashing.

Abstract

Hashing-based methods have been widely applied to large-scale image retrieval problems because of their high efficiency. In real-world applications, it is difficult to require that all images in a large database be labeled, while unsupervised methods waste the information carried by the labeled images. Semi-supervised hashing methods have therefore been proposed to train hash functions on a partially labeled database using both semantic and unsupervised information. Multi-hashing methods achieve better precision and recall than single-hashing methods. However, current boosting-based multi-hashing methods stop improving after a small number of hash tables have been created. This paper therefore proposes a bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR). In the proposed method, each individual hash table is trained using the boosting-based BSPLH, so that each hash bit corrects errors made by the previous bits. Moreover, we propose a new semi-supervised weighting scheme for the query-adaptive re-ranking. Experimental results show that the proposed method yields better precision and recall rates for given numbers of hash tables and bits.

Introduction

The explosive growth of multimedia content on the Internet poses a huge challenge for image retrieval research. Image retrieval methods can be categorized into text-based [1], [2], [3] and content-based (CBIR) [4], [5], [6] methods. CBIR methods have developed rapidly over the past decades. For a large-scale CBIR problem, linear search may still take too much time, so sub-linear methods are needed. Instead of spending a long time searching for exact matches, approximate nearest neighbor search methods [7], [8] find similar images in an approximate manner and are much more efficient, especially for very large-scale problems in which no particular image is targeted. Hashing-based image retrieval methods [9], [10], [11] are instances of approximate nearest neighbor search: they represent images with binary hash codes and have been shown to be highly efficient in large-scale image search [12]. For a given query image q, a hashing method finds similar images by retrieving the database images whose hash codes have the smallest Hamming distances from the hash code of q. Hashing methods therefore generate hash codes such that similar images share similar hash codes while dissimilar images have very different hash codes.
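The Hamming-distance search described above can be illustrated with a minimal sketch. It assumes that each hash code is stored as a Python integer; the function names and the toy 8-bit codes are hypothetical and are not taken from the paper.

```python
from typing import List


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query_code: int, database_codes: List[int], top_k: int) -> List[int]:
    """Return the indices of the top_k database codes closest to the query."""
    order = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return order[:top_k]


# Toy 8-bit codes (hypothetical values chosen for illustration only).
database = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110010
nearest = rank_by_hamming(query, database, top_k=3)
```

Because the distance is an XOR followed by a bit count, comparing two codes is far cheaper than comparing the original feature vectors, which is where the efficiency of hashing-based retrieval comes from.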

In general, image retrieval performance improves when more hash bits and multiple hash tables are used. However, existing boosting-based multi-hashing methods stop improving, and sometimes even lose retrieval performance, once the number of hash tables reaches a certain threshold. The bagging–boosting-based semi-supervised multi-hashing method is proposed to address this problem. The proposed method consists of two steps: multi-hashing construction and query-adaptive re-ranking.

Major contributions of this paper include:

  • A semi-supervised multi-hashing method that uses bagging to relieve the disadvantage of boosting-based multi-hashing methods, namely that new hash tables become highly similar to existing ones after a number of tables have been created. Boosting is then used to train the individual hash functions in each hash table. This hybrid method takes advantage of both bagging and boosting by applying them in different parts of the overall algorithm to maximize their benefits.

  • A semi-supervised weighting scheme for query-adaptive re-ranking that improves the retrieval performance of multi-hashing for semi-supervised image retrieval problems.
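The division of labor between bagging and boosting in the first contribution can be sketched as follows. This is a simplified illustration under several assumptions, not the BBSHR or BSPLH algorithm itself: each hash bit is chosen from a pool of random hyperplanes, mis-hashed pairs are up-weighted by a fixed factor of 2, the bootstrap is drawn over labeled pairs only, and the unsupervised term of a semi-supervised objective is omitted. All names (`train_table`, `train_multi_hash`, and so on) are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Pair = Tuple[int, int, int]  # (index_a, index_b, +1 if similar else -1)


def hash_bit(w: Vector, x: Vector) -> int:
    """One hash bit: the side of the hyperplane w on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


def pair_is_wrong(w: Vector, data: List[Vector], pair: Pair) -> bool:
    """A similar pair should share the bit; a dissimilar pair should not."""
    a, b, s = pair
    same = hash_bit(w, data[a]) == hash_bit(w, data[b])
    return same != (s == 1)


def train_table(data: List[Vector], pairs: List[Pair], n_bits: int,
                rng: random.Random, n_candidates: int = 20) -> List[Vector]:
    """Boosting-style loop: each bit focuses on pairs that earlier bits got wrong."""
    dim = len(data[0])
    weights = [1.0 / len(pairs)] * len(pairs)
    table: List[Vector] = []
    for _ in range(n_bits):
        candidates = [[rng.gauss(0, 1) for _ in range(dim)]
                      for _ in range(n_candidates)]
        # Pick the candidate hyperplane with the lowest weighted pair error.
        best = min(
            candidates,
            key=lambda w: sum(wt for wt, p in zip(weights, pairs)
                              if pair_is_wrong(w, data, p)),
        )
        table.append(best)
        # Up-weight the pairs this bit hashed incorrectly, then renormalize.
        weights = [wt * (2.0 if pair_is_wrong(best, data, p) else 1.0)
                   for wt, p in zip(weights, pairs)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return table


def train_multi_hash(data: List[Vector], pairs: List[Pair], n_tables: int,
                     n_bits: int, seed: int = 0) -> List[List[Vector]]:
    """Bagging: every table is trained on its own bootstrap sample of pairs."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        bootstrap = [rng.choice(pairs) for _ in pairs]  # sample with replacement
        tables.append(train_table(data, bootstrap, n_bits, rng))
    return tables


def encode(table: List[Vector], x: Vector) -> List[int]:
    """Hash code of x under one table."""
    return [hash_bit(w, x) for w in table]


# Toy data: two clusters, with similar (+1) and dissimilar (-1) pairs.
data = [[1.0, 0.9], [0.9, 1.1], [-1.0, -0.8], [-1.1, -1.0]]
pairs = [(0, 1, 1), (2, 3, 1), (0, 2, -1), (1, 3, -1)]
tables = train_multi_hash(data, pairs, n_tables=3, n_bits=4)
codes = [encode(tables[0], x) for x in data]
```

In this sketch the tables differ from one another because each sees a different bootstrap sample, while the bits inside a table differ because the pair weights shift toward earlier mistakes. That separation is the intuition behind applying bagging across tables and boosting within a table.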

Related works are introduced in Section 2. The bagging–boosting-based semi-supervised multi-hashing with re-ranking (BBSHR) is proposed in Section 3. Experimental results are shown and discussed in Section 4. Section 5 concludes this paper.

Related works

Current hashing methods and multi-hashing methods are introduced in Sections 2.1 and 2.2, respectively.

The BBSHR

The Bagging–Boosting-based Semi-supervised Hashing with query-adaptive Re-ranking (BBSHR) consists of three major components: a hybrid semi-supervised multi-hashing method that trains multiple hash tables, semi-supervised category-specific weight generation, and a semi-supervised query-adaptive re-ranking that orders the retrieved images for a given query. These three components are presented in Sections 3.1–3.3, respectively.
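To illustrate how category-specific weights can drive a query-adaptive re-ranking, the sketch below blends per-category bit weights according to the labeled items among the nearest neighbors of the query and then re-orders the database by a weighted Hamming distance. It is one plausible reading offered under assumptions, not necessarily the weighting scheme of Sections 3.2 and 3.3: the voting rule, the fallback for queries with no labeled neighbor, the function names, and the toy weights are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Code = List[int]


def hamming(a: Code, b: Code) -> int:
    """Plain Hamming distance between two bit lists."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming(a: Code, b: Code, weights: List[float]) -> float:
    """Sum of the weights of the bits in which the two codes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def query_weights(query: Code, codes: List[Code], labels: List[Optional[str]],
                  class_weights: Dict[str, List[float]], k: int) -> List[float]:
    """Blend class-specific bit weights using labeled items among the k nearest codes."""
    nearest = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))[:k]
    votes = Counter(labels[i] for i in nearest if labels[i] is not None)
    n_bits = len(query)
    if not votes:  # no labeled neighbor: fall back to the plain Hamming distance
        return [1.0] * n_bits
    total = sum(votes.values())
    return [
        sum(count * class_weights[c][b] for c, count in votes.items()) / total
        for b in range(n_bits)
    ]


def rerank(query: Code, codes: List[Code], labels: List[Optional[str]],
           class_weights: Dict[str, List[float]], k: int = 3) -> List[int]:
    """Order database items by the query-adaptive weighted Hamming distance."""
    weights = query_weights(query, codes, labels, class_weights, k)
    return sorted(range(len(codes)),
                  key=lambda i: weighted_hamming(query, codes[i], weights))


# Toy database: unlabeled items carry the label None (hypothetical values).
codes = [[1, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
labels = ["a", "a", None, "b"]
class_weights = {"a": [2.0, 1.0, 0.5, 1.0], "b": [0.5, 1.0, 2.0, 1.0]}
query = [1, 0, 1, 0]
order = rerank(query, codes, labels, class_weights)
```

In the toy example, items 1 and 2 are both at Hamming distance 1 from the query, so the plain distance cannot separate them. The weighted distance ranks item 2 first because it differs from the query only in a bit that carries a low weight for the category voted by the labeled neighbors.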

Experiments

In this section, we compare the BBSHR with state-of-the-art hashing methods on three databases: the MNIST, the USPS, and the CIFAR10. The methods compared are the LSH [18], [19], the CH [32], the DCH [33], the BIQH [34], the BSPLH [26], and the SPLH [25].
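Comparisons of this kind are reported in terms of precision and recall. As a minimal sketch of how the two rates can be computed for a single query, the example below retrieves all database codes within a Hamming radius and treats items that share the query's label as relevant. The radius-based lookup, the function names, and the toy codes and labels are assumptions made for illustration; the evaluation protocol of the paper may differ.

```python
from typing import List, Sequence, Tuple


def retrieve_within_radius(query_code: int, database_codes: Sequence[int],
                           radius: int) -> List[int]:
    """Indices of database codes within the given Hamming radius of the query."""
    return [i for i, code in enumerate(database_codes)
            if bin(query_code ^ code).count("1") <= radius]


def precision_recall(retrieved: Sequence[int],
                     relevant: Sequence[int]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy 4-bit codes and digit labels (hypothetical values).
database = [0b1100, 0b1101, 0b0011, 0b1000, 0b0111]
labels = [7, 3, 3, 7, 7]
query_code, query_label = 0b1100, 7

retrieved = retrieve_within_radius(query_code, database, radius=1)
relevant = [i for i, label in enumerate(labels) if label == query_label]
precision, recall = precision_recall(retrieved, relevant)
```

Enlarging the radius, or taking the union of candidates from several hash tables, retrieves more items and so tends to raise recall, possibly at some cost in precision.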

The USPS and the MNIST are handwritten digit databases consisting of 10 categories: the digits 0–9. The MNIST consists of 70K 28 × 28-pixel images, each represented by a 784-dimensional feature vector. The USPS consists of 9298

Conclusion

A bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR) is proposed in this paper. The BBSHR uses semi-supervised bagging to construct multiple hash tables, and each individual hash table is then trained using a boosting-based method. A semi-supervised query-adaptive re-ranking is proposed to further improve retrieval performance. Experimental results show that the BBSHR outperforms state-of-the-art hashing methods with statistical significance.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grants 61272201 and 61572201 and by the Fundamental Research Funds for the Central Universities (2017ZD052).

References (41)

  • J. Zhao, Research on content-based multimedia information retrieval, Proceedings of the 2011 International Conference on Computational and Information Sciences (2011)
  • T. Kato, Cognitive View Mechanism for Content-Based Multimedia Information Retrieval (1993)
  • G. Zhou et al., Relevance feature mapping for content-based multimedia information retrieval, Pattern Recognit. (2012)
  • M. Casey et al., Song intersection by approximate nearest neighbor search, Proceedings of the 2006 International Society for Music Information Retrieval (ISMIR) (2006)
  • P. Li et al., Spectral hashing with semantically consistent graph for image indexing, IEEE Trans. Multimed. (2013)
  • J. Shao et al., Sparse spectral hashing, Pattern Recognit. Lett. (2012)
  • W.W. Ng et al., Two-phase mapping hashing, Neurocomputing (2015)
  • L. Liu et al., Sequential compact code learning for unsupervised image hashing, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • L. Zhu et al., Unsupervised visual hashing with semantic assistant for content-based image retrieval, IEEE Trans. Knowl. Data Eng. (2016)
  • L. Liu et al., Unsupervised local feature hashing for image similarity search, IEEE Trans. Cybern. (2015)

    Wing W. Y. Ng (S’ 02-M’ 05-SM’ 15) received his B.Sc. and Ph.D. degrees from Hong Kong Polytechnic University in 2001 and 2006, respectively. He is now a Professor in the School of Computer Science and Engineering, South China University of Technology, China. His major research directions include machine learning and information retrieval. He is currently an associate editor of the International Journal of Machine Learning and Cybernetics. He is the principal investigator of three National Natural Science Foundation of China projects and of a Program for New Century Excellent Talents in University project from the China Ministry of Education. He served on the Board of Governors of the IEEE Systems, Man and Cybernetics Society in 2011–2013.

    Xiancheng Zhou received the B.Sc. and M.Sc. degrees in computer science from the South China University of Technology. His research interests include machine learning and information retrieval.

    Xing Tian received his B.Sc. degree in Computer Science from the South China University of Technology, Guangzhou, China, and is currently a Ph.D. candidate in the School of Computer Science and Engineering, South China University of Technology. His current research interests focus on image retrieval and machine learning in non-stationary big data environments.

    Professor Xizhao Wang received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a Professor with the Big Data Institute, Shenzhen University, Shenzhen, China. His current research interests include uncertainty modeling and machine learning for big data. He has edited more than ten special issues and published three monographs, two textbooks, and more than 200 peer-reviewed research papers. According to Google Scholar, his work has been cited more than 5000 times. He is on the Elsevier 2015/2016 list of most cited Chinese authors. He is the Chair of the IEEE SMC Technical Committee on Computational Intelligence, the Editor-in-Chief of Machine Learning and Cybernetics Journal, and an Associate Editor of a couple of journals in related areas. He received the IEEE SMCS Outstanding Contribution Award in 2004 and the IEEE SMCS Best Associate Editor Award in 2006.

    Professor Daniel S. Yeung (M’ 89-SM’ 99-F’ 04) is a past President of the IEEE SMC Society. He was Head and Chair Professor of the Computing Department of Hong Kong Polytechnic University, Hong Kong, and a faculty member of Rochester Institute of Technology, USA. He has also worked for TRW Inc., General Electric Corporation R&D Centre, and Computer Consoles Inc. in the USA. He is a Fellow of the IEEE.
