Bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking

doi:10.1016/j.neucom.2017.09.042

Neurocomputing

Volume 275, 31 January 2018, Pages 916-923

Highlights

• A semi-supervised multi-hashing method using bagging is proposed to relieve a disadvantage of boosting-based multi-hashing methods.
• Boosting is then used to train the individual hash functions in each hash table.
• This hybrid method takes advantage of both bagging and boosting to maximize their benefits.
• A semi-supervised weighting scheme for query-adaptive re-ranking is proposed to improve the retrieval performance of multi-hashing.

Abstract

Hashing-based methods have been widely applied to large-scale image retrieval problems because of their high efficiency. In real-world applications, it is difficult to require that all images in a large database be labeled, while unsupervised methods waste the information carried by labeled images. Therefore, semi-supervised hashing methods have been proposed that train hash functions on a partially labeled database, using both semantic and unsupervised information. Multi-hashing methods achieve better precision-recall than single-hashing methods. However, current boosting-based multi-hashing methods stop improving after a small number of hash tables have been created. Therefore, a bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR) is proposed in this paper. In the proposed method, each individual hash table is trained using the boosting-based BSPLH, such that each hash bit corrects errors made by previous bits. Moreover, we propose a new semi-supervised weighting scheme for query-adaptive re-ranking. Experimental results show that the proposed method yields better precision and recall rates for given numbers of hash tables and bits.

Introduction

The explosive growth of multimedia content on the Internet creates a huge challenge for image retrieval research. Image retrieval methods can be categorized into text-based [1], [2], [3] and content-based (CBIR) [4], [5], [6] methods. CBIR methods have developed rapidly in the past decades. For a large-scale CBIR problem, linear search methods may still take too much time, and therefore sub-linear methods are needed. Instead of spending a long time searching for exact matches, approximate nearest neighbor search methods [7], [8], which find similar images in an approximate manner, are much more efficient, especially for very large-scale problems in which no particular image is targeted. Hashing-based image retrieval methods [9], [10], [11] are instances of approximate nearest neighbor search that represent images with binary hash codes and have been shown to be highly efficient in large-scale image searches [12]. For a given query image q, a hashing method finds similar images by returning the database images whose hash codes yield the smallest Hamming distances from the hash code of q. Therefore, hashing methods generate hash codes such that similar images share similar hash codes while dissimilar images have very dissimilar hash codes.
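The retrieval step described above can be illustrated with a minimal sketch. It assumes that hash codes are stored as Python integers and uses toy 8-bit codes; the function names `hamming_distance` and `rank_by_hamming` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: Sequence[int],
                    top_k: int) -> List[Tuple[int, int]]:
    """Return (database index, distance) pairs for the top_k closest codes."""
    scored = [(idx, hamming_distance(query_code, code))
              for idx, code in enumerate(db_codes)]
    # Sort by distance; ties are broken by database index.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Toy 8-bit codes (made up for illustration only).
db_codes = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query_code = 0b10110110
nearest = rank_by_hamming(query_code, db_codes, top_k=2)
print(nearest)
```

Note that images 1 and 3 are both at distance 2 from the query, so plain Hamming ranking cannot separate them except by an arbitrary tie-break.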

In general, image retrieval performance improves when more hash bits and multiple hash tables are used. However, existing boosting-based multi-hashing methods do not improve, and sometimes even reduce, retrieval performance once the number of hash tables reaches a certain threshold. Therefore, the bagging–boosting-based semi-supervised multi-hashing method is proposed to address this problem. The proposed method consists of two steps: multi-hashing construction and query-adaptive re-ranking.

Major contributions of this paper include:

• A semi-supervised multi-hashing method that uses bagging to relieve a disadvantage of boosting-based multi-hashing methods: after a number of tables have been created, each new hash table is highly similar to the existing ones. Boosting is then used to train the individual hash functions in each hash table. This hybrid method applies bagging and boosting in different parts of the algorithm to maximize the benefits of both.
• A semi-supervised weighting scheme for query-adaptive re-ranking that improves the retrieval performance of multi-hashing on semi-supervised image retrieval problems.
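The division of labour between bagging and boosting can be sketched as follows. This is not the BSPLH algorithm or the authors' bagging procedure, whose details are not given in this excerpt. It is a simplified stand-in under stated assumptions: each table is trained on a bootstrap resample of labeled pairs only, each bit is the best of a few random hyperplanes under the current pair weights, and wrongly hashed pairs have their weight doubled. All names and parameters (`train_table`, `n_candidates`, the doubling factor) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int, int]  # (index i, index j, +1 if similar else -1)


def bit_of(w: Vector, x: Vector) -> int:
    """One hash bit: sign of a linear projection, encoded as 0 or 1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else 0


def is_wrong(w: Vector, data: Sequence[Vector], pair: Pair) -> bool:
    """A pair is hashed wrongly if bit agreement contradicts its label."""
    i, j, s = pair
    return (bit_of(w, data[i]) == bit_of(w, data[j])) != (s > 0)


def train_table(data: Sequence[Vector], pairs: Sequence[Pair], n_bits: int,
                rng: random.Random, n_candidates: int = 20) -> List[List[float]]:
    """Boosting-style stand-in: each bit targets pairs earlier bits got wrong."""
    dim = len(data[0])
    weights = [1.0] * len(pairs)
    table: List[List[float]] = []
    for _ in range(n_bits):
        best_w: List[float] = []
        best_err = float("inf")
        for _ in range(n_candidates):
            w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            err = sum(wt for wt, p in zip(weights, pairs) if is_wrong(w, data, p))
            if err < best_err:
                best_w, best_err = w, err
        table.append(best_w)
        # Up-weight wrongly hashed pairs so that later bits focus on them.
        for k, p in enumerate(pairs):
            if is_wrong(best_w, data, p):
                weights[k] *= 2.0
    return table


def train_tables_with_bagging(data: Sequence[Vector], pairs: Sequence[Pair],
                              n_tables: int, n_bits: int,
                              seed: int = 0) -> List[List[List[float]]]:
    """Bagging: every table sees its own bootstrap resample of the pairs."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        bootstrap = [rng.choice(pairs) for _ in pairs]  # with replacement
        tables.append(train_table(data, bootstrap, n_bits, rng))
    return tables


def encode(table: Sequence[Vector], x: Vector) -> int:
    """Concatenate the bits of one table into an integer hash code."""
    code = 0
    for w in table:
        code = (code << 1) | bit_of(w, x)
    return code


# Toy data: two clusters in 2-D with a few labeled pairs (illustrative only).
data = [(1.0, 0.2), (0.9, -0.1), (-1.0, 0.1), (-0.8, -0.3)]
pairs = [(0, 1, 1), (2, 3, 1), (0, 2, -1), (1, 3, -1)]
tables = train_tables_with_bagging(data, pairs, n_tables=3, n_bits=4)
codes = [[encode(t, x) for x in data] for t in tables]
```

Because every table is trained on a different resample, the tables are less likely to converge to near-duplicates than tables trained one after another on the same residual errors, which is the motivation stated in the first contribution.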

Related works are introduced in Section 2. The bagging–boosting-based semi-supervised multi-hashing with re-ranking (BBSHR) is proposed in Section 3. Experimental results are shown and discussed in Section 4. Section 5 concludes this paper.

Section snippets

Related works

Current hashing methods and multi-hashing methods are introduced in Sections 2.1 and 2.2, respectively.

The BBSHR

The Bagging–Boosting-based Semi-supervised Hashing with query-adaptive Re-ranking (BBSHR) consists of three major components: a hybrid semi-supervised multi-hashing method to train multiple hash tables, semi-supervised category-specific weight generation, and a semi-supervised query-adaptive re-ranking to order the retrieved images for a given query. These three components are presented in Sections 3.1–3.3, respectively.
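The idea of re-ranking with category-specific weights can be sketched generically. The weighting scheme of the paper is not described in this excerpt, so the code below makes its own assumptions: every category has one weight per hash bit, the query's weights are the average of the category weights of its nearest labeled neighbors, and candidates are sorted by weighted Hamming distance. The category names, weight values and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_hamming(a: Sequence[int], b: Sequence[int],
                     w: Sequence[float]) -> float:
    """Sum of the weights of the bit positions where a and b differ."""
    return sum(wi for ai, bi, wi in zip(a, b, w) if ai != bi)


def query_weights(neighbor_labels: Sequence[str],
                  class_weights: Dict[str, Sequence[float]]) -> List[float]:
    """Average the per-bit weights of the categories of the query's labeled neighbors."""
    n_bits = len(next(iter(class_weights.values())))
    total = [0.0] * n_bits
    for label in neighbor_labels:
        for k, wk in enumerate(class_weights[label]):
            total[k] += wk
    return [t / len(neighbor_labels) for t in total]


def rerank(query_bits: Sequence[int], candidates: Sequence[Sequence[int]],
           weights: Sequence[float]) -> List[int]:
    """Candidate indices sorted by weighted Hamming distance to the query."""
    return sorted(
        range(len(candidates)),
        key=lambda i: (weighted_hamming(query_bits, candidates[i], weights), i),
    )


# Hypothetical per-category bit weights for a 4-bit code.
class_weights = {
    "digit_3": [2.0, 1.0, 0.5, 0.5],
    "digit_8": [0.5, 1.0, 2.0, 0.5],
}
query_bits = [1, 0, 1, 1]
# All three candidates are at plain Hamming distance 1 from the query.
candidates = [[0, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 0]]
weights = query_weights(["digit_3", "digit_3", "digit_8"], class_weights)
order = rerank(query_bits, candidates, weights)
print(weights, order)
```

In this toy case the three candidates tie under plain Hamming distance, and the query-specific weights break the tie in favor of the candidate that differs only in the least important bit.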

Experiments

In this section, we compare the BBSHR with state-of-the-art hashing methods on three databases: MNIST, USPS, and CIFAR10. The compared methods are LSH [18], [19], CH [32], DCH [33], BIQH [34], BSPLH [26], and SPLH [25].
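Comparisons of this kind are commonly reported as precision and recall. A minimal sketch of the two measures for a single query follows, assuming that a retrieved image counts as relevant when its class label equals the query's label. This ground-truth convention is an assumption here, since the excerpt does not spell out the evaluation protocol, and the labels are toy values.

```python
from typing import Sequence, Tuple


def precision_recall(retrieved_labels: Sequence[int], query_label: int,
                     n_relevant_in_db: int) -> Tuple[float, float]:
    """Precision and recall of one retrieved list for one query."""
    hits = sum(1 for label in retrieved_labels if label == query_label)
    precision = hits / len(retrieved_labels) if retrieved_labels else 0.0
    recall = hits / n_relevant_in_db if n_relevant_in_db else 0.0
    return precision, recall


# Toy example: 5 images retrieved for a query of class 3,
# with 6 images of class 3 in the whole database.
precision, recall = precision_recall([3, 3, 8, 3, 5], query_label=3,
                                     n_relevant_in_db=6)
print(precision, recall)
```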

The USPS and MNIST databases contain handwritten digits in 10 categories: the digits 0–9. MNIST consists of 70K images of 28 × 28 pixels, each represented by a 784-dimensional feature vector. The USPS consists of 9298

Conclusion

A bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR) is proposed in this paper. The BBSHR uses semi-supervised bagging to construct multiple hash tables, and each individual hash table is then trained using a boosting-based method. A semi-supervised query-adaptive re-ranking is proposed to further improve retrieval performance. Experimental results show that the BBSHR outperforms state-of-the-art hashing methods with statistical significance.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grants 61272201 and 61572201 and by the Fundamental Research Funds for the Central Universities (2017ZD052).

References (41)

C.S. Tong et al.
Adaptive approximate nearest neighbor search for fractal image compression
IEEE Trans. Image Process.
(2002)
B. Demir et al.
Hashing-based scalable remote sensing image search and retrieval in large archives
IEEE Trans. Geosci. Remote Sens.
(2016)
X. Liu et al.
Large-scale unsupervised hashing with shared structure learning
IEEE Trans. Cybern.
(2015)
H. Xie et al.
Contextual query expansion for image retrieval
IEEE Trans. Multimed.
(2014)
J. Cheng et al.
Semi-supervised multi-graph hashing for scalable similarity search
Comput. Vis. Image Underst.
(2014)
L. Breiman
Using iterated bagging to debias regressions
Mach. Learn.
(2001)
J. Hulse et al.
Experimental perspectives on learning from imbalanced data
Proceedings of the Twenty-Fourth International Conference on Machine Learning
(2008)
C. Zhang et al.
User term feedback in interactive text-based image retrieval
Proceedings of the Twenty-eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
(2005)
W. Li et al.
Text-based image retrieval using progressive multi-instance learning
Proceedings of the 2011 International Conference on Computer Vision
(2011)
D. Petrelli et al.
Using concept hierarchies in text-based image retrieval: a user evaluation
Lect. Notes Comput. Sci.
(2005)
J. Zhao
Research on content-based multimedia information retrieval
Proceedings of the 2011 International Conference on Computational and Information Sciences
(2011)
T. Kato
Cognitive View Mechanism for Content-Based Multimedia Information Retrieval
(1993)
G. Zhou et al.
Relevance feature mapping for content-based multimedia information retrieval
Pattern Recognit.
(2012)
M. Casey et al.
Song intersection by approximate nearest neighbor search
Proceedings of the 2006 International Society for Music Information Retrieval (ISMIR)
(2006)
P. Li et al.
Spectral hashing with semantically consistent graph for image indexing
IEEE Trans. Multimed.
(2013)
J. Shao et al.
Sparse spectral hashing
Pattern Recognit. Lett.
(2012)
W.W. Ng et al.
Two-phase mapping hashing
Neurocomputing
(2015)
L. Liu et al.
Sequential compact code learning for unsupervised image hashing
IEEE Trans. Neural Netw. Learn. Syst.
(2015)
L. Zhu et al.
Unsupervised visual hashing with semantic assistant for content-based image retrieval
IEEE Trans. Knowl. Data Eng.
(2016)
L. Liu et al.
Unsupervised local feature hashing for image similarity search
IEEE Trans. Cybern.
(2015)

Wing W. Y. Ng (S’02-M’05-SM’15) received his B.Sc. and Ph.D. degrees from Hong Kong Polytechnic University in 2001 and 2006, respectively. He is now a Professor in the School of Computer Science and Engineering, South China University of Technology, China. His major research directions include machine learning and information retrieval. He is currently an associate editor of the International Journal of Machine Learning and Cybernetics. He is the principal investigator of three China National Nature Science Foundation projects and a Program for New Century Excellent Talents in University from China Ministry of Education. He served on the Board of Governors of the IEEE Systems, Man and Cybernetics Society in 2011–2013.

Xiancheng Zhou received the B.Sc. and M.Sc. degrees in computer science from the South China University of Technology. His research interests include machine learning and information retrieval.

Xing Tian received his B.Sc. degree in Computer Science from the South China University of Technology, Guangzhou, China and is currently a Ph.D. candidate of the School of Computer Science and Engineering, South China University of Technology. His current research interests focus on image retrieval and machine learning in non-stationary big data environments.

Professor Xizhao Wang received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a Professor with the Big Data Institute, Shenzhen University, Shenzhen, China. His current research interests include uncertainty modeling and machine learning for big data. He has edited more than ten special issues and published three monographs, two textbooks, and more than 200 peer-reviewed research papers. According to Google Scholar, his work has been cited more than 5000 times. He is on the list of Elsevier 2015/2016 most cited Chinese authors. He is the Chair of the IEEE SMC Technical Committee on Computational Intelligence, the Editor-in-Chief of Machine Learning and Cybernetics Journal, and an Associate Editor for a couple of journals in related areas. He was a recipient of the IEEE SMCS Outstanding Contribution Award in 2004 and a recipient of the IEEE SMCS Best Associate Editor Award in 2006.

Professor Daniel S. Yeung (M’89-SM’99-F’04) is a past President of the IEEE SMC Society. He was Head and Chair Professor of the Computing Department of Hong Kong Polytechnic University, Hong Kong, and a faculty member of Rochester Institute of Technology, USA. He has also worked for TRW Inc., General Electric Corporation R&D Centre and Computer Consoles Inc. in the USA. He is a Fellow of the IEEE.
