Bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking
Introduction
The explosive growth of multimedia content on the Internet poses a huge challenge for image retrieval research. Image retrieval methods can be categorized into text-based [1], [2], [3] and content-based (CBIR) [4], [5], [6] methods. CBIR methods have developed rapidly over the past decades. For large-scale CBIR problems, linear search may still take too much time, so sub-linear methods are needed. Instead of spending a long time searching for exact matches, approximate nearest neighbor search methods [7], [8] find similar images approximately and are much more efficient, especially for very large-scale problems in which no particular image is targeted. Hashing-based image retrieval methods [9], [10], [11] are instances of approximate nearest neighbor search that represent images with binary hash codes, and they have been shown to be highly efficient in large-scale image search [12]. For a given query image q, a hashing method finds similar images by retrieving the database images whose hash codes have the smallest Hamming distances from the hash code of q. Therefore, hashing methods generate hash codes such that similar images share similar hash codes while dissimilar images have very different hash codes.
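The Hamming-ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming hash codes are stored as 0/1 NumPy arrays; the function names `hamming_rank`, `pack` and `hamming_packed` are hypothetical and not taken from the paper. Packing the bits into an integer shows why hashing is fast: each distance reduces to one XOR plus a bit count.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.

    query_code: (b,) 0/1 array; db_codes: (n, b) 0/1 array.
    Returns (indices sorted by distance, sorted distances).
    """
    # Hamming distance = number of bit positions where the codes differ
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    # Stable sort so that ties keep their database order
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

def pack(code):
    """Pack a 0/1 vector into a Python int (first bit is the most significant)."""
    return int("".join(str(int(v)) for v in code), 2)

def hamming_packed(a, b):
    """Hamming distance between two packed codes: XOR, then count the 1 bits."""
    return bin(a ^ b).count("1")

# Toy database of four 4-bit codes
db = np.array([[0, 1, 1, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1],
               [0, 1, 1, 1]])
q = np.array([0, 1, 1, 0])
order, d = hamming_rank(q, db)
print(order.tolist(), d.tolist())  # → [0, 1, 3, 2] [0, 1, 1, 3]
```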
In general, image retrieval performance improves when more hash bits and multiple hash tables are used. However, the retrieval performance of existing boosting-based multi-hashing methods stops improving, and sometimes even degrades, once the number of hash tables reaches a certain threshold. The bagging–boosting-based semi-supervised multi-hashing method is therefore proposed to address this problem. The proposed method consists of two steps: multi-hashing construction and query-adaptive re-ranking.
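Why multiple hash tables help can be illustrated with a toy lookup: each table indexes the database by hash code, and the candidate set for a query is the union of the matching buckets across all tables, so an additional table can recover neighbors that another table missed. This sketch assumes exact-bucket lookup; `build_tables` and `lookup` are hypothetical helpers, and the paper's own retrieval procedure may differ (for example, by probing within a Hamming radius or by full Hamming ranking).

```python
from collections import defaultdict
import numpy as np

def build_tables(codes_per_table):
    """Index item ids by hash code; one dict of buckets per hash table."""
    tables = []
    for codes in codes_per_table:            # codes: (n, b) 0/1 array for one table
        buckets = defaultdict(list)
        for idx, code in enumerate(codes):
            buckets[tuple(int(v) for v in code)].append(idx)
        tables.append(buckets)
    return tables

def lookup(tables, query_codes):
    """Union of the items that share the query's bucket in any table."""
    candidates = set()
    for buckets, code in zip(tables, query_codes):
        # .get does not insert a new key into the defaultdict
        candidates.update(buckets.get(tuple(int(v) for v in code), []))
    return sorted(candidates)

# Toy data: 3 items, 2 tables, 2 bits per table
table_codes = [np.array([[0, 1], [0, 1], [1, 0]]),
               np.array([[1, 1], [0, 0], [0, 0]])]
query_codes = [np.array([0, 1]), np.array([0, 0])]
tables = build_tables(table_codes)
print(lookup(tables[:1], query_codes[:1]))  # → [0, 1]
print(lookup(tables, query_codes))          # → [0, 1, 2]
```

With one table only items 0 and 1 are retrieved; the second table adds item 2, which is the recall gain that multi-hashing aims for.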
Major contributions of this paper include:
- The proposal of a semi-supervised multi-hashing method that uses bagging to relieve a disadvantage of boosting-based multi-hashing methods: once a number of tables have been created, each new hash table becomes highly similar to the existing ones. Boosting is then used to train the individual hash functions within each hash table. This hybrid method combines bagging and boosting, applying each in a different part of the algorithm to maximize their benefits.
- A semi-supervised weighting scheme for query-adaptive re-ranking that improves the retrieval performance of multi-hashing for semi-supervised image retrieval problems.
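The following is a rough sketch of how bagging and boosting can be combined in the spirit of the first contribution. It is not the BBSHR algorithm; every design choice here is an assumption: each table is trained on a bootstrap sample of the labeled set (bagging), each bit is chosen greedily among random hyperplanes to minimize an AdaBoost-style weighted pairwise error (boosting), and each threshold is the median projection of the unlabeled data so that bits are balanced (a simple semi-supervised ingredient). All function names are hypothetical.

```python
import numpy as np

def train_table(X_lab, y_lab, X_unl, n_bits, n_candidates, rng):
    """Train ONE hypothetical hash table bit by bit with boosting-style pair weights."""
    S = np.where(y_lab[:, None] == y_lab[None, :], 1.0, -1.0)  # +1 same class, -1 otherwise
    W = np.full(S.shape, 1.0 / S.size)                         # uniform initial pair weights
    planes, thresholds = [], []
    for _ in range(n_bits):
        best = None
        for _ in range(n_candidates):
            w = rng.standard_normal(X_lab.shape[1])   # random candidate hyperplane
            t = np.median(X_unl @ w)                  # unlabeled data balance the bit
            h = np.where(X_lab @ w > t, 1.0, -1.0)    # bit values in {-1, +1}
            agree = np.outer(h, h)                    # +1 if a pair shares the bit value
            err = W[agree != S].sum()                 # weighted pairwise error
            if best is None or err < best[0]:
                best = (err, w, t, agree)
        err, w, t, agree = best
        eps = np.clip(err / W.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        # For alpha > 0, pairs this bit got wrong (S * agree = -1) gain weight
        W = W * np.exp(-alpha * S * agree)
        W = W / W.sum()
        planes.append(w)
        thresholds.append(t)
    return np.stack(planes, axis=1), np.array(thresholds)

def train_multi_hash(X_lab, y_lab, X_unl, n_tables, n_bits, n_candidates=20, seed=0):
    """Bagging: every table sees its own bootstrap sample of the labeled data."""
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(n_tables):
        idx = rng.integers(0, len(X_lab), size=len(X_lab))  # sample with replacement
        tables.append(train_table(X_lab[idx], y_lab[idx], X_unl,
                                  n_bits, n_candidates, rng))
    return tables

def encode(X, table):
    """Binary codes (0/1) of X under one trained table."""
    planes, thresholds = table
    return (X @ planes > thresholds).astype(np.uint8)

# Toy usage: two Gaussian clusters in 2-D
rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
tables = train_multi_hash(X_lab, y_lab, X_unl, n_tables=3, n_bits=4)
codes = encode(X_unl, tables[0])
print(codes.shape)  # → (200, 4)
```

Training each table on a different bootstrap sample is one simple way to keep the tables diverse, which is the motivation given above for using bagging instead of building every table by boosting alone.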
Related works are introduced in Section 2. The bagging–boosting-based semi-supervised multi-hashing with re-ranking (BBSHR) is proposed in Section 3. Experimental results are shown and discussed in Section 4. Section 5 concludes this paper.
Related works
Current hashing methods and multi-hashing methods are introduced in Sections 2.1 and 2.2, respectively.
The BBSHR
The Bagging–Boosting-based Semi-supervised Multi-Hashing with query-adaptive Re-ranking (BBSHR) consists of three major components: a hybrid semi-supervised multi-hashing method that trains multiple hash tables, semi-supervised category-specific weight generation, and semi-supervised query-adaptive re-ranking that orders the retrieved images for a given query. These three components are presented in Sections 3.1–3.3, respectively.
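Category-specific weights and query-adaptive re-ranking can be illustrated with a weighted Hamming distance. The scheme below is a hypothetical sketch, not the paper's formulation: a bit's weight for category c is |2p - 1|, where p is the fraction of labeled samples of c whose bit equals 1 (so bits that are constant within the category count most); a query's bit weights mix the category weights according to the labels of its k nearest labeled codes; and candidates are re-ranked by the resulting weighted distance. It assumes every category has at least one labeled sample and that the codes of all hash tables have been concatenated.

```python
import numpy as np

def category_weights(lab_codes, labels, n_classes, eps=1e-3):
    """Hypothetical per-category bit weights; lab_codes is an (m, b) 0/1 array."""
    W = np.empty((n_classes, lab_codes.shape[1]))
    for c in range(n_classes):
        p = lab_codes[labels == c].mean(axis=0)   # fraction of 1s per bit within class c
        W[c] = np.abs(2 * p - 1) + eps            # ~1: bit constant in class, ~0: uninformative
    return W

def query_weights(q_code, lab_codes, labels, W, k):
    """Mix category weights by the label distribution of the k nearest labeled codes."""
    d = np.count_nonzero(lab_codes != q_code, axis=1)
    nn = np.argsort(d, kind="stable")[:k]
    pi = np.bincount(labels[nn], minlength=W.shape[0]) / k
    return pi @ W

def rerank(q_code, cand_codes, a):
    """Sort candidates by weighted Hamming distance with per-bit weights a."""
    wd = ((cand_codes != q_code) * a).sum(axis=1)
    return np.argsort(wd, kind="stable"), wd

lab_codes = np.array([[0, 0, 1, 0], [0, 0, 1, 1],    # category 0
                      [1, 1, 0, 1], [0, 1, 0, 1]])   # category 1
labels = np.array([0, 0, 1, 1])
q = np.array([0, 0, 1, 0])
cand = np.array([[1, 0, 1, 0],    # differs from q in bit 0
                 [0, 0, 1, 1]])   # differs from q in bit 3
W = category_weights(lab_codes, labels, n_classes=2)
a = query_weights(q, lab_codes, labels, W, k=2)
order, wd = rerank(q, cand, a)
print(order.tolist())  # → [1, 0]
```

Both candidates are at plain Hamming distance 1 from the query, but the query's labeled neighbors (all from category 0) treat bit 3 as uninformative, so the candidate that differs only in bit 3 moves to the top.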
Experiments
In this section, the BBSHR is compared with state-of-the-art hashing methods on three databases: the MNIST, the USPS, and the CIFAR10. The compared methods are the LSH [18], [19], the CH [32], the DCH [33], the BIQH [34], the BSPLH [26], and the SPLH [25].
The USPS and the MNIST are handwritten digit databases, each consisting of 10 categories (the digits 0–9). The MNIST consists of 70K 28 × 28-pixel images represented by 784-dimensional feature vectors. The USPS consists of 9298
Conclusion
A bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR) is proposed in this paper. The BBSHR uses semi-supervised bagging to construct multiple hash tables, and each hash table is then trained using a boosting-based method. Semi-supervised query-adaptive re-ranking is proposed to further improve retrieval performance. Experimental results show that the BBSHR outperforms state-of-the-art hashing methods with statistical significance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China under Grants 61272201 and 61572201 and by the Fundamental Research Funds for the Central Universities (2017ZD052).
References (41)
- et al. Adaptive approximate nearest neighbor search for fractal image compression. IEEE Trans. Image Process. (2002)
- et al. Hashing-based scalable remote sensing image search and retrieval in large archives. IEEE Trans. Geosci. Remote Sens. (2016)
- et al. Large-scale unsupervised hashing with shared structure learning. IEEE Trans. Cybern. (2015)
- et al. Contextual query expansion for image retrieval. IEEE Trans. Multimed. (2014)
- et al. Semi-supervised multi-graph hashing for scalable similarity search. Comput. Vis. Image Underst. (2014)
- Using iterated bagging to debias regressions. Mach. Learn. (2001)
- et al. Experimental perspectives on learning from imbalanced data. Proceedings of the Twenty-Fourth International Conference on Machine Learning (2008)
- et al. User term feedback in interactive text-based image retrieval. Proceedings of the Twenty-eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2005)
- et al. Text-based image retrieval using progressive multi-instance learning. Proceedings of the 2011 International Conference on Computer Vision (2011)
- et al. Using concept hierarchies in text-based image retrieval: a user evaluation. Lect. Notes Comput. Sci. (2005)
- Research on content-based multimedia information retrieval. Proceedings of the 2011 International Conference on Computational and Information Sciences
- Cognitive View Mechanism for Content-Based Multimedia Information Retrieval
- Relevance feature mapping for content-based multimedia information retrieval. Pattern Recognit.
- Song intersection by approximate nearest neighbor search. Proceedings of the 2006 International Society for Music Information Retrieval (ISMIR)
- Spectral hashing with semantically consistent graph for image indexing. IEEE Trans. Multimed.
- Sparse spectral hashing. Pattern Recognit. Lett.
- Two-phase mapping hashing. Neurocomputing
- Sequential compact code learning for unsupervised image hashing. IEEE Trans. Neural Netw. Learn. Syst.
- Unsupervised visual hashing with semantic assistant for content-based image retrieval. IEEE Trans. Knowl. Data Eng.
- Unsupervised local feature hashing for image similarity search. IEEE Trans. Cybern.
Cited by (24)
- Modified boosting and bagging for building tilt rate prediction in tunnel construction. Automation in Construction (2023)
- An ensemble classifier approach for thyroid disease diagnosis using the AdaBoostM algorithm. Machine Learning, Big Data, and IoT for Medical Informatics (2021)
- Automated sperm morphology analysis approach using a directional masking technique. Computers in Biology and Medicine (2020). Citation excerpt: "Alternatives to individual classifiers include ensemble techniques, such as boosting and bagging, which can combine multiple base models to increase the accuracy beyond that of the single best model. These have become popular in the machine learning literature [60–63]. Ensemble methods use multiple learning algorithms (also called weak learners) to improve the overall performance by combining the individual decisions of each weak learner."
- Bootstrap dual complementary hashing with semi-supervised re-ranking for image retrieval. Neurocomputing (2020). Citation excerpt: "This could significantly lower the retrieval accuracy. Multi-hashing methods (e.g. [15–18]) generate multiple hash tables in order to improve recall rate without yielding a significant drop in precision. The pairwise similarity matrix which records the semantic relationship between image pairs is generally introduced into the objective function of hash functions training for semantic relationship preservation."
- Predicting rank for scientific research papers using supervised learning. Applied Computing and Informatics (2019). Citation excerpt: "The hyper planes can be determined by means of a few points which will be called 'support vectors'. The Boosting [19] is summarized as follows: A large set of simple features."
- Unsupervised adaptive hashing based on feature clustering. Neurocomputing (2019). Citation excerpt: "In many critical applications of computer vision, approximate nearest neighbor (ANN) search [3,4] on image dataset is a fundamental step. As a particular approach to ANN search, hashing methods [5–7] have promising performance for high-dimensional data [1] due to the compact encoding and Hamming space. Existing hashing methods can be categorized into data-independent methods and data-dependent methods."
Wing W. Y. Ng (S’ 02-M’ 05-SM’ 15) received his B.Sc. and Ph.D. degrees from Hong Kong Polytechnic University in 2001 and 2006, respectively. He is now a Professor in the School of Computer Science and Engineering, South China University of Technology, China. His major research directions include machine learning and information retrieval. He is currently an associate editor of the International Journal of Machine Learning and Cybernetics. He is the principal investigator of three National Natural Science Foundation of China projects and of a Program for New Century Excellent Talents in University project from the China Ministry of Education. He served on the Board of Governors of the IEEE Systems, Man and Cybernetics Society in 2011–2013.
Xiancheng Zhou received the B.Sc. and M.Sc. degrees in computer science from the South China University of Technology. His research interests include machine learning and information retrieval.
Xing Tian received his B.Sc. degree in Computer Science from the South China University of Technology, Guangzhou, China, and is currently a Ph.D. candidate in the School of Computer Science and Engineering, South China University of Technology. His current research interests focus on image retrieval and machine learning in non-stationary big data environments.
Professor Xizhao Wang received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a Professor with the Big Data Institute, Shenzhen University, Shenzhen, China. His current research interests include uncertainty modeling and machine learning for big data. He has edited more than ten special issues and published three monographs, two textbooks, and more than 200 peer-reviewed research papers. According to Google Scholar, his work has been cited more than 5000 times. He is on the list of Elsevier 2015/2016 most cited Chinese authors. He is the Chair of the IEEE SMC Technical Committee on Computational Intelligence, the Editor-in-Chief of Machine Learning and Cybernetics Journal, and an Associate Editor for a few journals in related areas. He received the IEEE SMCS Outstanding Contribution Award in 2004 and the IEEE SMCS Best Associate Editor Award in 2006.
Professor Daniel S. Yeung (M’ 89-SM’ 99-F’ 04) is a past President of the IEEE SMC Society. He was Head and Chair Professor of the Computing Department of Hong Kong Polytechnic University, Hong Kong, and a faculty member of Rochester Institute of Technology, USA. He has also worked for TRW Inc., General Electric Corporation R&D Centre and Computer Consoles Inc. in USA. He is a Fellow of the IEEE.