Abstract
As is well known, the semantics of documents are exposed to us only in a latent way. However, most existing hashing methods ignore this fact and thus fail to discover the hidden semantic structure. To overcome this issue, this paper pays particular attention to discovering the latent semantic structure when hashing a document corpus. We adopt two measures to discover the hidden structure. First, a graph Laplacian constructed in semantic space, rather than in term-document space, is used to capture the semantic structure of the corpus during hashing. Second, motivated by the fact that non-negative matrix factorization (NMF) is an effective algorithm for discovering the latent semantic structure of documents, we employ NMF to extract a parts-based representation of each document. In addition, to reduce semantic loss when mapping the parts-based representation into Hamming space, we impose sparsity constraints that push the elements of the parts-based representation closer to binary values. Experimental results demonstrate that the proposed hashing method is competitive with state-of-the-art document hashing methods.
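To make the pipeline in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It factorizes a toy term-document matrix with plain Lee-Seung multiplicative updates and then maps the parts-based representation into Hamming space. It deliberately omits the graph Laplacian regularizer and the sparsity constraint that the paper describes. The function names (`nmf`, `binarize`, `hamming`), the toy matrix, and the median thresholding rule are all assumptions introduced for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF, X ~= W @ H under Frobenius loss.

    Assumption: no graph regularizer and no sparsity penalty are applied,
    unlike the method summarized in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k)) + eps   # term-by-topic basis
    H = rng.random((k, n_docs)) + eps    # topic-by-document representation
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def binarize(H):
    """Hypothetical quantizer: threshold each latent dimension at its median."""
    thresholds = np.median(H, axis=1, keepdims=True)
    return (H > thresholds).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Toy term-document matrix (6 terms x 4 documents), made up for illustration.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf(X, k=2)
codes = binarize(H).T  # one row of bits per document
```

Calling `hamming(codes[i], codes[j])` then compares two documents by their binary codes. In this sketch the quantization step is a crude threshold, which is exactly where semantic loss can occur; the sparsity constraints mentioned in the abstract are intended to reduce that loss by making the real-valued representation closer to binary before it is mapped to Hamming space.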





Acknowledgment
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61672254 and 61300222; the Key Project of the National Natural Science Foundation of China under Grant No. U1536203; the Natural Science Foundation of Hubei Province under Grant No. 2015CFB687; the Natural Science Foundation of Fujian Province under Grant No. 2015J01288; and the Fundamental Research Funds for the Central Universities, HUST: 2016YXMS088. The authors appreciate the valuable suggestions from the anonymous reviewers and the editors.
Cite this article
Zou, F., Tang, X., Li, K. et al. Hidden semantic hashing for fast retrieval over large scale document collection. Multimed Tools Appl 77, 3677–3697 (2018). https://doi.org/10.1007/s11042-017-5219-3
DOI: https://doi.org/10.1007/s11042-017-5219-3