An Efficient and Robust Semantic Hashing Framework for Similar Text Search

Published: 22 March 2023

Abstract

Similar text search aims to find texts relevant to a given query in a database and is fundamental to many information retrieval applications, such as question search and exercise search. Because practical search engines typically index millions of texts, a well-developed text search system usually consists of a recall stage and a ranking stage. The recall stage serves as the basis of the system: its main purpose is to find a small set of relevant candidates accurately and efficiently. Towards this goal, deep semantic hashing, which projects original texts into compact hash codes, can support good search performance. However, learning the desired textual hash codes is extremely difficult for the following reasons. First, compact (short) hash codes improve retrieval efficiency, but the severe information loss they incur means that accuracy cannot be guaranteed. Second, existing methods tend to learn codes from a local perspective, which leaves the codes unevenly distributed in the space and yields unsatisfactory code balance. Third, a large fraction of real-world textual data contains various types of noise, which causes the semantics captured by the hash codes to deviate. To this end, in this article, we first propose a general unsupervised encoder-decoder semantic hashing framework, namely MASH (short for Memory-bAsed Semantic Hashing), to learn balanced and compact hash codes for similar text search. Specifically, with the aim of retaining as much semantic information as possible, the encoder introduces a novel relevance constraint among informative high-dimensional representations to guide the compact hash code learning. Then, we design an external memory in which hash learning can be optimized in the global space to ensure the code balance of the learned results, which can promote search efficiency.
Besides, to alleviate the performance degradation caused by text noise, we propose an improved model, SMASH (short for denoiSing Memory-bAsed Semantic Hashing), which incorporates a noise-aware encoder-decoder framework. This framework considers the degree of noise in each text from the perspective of semantic deviation, ensuring the robustness of the hash codes. Finally, we conduct extensive experiments on three real-world datasets. The experimental results clearly demonstrate the effectiveness and efficiency of MASH and SMASH in generating balanced and compact hash codes, as well as the superior denoising ability of SMASH.
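
To make the recall-stage setting concrete, the following minimal sketch shows how compact binary codes are commonly used for candidate retrieval: texts are compared by Hamming distance, and code balance is inspected per bit. It is not the MASH or SMASH model. The 8-bit codes are hand-written placeholders rather than learned outputs, the function names are hypothetical, and the per-bit balance check reflects a common notion of code balance (each bit set in about half of the codes), which may differ from the measure used in the article.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def recall_candidates(query: int, database: List[int], k: int) -> List[Tuple[int, int]]:
    """Return the k (index, distance) pairs closest to the query in Hamming space."""
    scored = [(i, hamming(query, code)) for i, code in enumerate(database)]
    # Sort by distance, breaking ties by database position.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


def bit_balance(database: List[int], n_bits: int) -> List[float]:
    """Fraction of codes with each bit set, listed from the least significant bit.

    A value of 0.5 at every position would mean perfectly balanced bits.
    """
    n = len(database)
    return [sum((code >> b) & 1 for code in database) / n for b in range(n_bits)]


# Hypothetical 8-bit codes, written by hand for illustration only.
codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
top2 = recall_candidates(0b10110110, codes, k=2)
balance = bit_balance(codes, n_bits=8)
```

Shorter codes make each comparison cheaper but leave fewer bits to separate unrelated texts, which is the efficiency versus accuracy tension the abstract describes.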

      Published In

      ACM Transactions on Information Systems  Volume 41, Issue 4
      October 2023
      958 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/3587261

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 March 2023
      Online AM: 30 January 2023
      Accepted: 20 October 2022
      Revised: 14 September 2022
      Received: 31 May 2022
      Published in TOIS Volume 41, Issue 4

      Author Tags

      1. Semantic hashing
      2. similarity search
      3. efficient codes
      4. robust codes

      Qualifiers

      • Research-article

      Funding Sources

      • National Key Research and Development Program of China
      • National Natural Science Foundation of China
