ABSTRACT
In the big data era, as the amount of multimedia data grows, approximate nearest neighbor (ANN) search has become an important but challenging problem. As a widely applied large-scale ANN search method, hashing has made great progress, achieving sub-linear search time with low memory usage. However, these advances depend on the availability of large and representative datasets, which often contain sensitive information, and the privacy of this individually sensitive information is typically compromised. In this paper, we tackle this valuable yet challenging problem and formulate a task termed private hashing, which takes into account both search performance and privacy protection. Specifically, we propose a novel noise mechanism, Random Flipping, and two private hashing algorithms, PHashing and PITQ, with refined analysis within the framework of differential privacy, a well-established technique for measuring the privacy leakage of an algorithm. Random Flipping targets binary scenarios and leverages the "Imperceptible Lying" idea to guarantee ε-differential privacy by randomly flipping each datum of the binary matrix (noise addition). To preserve ε-differential privacy, PHashing uses Random Flipping to perturb the hash codes learned by non-private hashing algorithms. However, the noise added for privacy in PHashing causes severe performance drops. To alleviate this problem, PITQ leverages the power of alternative learning to distribute the noise generated by Random Flipping across iterations while preserving ε-differential privacy. Furthermore, to empirically evaluate our algorithms, we conduct comprehensive experiments on the image search task and demonstrate that the proposed algorithms achieve performance on par with non-private hashing methods.
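To make the idea of flipping entries of a binary matrix concrete, here is a minimal sketch based on classical randomized response. It is an illustration under stated assumptions, not the paper's Random Flipping mechanism: the abstract does not give the flipping probability, so the value 1 / (1 + e^ε) and the helper names `flip_probability` and `random_flip` are hypothetical. With this choice each bit is kept with probability e^ε / (1 + e^ε), which bounds the likelihood ratio for a single bit by e^ε.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Flip probability for one bit under classical randomized response.

    Assumption: this calibration is the textbook one, not necessarily the
    one used by the paper. The keep/flip ratio (1 - p) / p equals e^epsilon.
    """
    return 1.0 / (1.0 + math.exp(epsilon))


def random_flip(codes, epsilon, rng=None):
    """Independently flip each entry of a 0/1 matrix given as a list of rows."""
    rng = rng or random.Random()
    p = flip_probability(epsilon)
    # Each bit is XORed with 1 (flipped) with probability p, else kept.
    return [[bit ^ 1 if rng.random() < p else bit for bit in row] for row in codes]


if __name__ == "__main__":
    # Hypothetical 4-bit hash codes for three items.
    codes = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
    noisy = random_flip(codes, epsilon=1.0, rng=random.Random(0))
    print(noisy)
```

Note that the guarantee in this sketch is per bit. If changing one record can change all k bits of its code, basic composition gives kε for the whole code, so a record-level budget would need a correspondingly smaller per-bit ε. The paper's own calibration and analysis are given in its main text and supplemental proofs.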
Supplemental Material
Here, we present proofs of theorems and lemmas, and additional experimental results that are not presented in the main paper due to space limitations.
Index Terms
- Searching Privately by Imperceptible Lying: A Novel Private Hashing Method with Differential Privacy