DOI: 10.1145/3352460.3358320
research-article

DeepStore: In-Storage Acceleration for Intelligent Queries

Published: 12 October 2019

ABSTRACT

Recent advancements in deep learning techniques facilitate intelligent-query support in diverse applications, such as content-based image retrieval and audio texturing. Unlike conventional key-based queries, these intelligent queries lack efficient indexing and require complex compute operations for feature matching. To achieve high-performance intelligent querying against massive datasets, modern computing systems employ GPUs in conjunction with solid-state drives (SSDs) for fast data access and parallel data processing. However, our characterization of various intelligent-query workloads developed with deep neural networks (DNNs) shows that storage I/O bandwidth remains the major bottleneck, contributing 56% to 90% of the query execution time.

To this end, we present DeepStore, an in-storage accelerator architecture for intelligent queries. It consists of (1) energy-efficient in-storage accelerators designed specifically to support DNN-based intelligent queries under the resource constraints of modern SSD controllers; (2) a similarity-based in-storage query cache that exploits the temporal locality of user queries for further performance improvement; and (3) a lightweight in-storage runtime system that works as the query engine and provides a simple software abstraction to support different types of intelligent queries. DeepStore exploits SSD parallelism through design space exploration to maximize the energy efficiency of the in-storage accelerators. We validate the DeepStore design with an SSD simulator and evaluate it with a variety of vision-, text-, and audio-based intelligent queries. Compared with the state-of-the-art GPU+SSD approach, DeepStore improves query performance by up to 17.7× and energy efficiency by up to 78.6×.
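The abstract names a similarity-based query cache but does not describe its mechanics, so the following is only a minimal sketch of the general idea, not the DeepStore design. It assumes that each query is reduced to a feature vector, that cosine similarity with a fixed threshold decides a hit, and that eviction is least-recently-used. The class name `SimilarityQueryCache` and the parameters `capacity` and `threshold` are hypothetical.

```python
import math
from collections import OrderedDict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SimilarityQueryCache:
    """LRU cache keyed by query feature vectors (hypothetical design)."""

    def __init__(self, capacity=4, threshold=0.95):
        self.capacity = capacity        # assumed maximum number of cached queries
        self.threshold = threshold      # assumed minimum similarity for a hit
        self._entries = OrderedDict()   # feature tuple -> cached result

    def lookup(self, features):
        """Return the result of the most similar cached query, or None on a miss."""
        best_key, best_sim = None, self.threshold
        for key in self._entries:
            sim = cosine_similarity(features, key)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None
        self._entries.move_to_end(best_key)  # mark as most recently used
        return self._entries[best_key]

    def insert(self, features, result):
        """Cache a result, evicting the least recently used entry when full."""
        key = tuple(features)
        self._entries[key] = result
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


# Usage: a near-duplicate query reuses the cached result; a dissimilar one misses.
cache = SimilarityQueryCache(capacity=2, threshold=0.95)
cache.insert([1.0, 0.0], "result-A")
hit = cache.lookup([0.99, 0.05])
miss = cache.lookup([0.0, 1.0])
```

In this sketch a hit skips the scan over stored data entirely, which is where temporal locality among user queries would pay off. The linear scan over cached keys is chosen for brevity; a real in-storage cache would be bounded by the controller's memory and compute budget.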

Published in

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN: 9781450369381
DOI: 10.1145/3352460

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 484 of 2,242 submissions, 22%
