DeepStore: In-Storage Acceleration for Intelligent Queries

Authors:
Vikram Sharma Mailthody (UIUC)
Zaid Qureshi (UIUC)
Weixin Liang (Stanford University)
Ziyan Feng (UIUC)
Simon Garcia de Gonzalo (UIUC)
Youjie Li (UIUC)
Hubertus Franke (IBM Research)
Jinjun Xiong (IBM Research)
Jian Huang (UIUC)
Wen-mei Hwu (UIUC)

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, October 2019, Pages 224–238. https://doi.org/10.1145/3352460.3358320

Published: 12 October 2019

ABSTRACT

Recent advancements in deep learning techniques facilitate intelligent-query support in diverse applications, such as content-based image retrieval and audio texturing. Unlike conventional key-based queries, these intelligent queries lack efficient indexing and require complex compute operations for feature matching. To achieve high-performance intelligent querying against massive datasets, modern computing systems employ GPUs in conjunction with solid-state drives (SSDs) for fast data access and parallel data processing. However, our characterization of various intelligent-query workloads developed with deep neural networks (DNNs) shows that storage I/O bandwidth is still the major bottleneck, contributing 56%–90% of the query execution time.

To this end, we present DeepStore, an in-storage accelerator architecture for intelligent queries. It consists of (1) energy-efficient in-storage accelerators designed specifically to support DNN-based intelligent queries under the resource constraints of modern SSD controllers; (2) a similarity-based in-storage query cache that exploits the temporal locality of user queries for further performance improvement; and (3) a lightweight in-storage runtime system that works as the query engine and provides a simple software abstraction to support different types of intelligent queries. DeepStore exploits SSD parallelism and uses design space exploration to maximize the energy efficiency of the in-storage accelerators. We validate the DeepStore design with an SSD simulator and evaluate it with a variety of vision-, text-, and audio-based intelligent queries. Compared with the state-of-the-art GPU+SSD approach, DeepStore improves query performance by up to 17.7× and energy efficiency by up to 78.6×.
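
The abstract does not describe how the similarity-based query cache works internally. The following is only a minimal sketch of the general idea, under stated assumptions: queries are represented by feature vectors, two queries count as "similar" when their cosine similarity reaches a threshold, and entries are evicted in LRU order. The class name `SimilarityQueryCache`, the threshold value, and the eviction policy are all hypothetical and are not taken from the paper.

```python
import math
from collections import OrderedDict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityQueryCache:
    """Hypothetical LRU cache keyed by query feature vectors.

    A lookup hits when a cached query is at least `threshold` similar
    to the incoming one (an assumption, not the paper's stated policy).
    """

    def __init__(self, capacity=4, threshold=0.95):
        self.capacity = capacity
        self.threshold = threshold
        self._entries = OrderedDict()  # tuple(features) -> cached result

    def lookup(self, features):
        """Return the result of the most similar cached query, or None."""
        best_key, best_sim = None, self.threshold
        for key in self._entries:
            sim = cosine_similarity(features, key)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None  # miss: the caller would run the full query
        self._entries.move_to_end(best_key)  # mark as most recently used
        return self._entries[best_key]

    def insert(self, features, result):
        """Cache a result, evicting the least recently used entry if full."""
        key = tuple(features)
        self._entries[key] = result
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

In this sketch a repeated or near-duplicate query is answered from the cache without rescanning the dataset, which is the temporal-locality benefit the abstract refers to. A real in-storage design would have to bound the linear scan over cached entries and fit within the SSD controller's memory.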

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
  2. Dependable and fault-tolerant systems and networks
    1. Secondary storage organization
2. Information systems
  1. Information retrieval

Published in

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN: 9781450369381
DOI: 10.1145/3352460

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hardware Accelerators
In-Storage Computing
Information Retrieval
Intelligent Query
Solid-State Drive
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
