Abstract
In state-of-the-art large-scale data service systems, a data analysis request (e.g., data retrieval) must pass through multiple data processing modules across the I/O stack and move a large quantity of irrelevant data from secondary storage to DRAM and eventually to the on-chip cache. This data movement leads to long response latency and rising energy consumption in data storage systems. To address this issue, we propose Cognitive SSD+ and use it to build a deep learning-based unstructured data retrieval engine. In Cognitive SSD+, a flash-accessing accelerator, DHS-x, is placed beside the flash devices to perform near-data deep learning and hybrid data search (DHS). These in-SSD deep learning and data search functions are exposed to users as library APIs through NVMe command extensions, allowing Cognitive SSD+ to be flexibly customized for different datasets and application scenarios. Unlike Cognitive SSD, which supports only graph search, Cognitive SSD+ integrates a hybrid data search engine that supports brute-force, kd-tree, and graph search simultaneously. In addition, an auto-selection model is proposed to pick the most appropriate search algorithm according to the inherent characteristics of the dataset to be retrieved. Experimental results on an FPGA-based prototype show that Cognitive SSD+ running the proposed DHS framework achieves a 3.48× speedup over a counterpart built on a conventional CPU and storage system, and reduces overall system energy consumption by up to 4.89× and 1.77× compared with CPU- and GPU-based solutions, respectively, that deliver comparable performance.
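The abstract names three search modes (brute-force, kd-tree, graph) and an auto-selection model that chooses among them. The Python sketch that follows illustrates two of these ideas under stated assumptions: `brute_force_knn` is a generic exact k-nearest-neighbor scan over feature vectors, and `pick_search_mode` is a hypothetical rule of thumb with made-up thresholds. Neither is the DHS-x hardware implementation nor the learned auto-selection model described in the paper; both names are illustrative only.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def brute_force_knn(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Exact k-NN: compute the Euclidean distance from the query to every
    stored vector and keep the k smallest as (distance, index) pairs."""
    distances = ((math.dist(vec, query), idx) for idx, vec in enumerate(dataset))
    return heapq.nsmallest(k, distances)


def pick_search_mode(num_vectors: int, dim: int) -> str:
    """Hypothetical selector, NOT the paper's learned auto-selection model.
    The thresholds below are arbitrary placeholders for illustration."""
    if num_vectors <= 10_000:
        return "brute_force"  # small dataset: an exhaustive scan is cheap and exact
    if dim <= 16:
        return "kd_tree"      # low-dimensional data: tree pruning stays effective
    return "graph"            # large, high-dimensional data: approximate graph search


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(brute_force_knn(data, [0.9, 1.2], k=2))   # nearest indices: 1, then 0
    print(pick_search_mode(num_vectors=1_000_000, dim=128))
```

Brute force examines all N stored vectors for every query, so its cost grows linearly with dataset size, but its results are exact. kd-trees tend to lose their pruning advantage as dimensionality grows, which is why this illustrative rule reserves them for low-dimensional data and otherwise falls back to graph search.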
Acknowledgements
We thank Professor Jiafeng Guo of the CAS Key Laboratory of Network Data Science and Technology for his support and suggestions. This paper is supported in part by the National Key Research and Development Program of China under grant 2018YFA0701502, in part by the National Natural Science Foundation of China (NSFC) under grants 62090024, U20A20202, and 61876173, and by the YESS program under grant No. YESS2016qnrc001.
Additional information
This paper is submitted for possible publication in the Special Issue on Intelligent Storage and Edge Computing.
About this article
Cite this article
Liang, S., Wang, Y., Li, H. et al. Cognitive SSD+: a deep learning engine for energy-efficient unstructured data retrieval. CCF Trans. HPC 4, 302–320 (2022). https://doi.org/10.1007/s42514-022-00103-1