Cognitive SSD+: a deep learning engine for energy-efficient unstructured data retrieval

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

In state-of-the-art large-scale data service systems, a data analysis request (e.g., data retrieval) must pass through multiple data processing modules across the I/O stack and move a large quantity of irrelevant data from secondary storage through DRAM and eventually into the on-chip cache. This data movement leads to long response latency and rising energy consumption in data storage systems. To address this issue, we propose the Cognitive SSD+ system and use it to build a deep learning-based unstructured data retrieval engine. In the proposed Cognitive SSD+, a flash-accessing accelerator, DHS-x, is placed beside the flash devices to perform near-data deep learning and hybrid data search (DHS). These in-SSD deep learning and data search functions are exposed to users as library APIs via NVMe command extensions, which allows Cognitive SSD+ to be flexibly customized for different datasets and application scenarios. Unlike Cognitive SSD, which supports only graph search, Cognitive SSD+ integrates a hybrid data search engine that supports brute-force, kd-tree, and graph search simultaneously. In addition, an auto-selection model is proposed to pick the most appropriate search algorithm according to the inherent characteristics of the dataset to be retrieved. Experimental results on an FPGA-based prototype show that Cognitive SSD+ running the proposed DHS framework achieves a 3.48X speedup over its counterpart based on a conventional CPU and storage system, and reduces overall system energy consumption by up to 4.89X and 1.77X compared to CPU- and GPU-based solutions, respectively, that deliver comparable performance.
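
To make the idea of a hybrid search engine with algorithm auto-selection more concrete, the following is a minimal host-side sketch in Python. It is not the paper's implementation: the function names, the rule-based selector, and the thresholds `small_n` and `low_dim` are all assumptions chosen for illustration, whereas the abstract describes a model that picks the algorithm from the dataset's characteristics. The heuristic only reflects the commonly observed trade-off that exhaustive scanning is acceptable for small datasets, kd-trees work best at low dimensionality, and graph-based search suits large high-dimensional datasets.

```python
import heapq
from math import dist  # Euclidean distance, available since Python 3.8


def brute_force_search(dataset, query, k):
    """Exact k-nearest-neighbor search: scan every vector, return the k closest indices."""
    return heapq.nsmallest(k, range(len(dataset)), key=lambda i: dist(dataset[i], query))


def select_algorithm(num_vectors, dim, small_n=10_000, low_dim=16):
    """Hypothetical rule-based stand-in for an auto-selection model.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if num_vectors <= small_n:
        return "brute_force"  # small dataset: a full scan is cheap enough
    if dim <= low_dim:
        return "kd_tree"      # kd-trees prune well only at low dimensionality
    return "graph"            # large, high-dimensional data: graph-based search


if __name__ == "__main__":
    dataset = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
    print(select_algorithm(len(dataset), dim=2))          # brute_force
    print(brute_force_search(dataset, (0.0, 0.1), k=2))   # [0, 3]
```

In a real deployment the selected algorithm would run inside the SSD next to the flash devices; this sketch only shows the dispatch decision and the simplest of the three search strategies.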

Acknowledgements

We thank Professor Jiafeng Guo of the CAS Key Lab of Network Data Science and Technology for his support and suggestions. This paper is supported in part by the National Key Research and Development Program of China under grant 2018YFA0701502, in part by the National Natural Science Foundation of China (NSFC) under grants No. 62090024, U20A20202, and 61876173, and in part by YESS hip program No. YESS2016qnrc001.

Author information

Corresponding authors

Correspondence to Ying Wang or Huawei Li.

Additional information

This paper is submitted for possible publication in the Special Issue on Intelligent Storage and Edge Computing.

About this article

Cite this article

Liang, S., Wang, Y., Li, H. et al. Cognitive SSD+: a deep learning engine for energy-efficient unstructured data retrieval. CCF Trans. HPC 4, 302–320 (2022). https://doi.org/10.1007/s42514-022-00103-1

  • DOI: https://doi.org/10.1007/s42514-022-00103-1
