Skip to main content
Log in

A survey on AI for storage

  • Regular Paper
  • Published:
CCF Transactions on High Performance Computing Aims and scope Submit manuscript

Abstract

Storage, as a core function and fundamental component of computers, provides services for saving and reading digital data. The increasing complexity of data operations and storage architectures is challenging the performance and reliability of storage services. Artificial intelligence (AI) advancements in the field of intelligent algorithms show significant promise for resolving storage issues. A hot topic of current studies is how to marry AI and storage. In this paper, we present a comprehensive survey of “AI for Storage” and categorize storage research employing intelligent algorithms according to public literature in recent years into Architecture-oriented, Data-specific, and Operation & Maintenance. Based on this classification, we fine-categorize all of the studies by application environment and elaborate on their development history in order to provide guidelines for future research on how to employ AI technologies based on this history. Finally, we present a discussion and future work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. https://scholar.google.com/.

  2. https://dblp.org/.

  3. https://www.gartner.com/en/information-technology/glossary/aiops-artificial-intelligence-operations.

  4. http://idsm.wnlo.hust.edu.cn/index.htm.

References

  • C., C.A.R., Pâris, J., Vilalta, R., Cheng, A.M.K., Long, D.D.E.: Disk failure prediction in heterogeneous environments. In: International Symposium on Performance Evaluation of Computer and Telecommunication Systems, SPECTS, pp. 1–7. IEEE, Seattle, WA, USA (2017)

  • Abu-Libdeh, H., Altinbüken, D., Beutel, A., Chi, E.H., Doshi, L., Kraska, T., Li, X., Ly, A., Olston, C.: Learned indexes for a google-scale disk-based database. CoRR abs/2012.12501 (2020)

  • Agarwal, V., Bhattacharyya, C., Niranjan, T., Susarla, S.: Discovering rules from disk events for predicting hard drive failures. In: International Conference on Machine Learning and Applications, ICMLA, pp. 782–786. IEEE Computer Society, Miami Beach (2009)

    Google Scholar 

  • Aken, D.V., Pavlo, A., Gordon, G.J., Zhang, B.: Automatic database management system tuning through large-scale machine learning. In: International Conference on Management of Data, SIGMOD, pp. 1009–1024. ACM, Chicago (2017)

    Google Scholar 

  • Alter, J., Xue, J., Dimnaku, A., Smirni, E.: SSD failures in the field: symptoms, causes, and prediction models. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, pp. 75–17514. ACM, Denver (2019)

    Google Scholar 

  • Anantharaman, P., Qiao, M., Jadav, D.: Large scale predictive analytics for hard disk remaining useful life estimation. In: International Congress on Big Data, BigData Congress, pp. 251–254. IEEE Computer Society, San Francisco (2018)

    Google Scholar 

  • Arzani, B., Ciraci, S., Loo, B.T., Schuster, A., Outhred, G.: Taking the blame game out of data centers operations with netpoirot. In: SIGCOMM, pp. 440–453. ACM, Florianopolis, Brazil (2016)

  • Aussel, N., Jaulin, S., Gandon, G., Petetin, Y., Fazli, E., Chabridon, S.: Predictive models of hard drive failures based on operational data. In: International Conference on Machine Learning and Applications, pp. 619–625. IEEE, Cancun, Mexico (2017)

  • Bagbaba, A.: Improving collective I/O performance with machine learning supported auto-tuning. In: International Parallel and Distributed Processing Symposium Workshops, IPDPSW, pp. 814–821. IEEE, New Orleans (2020)

    Google Scholar 

  • Basak, S., Sengupta, S., Dubey, A.: Mechanisms for integrated feature normalization and remaining useful life estimation using lstms applied to hard-disks. In: International Conference on Smart Computing, SMARTCOMP, pp. 208–216. IEEE, Washington (2019)

    Google Scholar 

  • Baseman, E., DeBardeleben, N., Ferreira, K.B., Levy, S., Raasch, S., Sridharan, V., Siddiqua, T., Guan, Q.: Improving DRAM fault characterization through machine learning. In: 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, DSN Workshops, pp. 250–253. IEEE Computer Society, Toulouse, France (2016)

    Google Scholar 

  • Behzad, B., Luu, H.V.T., Huchette, J., Byna, S.: Prabhat, Aydt, R.A., Koziol, Q., Snir, M.: Taming parallel I/O complexity with auto-tuning. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, pp. 68–16812. ACM, Denver (2013)

    Google Scholar 

  • Behzad, B., Byna, S., Wild, S.M.: Prabhat, Snir, M.: Dynamic model-driven parallel I/O performance tuning. In: International Conference on Cluster Computing, CLUSTER, pp. 184–193. IEEE Computer Society, Chicago (2015)

    Google Scholar 

  • Bei, Z., Yu, Z., Zhang, H., Xiong, W., Xu, C., Eeckhout, L., Feng, S.: RFHOC: A random-forest approach to auto-tuning hadoop’s configuration. IEEE Trans. Parallel Distrib. Syst. 27(5), 1470–1483 (2016)

    Article  Google Scholar 

  • Berger, D.S.: Towards lightweight and robust machine learning for CDN caching. In: Workshop on Hot Topics in Networks, HotNets, pp. 134–140. ACM, Redmond (2018)

    Chapter  Google Scholar 

  • Berger, D.S., Sitaraman, R.K., Harchol-Balter, M.: Adaptsize: Orchestrating the hot object memory cache in a content delivery network. In: Symposium on Networked Systems Design and Implementation, NSDI, pp. 483–498. USENIX Association, Boston (2017)

    Google Scholar 

  • Beutel, A., Kraska, T., Chi, E., Dean, J., Polyzotis, N.: A machine learning approach to databases indexes. In: ML Systems Workshop, Annual Conference on Neural Information Processing Systems, NIPS, Long Beach, CA, USA (2017)

  • Bhatia, E., Chacon, G., Pugsley, S.H., Teran, E., Gratz, P.V., Jiménez, D.A.: Perceptron-based prefetch filtering. In: International Symposium on Computer Architecture, ISCA, pp. 1–13. ACM, Phoenix (2019)

    Google Scholar 

  • Boixaderas, I., Zivanovic, D., Moré, S., Bartolome, J., Vicente, D., Casas, M., Carpenter, P.M., Radojkovic, P., Ayguadé, E.: Cost-aware prediction of uncorrected DRAM errors in the field. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, p. 61. IEEE/ACM, Virtual Event / Atlanta, Georgia, USA (2020)

  • Botezatu, M.M., Giurgiu, I., Bogojeska, J., Wiesmann, D.: Predicting disk replacement towards reliable data centers. In: International Conference on Knowledge Discovery and Data Mining, SIGKDD, pp. 39–48. ACM, San Francisco (2016)

    Google Scholar 

  • Braam, P.: The lustre storage architecture. CoRR abs/1903.01955 (2019)

  • Braun, P., Litz, H.: Understanding memory access patterns for prefetching. In: International Workshop on AI-assisted Design for Architecture (AIDArc), Held in Conjunction with ISCA, Phoenix, AZ, USA (2019)

  • Bux, W., Iliadis, I.: Performance of greedy garbage collection in flash-based solid-state drives. Perform. Eval. 67(11), 1172–1186 (2010)

    Article  Google Scholar 

  • Cai, Z., Li, W., Zhu, W., Liu, L., Yang, B.: A real-time trace-level root-cause diagnosis system in alibaba datacenters. IEEE Access 7, 142692–142702 (2019)

    Article  Google Scholar 

  • Cao, Z., Tarasov, V., Tiwari, S., Zadok, E.: Towards better understanding of black-box auto-tuning: A comparative analysis for storage systems. In: 2018 USENIX Annual Technical Conference, USENIX ATC 2018, , July 11-13, 2018, pp. 893–907. USENIX Association, Boston, MA, USA (2018)

  • Cao, S., Gao, Y., Gao, X., Chen, G.: Adam: An adaptive fine-grained scheme for distributed metadata management. In: International Conference on Parallel Processing, ICPP, pp. 37–13710. ACM, Kyoto (2019)

    Google Scholar 

  • Chakrabortti, C., Litz, H.: Learning I/O access patterns to improve prefetching in ssds. In: Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track - European Conference, ECML PKDD, Lecture Notes in Computer Science, vol. 12460, pp. 427–443. Springer, Ghent (2020)

    Google Scholar 

  • Chandramouli, B., Prasaad, G., Kossmann, D., Levandoski, J.J., Hunter, J., Barnett, M.: FASTER: A concurrent key-value store with in-place updates. In: International Conference on Management of Data, SIGMOD, pp. 275–290. ACM, Houston (2018)

    Google Scholar 

  • Chaves, I.C., de Paula, M.R.P., de,: Moura Leite, L.G., Gomes, J.P.P., Machado, J.C.: Hard disk drive failure prediction method based on A bayesian network. In: International Joint Conference on Neural Networks, IJCNN, pp. 1–7. IEEE, Rio de Janeiro, Brazil (2018)

  • Chaves, I.C., de Paula, M.R.P., de,: Moura Leite, L.G., Queiroz, L.P., Gomes, J.P.P., Machado, J.C.: Banhfap: A bayesian network based failure prediction approach for hard disk drives. In: Brazilian Conference on Intelligent Systems, BRACIS, pp. 427–432. IEEE Computer Society, Recife, Brazil (2016)

  • Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search and similarity joins. IEEE Trans. Knowl. Data Eng. 29(3), 556–571 (2017)

    Article  Google Scholar 

  • Cheng, P., Lu, Y., Du,: Y., Chen, Z., Liu, Y.: Optimizing data placement on hierarchical storage architecture via machine learning. In: Network and Parallel Computing, NPC, Lecture Notes in Computer Science, vol. 11783, pp. 289–302. Springer, Hohhot, China (2019)

  • Cheng, W., Zhang, K., Chen, H., Jiang, G., Chen, Z., Wang, W.: Ranking causal anomalies via temporal and dynamical analysis on vanishing correlations. In: International Conference on Knowledge Discovery and Data Mining, KDD, pp. 805–814. ACM, San Francisco (2016)

    Google Scholar 

  • Cherubini, G., Jelitto, J., Venkatesan, V.: Cognitive storage for big data. Computer 49(4), 43–51 (2016)

    Article  Google Scholar 

  • Chledowski, J., Polak, A., Szabucki, B., Zolna, K.T.: Robust learning-augmented caching: An experimental study. In: International Conference on Machine Learning, ICML, Proceedings of Machine Learning Research, vol. 139, pp. 1920–1930. PMLR, Virtual Event (2021)

  • Cohn, D.A., Singh, S.P.: Predicting lifetimes in dynamically allocated memory. In: Advances in Neural Information Processing Systems, NIPS, pp. 939–945. MIT Press, Denver (1996)

    Google Scholar 

  • Cortez, E., Bonde, A., Muzio, A., Russinovich, M., Fontoura, M., Bianchini, R.: Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms. In: Symposium on Operating Systems Principles, SOSP, pp. 153–167. ACM, Shanghai (2017)

    Chapter  Google Scholar 

  • Dai, Y., Xu, Y., Ganesan, A., Alagappan, R., Kroth, B., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H.: From wisckey to bourbon: A learned index for log-structured merge trees. In: Symposium on Operating Systems Design and Implementation, OSDI, pp. 155–171. USENIX Association, Virtual Event (2020)

  • Davitkova, A., Milchevski, E., Michel, S.: The ml-index: A multidimensional, learned index for point, range, and nearest-neighbor queries. In: International Conference on Extending Database Technology, EDBT, pp. 407–410. OpenProceedings.org, Copenhagen, Denmark (2020)

  • Ding, J., Minhas, U.F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D.B., Kraska, T.: ALEX: an updatable adaptive learned index. In: International Conference on Management of Data, SIGMOD, pp. 969–984. ACM, Portland (2020)

    Google Scholar 

  • dos Santos Lima, F.D.: Pereira, F.L.F., Chaves, I.C., Gomes, J.P.P., de Castro Machado, J.: Evaluation of recurrent neural networks for hard disk drives failure prediction. In: Brazilian Conference on Intelligent Systems, BRACIS, pp. 85–90. IEEE Computer Society, São Paulo (2018)

    Google Scholar 

  • Featherstun, R.W., Fulp, E.W.: Using syslog message sequences for predicting disk failures. In: Large Installation System Administration Conference, LISA. USENIX Association, San Jose, CA, USA (2010)

  • Ferragina, P., Vinciguerra, G.: The pgm-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 13(8), 1162–1175 (2020)

    Article  Google Scholar 

  • Fu, C., Cai, D.: EFANNA : An extremely fast approximate nearest neighbor search algorithm based on knn graph. CoRR abs/1609.07228 (2016) 1609.07228

  • Gan, Y., Zhang, Y., Hu, K., Cheng, D., He, Y., Pancholi, M., Delimitrou, C.: Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices. In: International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp. 19–33. ACM, Providence (2019)

    Google Scholar 

  • Ganguly, S., Consul, A., Khan, A., Bussone, B., Richards, J., Miguel, A.: A practical approach to hard disk failure prediction in cloud platforms: Big data model for failure management in datacenters. In: International Conference on Big Data Computing Service and Applications, pp. 105–116. IEEE Computer Society, Oxford, United Kingdom (2016)

  • Gao, J., Yaseen, N., MacDavid, R., Frujeri, F.V., Liu, V., Bianchini, R., Aditya, R., Wang, X., Lee, H., Maltz, D.A., Yu, M., Arzani, B.: Scouts: Improving the diagnosis process through domain-customized incident routing. In: Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM, pp. 253–269. ACM, Virtual Event, USA (2020)

  • Gao, X., Zha, S., Li, X., Yan, B., Jing, X., Li, J., Xu, J.: Incremental prediction model of disk failures based on the density metric of edge samples. IEEE Access 7, 114285–114296 (2019)

    Article  Google Scholar 

  • Gao, Y., Gao, X., Chen, G.: Deephash: An end-to-end learning approach for metadata management in distributed file systems. In: International Conference on Parallel Processing, ICPP, pp. 36–13610. ACM, Kyoto (2019)

    Google Scholar 

  • Gheisari, M., Movassagh, A.A., Qin, Y., Yong, J., Tao, X., Zhang, J., Shen, H.: NSSSD: A new semantic hierarchical storage for sensor data. In: International Conference on Computer Supported Cooperative Work in Design, CSCWD, pp. 174–179. IEEE, Nanchang (2016)

    Google Scholar 

  • Giurgiu, I., Szabó, J., Wiesmann, D., Bird, J.: Predicting DRAM reliability in the field with machine learning. In: Zhu, X., Roy, I. (eds.) Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference: Industrial Track, pp. 15–21. ACM, Las Vegas, NV, USA (2017)

  • Guan, Y., Zhang, X., Guo, Z.: CACA: learning-based content-aware cache admission for video content in edge caching. In: International Conference on Multimedia, MM, pp. 456–464. ACM, Nice (2019)

    Google Scholar 

  • Hadian, A., Heinis, T.: Shift-table: A low-latency learned index for range queries using model correction. In: International Conference on Extending Database Technology, EDBT, pp. 253–264. OpenProceedings.org, Nicosia, Cyprus (2021)

  • Hadian, A., Heinis, T.: Considerations for handling updates in learned index structures. In: International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD, pp. 3–134. ACM, Amsterdam (2019)

    Google Scholar 

  • Hamerly, G., Elkan, C.: Bayesian approaches to failure prediction for disk drives. In: International Conference on Machine Learning (ICML, pp. 202–209. Morgan Kaufmann, Williams College, Williamstown, MA, USA (2001)

  • Hashemi, M., Swersky, K., Smith, J.A., Ayers, G., Litz, H., Chang, J., Kozyrakis, C., Ranganathan, P.: Learning memory access patterns. In: International Conference on Machine Learning, ICML, Proceedings of Machine Learning Research, vol. 80, pp. 1924–1933. PMLR, Stockholmsmässan (2018)

    Google Scholar 

  • Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of mapreduce programs. Proc. VLDB Endow. 4(11), 1111–1122 (2011)

    Article  Google Scholar 

  • Higuchi, S., Takemasa, J., Koizumi, Y., Tagami, A., Hasegawa, T.: Feasibility of longest prefix matching using learned index structures. SIGMETRICS Perform. Eval. Rev. 48(4), 45–48 (2021)

    Article  Google Scholar 

  • Hu, G., Shao, J., Zhang, D., Yang, Y., Shen, H.T.: Preserving-ignoring transformation based index for approximate k nearest neighbor search. In: International Conference on Data Engineering, ICDE, pp. 91–94. IEEE Computer Society, San Diego (2017)

    Google Scholar 

  • Hua, Y., Jiang, H., Zhu, Y., Feng, D., Tian, L.: Smartstore: a new metadata organization paradigm with semantic-awareness for next-generation file systems. In: Conference on High Performance Computing, SC. ACM, Portland, Oregon, USA (2009)

  • Hua, Y., Jiang, H., Feng, D.: FAST: near real-time searchable data analytics for the cloud. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, pp. 754–765. IEEE Computer Society, New Orleans (2014)

    Chapter  Google Scholar 

  • Huang, S., Fu, S., Zhang, Q., Shi, W.: Characterizing disk failures with quantified disk degradation signatures: An early experience. In: International Symposium on Workload Characterization, IISWC, pp. 150–159. IEEE Computer Society, Atlanta (2015)

    Google Scholar 

  • Indyk, P., Motwani, R., Raghavan, P., Vempala, S.S.: Locality-preserving hashing in multidimensional spaces. In: Symposium on the Theory of Computing, STOC, pp. 618–625. ACM, El Paso (1997)

    Google Scholar 

  • Jain, R., Panda, P.R., Subramoney, S.: A coordinated multi-agent reinforcement learning approach to multi-level cache co-partitioning. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 800–805. IEEE, Lausanne (2017)

    Google Scholar 

  • Ji, X., Ma, Y., Ma, R., Li, P., Ma, J., Wang, G., Liu, X., Li, Z.: A proactive fault tolerance scheme for large scale storage systems. In: International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP, Lecture Notes in Computer Science, vol. 9530, pp. 337–350. Springer, Zhangjiajie (2015)

    Google Scholar 

  • Jiang, T., Zeng, J., Zhou, K., Huang, P., Yang, T.: Lifelong disk failure prediction via gan-based anomaly detection. In: International Conference on Computer Design, ICCD, pp. 199–207. IEEE, Abu Dhabi (2019)

    Google Scholar 

  • Jiang, T., Huang, P., Zhou, K.: Scrub unleveling: Achieving high data reliability at low scrubbing cost. In: Teich, J., Fummi, F. (eds.) Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 1403–1408. IEEE, Florence (2019)

    Google Scholar 

  • Jiménez, D.A., Teran, E.: Multiperspective reuse prediction. In: International Symposium on Microarchitecture, MICRO, pp. 436–448. ACM, Cambridge (2017)

    Google Scholar 

  • Jin, X., Agun, D., Yang, T., Wu, Q., Shen, Y., Zhao, S.: Hybrid indexing for versioned document search with cluster-based retrieval. In: International Conference on Information and Knowledge Management, CIKM, pp. 377–386. ACM, Indianapolis (2016)

    Google Scholar 

  • Kang, W., Yoo, S.: Dynamic management of key states for reinforcement learning-assisted garbage collection to reduce long tail latency in SSD. In: Design Automation Conference, DAC, pp. 8–186. ACM, San Francisco (2018)

    Google Scholar 

  • Kang, W., Yoo, S.: \(q\) -value prediction for reinforcement learning assisted garbage collection to reduce long tail latency in SSD. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst 39(10), 2240–2253 (2020)

    Article  Google Scholar 

  • Kang, W., Shin, D., Yoo, S.: Reinforcement learning-assisted garbage collection to mitigate long-tail latency in SSD. ACM Trans. Embed. Comput. Syst. 16(5s), 134–113420 (2017)

    Article  Google Scholar 

  • Kim, M., Lee, S.: Reducing tail latency of dnn-based recommender systems using in-storage processing. In: SIGOPS Asia-Pacific Workshop on Systems, pp. 90–97. ACM, Tsukuba, Japan (2020)

  • Kim, J.: An ftl-aware host system alleviating severe long latency of NAND flash-based storage. In: International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA, pp. 189–194. IEEE, Houston (2021)

    Google Scholar 

  • Kim, M., Sumbaly, R., Shah, S.: Root cause detection in a service-oriented architecture. In: International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS, pp. 93–104. ACM, Pittsburgh (2013)

    Google Scholar 

  • Kim, Y., More, A., Shriver, E., Rosing, T.: Application performance prediction and optimization under cache allocation technology. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 1285–1288. IEEE, Florence (2019)

    Google Scholar 

  • Kipf, A., Marcus, R., van Renen, A., Stoian, M., Kemper, A., Kraska, T., Neumann, T.: Radixspline: a single-pass learned index. In: Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD, pp. 5–155. ACM, Portland (2020)

    Google Scholar 

  • Kirilin, V., Sundarrajan, A., Gorinsky, S., Sitaraman, R.K.: Rl-cache: Learning-based cache admission for content delivery. In: Proceedings of the 2019 Workshop on Network Meets AI & ML, NetAI@SIGCOMM 2019, pp. 57–63. ACM, Beijing, China (2019)

  • Kirilin, V., Sundarrajan, A., Gorinsky, S., Sitaraman, R.K.: Rl-cache: Learning-based cache admission for content delivery. IEEE J. Sel. Areas Commun. 38(10), 2372–2385 (2020)

    Article  Google Scholar 

  • Klein, K., Kriege, N.M., Mutzel, P.: Ct-index: Fingerprint-based graph indexing combining cycles and trees. In: International Conference on Data Engineering, ICDE, pp. 1115–1126. IEEE Computer Society, Hannover (2011)

    Google Scholar 

  • Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Kristo, A., Leclerc, G., Madden, S., Mao, H., Nathan, V.: Sagedb: A learned database system. In: Biennial Conference on Innovative Data Systems Research, CIDR. www.cidrdb.org, Asilomar, CA, USA (2019)

  • Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: International Conference on Management of Data, SIGMOD, pp. 489–504. ACM, Houston (2018)

    Google Scholar 

  • Leuoth, S., Benn, W.: A self-adaptive insert strategy for content-based multidimensional database storage. In: GI-Workshop on Foundations of Databases (Grundlagen Von Datenbanken). Preprints aus dem Institut für Informatik, vol. CS-02-09, pp. 75–79. Universität Rostock, Mecklenburg-Vorpommern, Germany (2009)

  • Leuoth, S., Benn, W.: Towards SISI - a self adaptive insert strategy for the intelligent cluster index (icix). In: Machine Learning and Data Mining in Pattern Recognition, MLDM, pp. 141–155. ibai Publishing, Leipzig, Germany (2009)

  • Li, P., Hua, Y., Zuo, P., Jia, J.: A scalable learned index scheme in storage systems. CoRR abs/1905.06256 (2019) 1905.06256

  • Li, J., Ji, X., Jia, Y., Zhu, B., Wang, G., Li, Z., Liu, X.: Hard drive failure prediction using classification and regression trees. In: International Conference on Dependable Systems and Networks, DSN, pp. 383–394. IEEE Computer Society, Atlanta (2014)

    Google Scholar 

  • Li, J., Stones, R.J., Wang, G., Li, Z., Liu, X., Xiao, K.: Being accurate is not enough: New metrics for disk failure prediction. In: Symposium on Reliable Distributed Systems, SRDS, pp. 71–80. IEEE Computer Society, Budapest (2016)

    Google Scholar 

  • Li, Y., Chang, K., Bel, O., Miller, E.L., Long, D.D.E.: CAPES: unsupervised storage performance tuning using neural network-based deep reinforcement learning. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, pp. 42–14214. ACM, Denver (2017)

    Google Scholar 

  • Li, J., Stones, R.J., Wang, G., Liu, X., Li, Z., Xu, M.: Hard drive failure prediction using decision trees. Reliab. Eng. Syst. Saf. 164, 55–65 (2017)

    Article  Google Scholar 

  • Li, Z.L., Liang, C.M., He, W., Zhu, L., Dai, W., Jiang, J., Sun, G.: Metis: Robustly tuning tail latencies of cloud systems. In: Annual Technical Conference, USENIX ATC, pp. 981–992. USENIX Association, Boston (2018)

    Google Scholar 

  • Li, G., Zhou, X., Li, S., Gao, B.: Qtune: A query-aware database tuning system with deep reinforcement learning. Proc. VLDB Endow. 12(12), 2118–2130 (2019)

    Article  Google Scholar 

  • Li, P., Lu, H., Zheng, Q., Yang, L., Pan, G.: LISA: A learned index structure for spatial data. In: Maier, D., Pottinger, R., Doan, A., Tan, W., Alawini, A., Ngo, H.Q. (eds.) International Conference on Management of Data, SIGMOD, pp. 2119–2133. ACM, Portland (2020)

    Google Scholar 

  • Li, C., Wang, Y., Liu, C., Liang, S., Li, H., Li, X.: GLIST: towards in-storage graph learning. In: Annual Technical Conference, ATC, pp. 225–238. USENIX Association, Ho Chi Minh City (2021)

    Google Scholar 

  • Liang, S., Wang, Y., Lu, Y., Yang, Z., Li, H., Li, X.: Cognitive SSD: A deep learning engine for in-storage data retrieval. In: Annual Technical Conference, ATC, pp. 395–410. USENIX Association, Renton (2019)

    Google Scholar 

  • Lin, W., Ma, M., Pan, D., Wang, P.: Facgraph: Frequent anomaly correlation graph mining for root cause diagnose in micro-service architecture. In: International Performance Computing and Communications Conference, IPCCC, pp. 1–8. IEEE, Orlando (2018)

    Google Scholar 

  • Liu, J., Wang, R., Gao, X., Yang, X., Chen, G.: Anglecut: A ring-based hashing scheme for distributed metadata management. In: Database Systems for Advanced Applications, DASFAA, Lecture Notes in Computer Science, vol. 10177, pp. 71–86. Springer, Suzhou (2017)

    Chapter  Google Scholar 

  • Liu, Y., Song, J., Zhou, K., Yan, L., Liu, L., Zou, F., Shao, L.: Deep self-taught hashing for image retrieval. IEEE Trans. Cybern. 49(6), 2229–2241 (2019)

    Article  Google Scholar 

  • Liu, P., Chen, Y., Nie, X., Zhu, J., Zhang, S., Sui, K., Zhang, M., Pei, D.: Fluxrank: A widely-deployable framework to automatically localizing root cause machines for software service failure mitigation. In: International Symposium on Software Reliability Engineering, ISSRE, pp. 35–46. IEEE, Berlin (2019)

    Google Scholar 

  • Liu, Y., Jiang, H., Wang, Y., Zhou, K., Liu, Y., Liu, L.: Content sifting storage: Achieving fast read for large-scale image dataset analysis. In: Design Automation Conference, DAC, pp. 1–6. IEEE, San Francisco (2020)

    Google Scholar 

  • Liu, Y., Wang, Y., Song, J., Guo, C., Zhou, K., Xiao, Z.: Deep self-taught graph embedding hashing with pseudo labels for image retrieval. In: International Conference on Multimedia and Expo, ICME, pp. 1–6. IEEE, London (2020)

    Google Scholar 

  • Liu, W., Cui, J., Liu, J., Yang, L.T.: Mlcache: A space-efficient cache scheme based on reuse distance and machine learning for nvme ssds. In: International Conference On Computer Aided Design, ICCAD, pp. 58–1589. IEEE, San Diego (2020)

    Google Scholar 

  • Liu, P., Xu, H., Ouyang, Q., Jiao, R., Chen, Z., Zhang, S., Yang, J., Mo, L., Zeng, J., Xue, W., Pei, D.: Unsupervised detection of microservice trace anomalies through service-level deep bayesian networks. In: International Symposium on Software Reliability Engineering, ISSRE, pp. 48–58. IEEE, Coimbra (2020)

    Google Scholar 

  • Lu, S., Luo, B., Patel, T., Yao, Y., Tiwari, D., Shi, W.: Making disk failure predictions smarter! In: Conference on File and Storage Technologies, FAST, pp. 151–167. USENIX Association, Santa Clara, CA, USA (2020)

  • Luaces, D., Viqueira, J.R.R., Pena, T.F., Cotos, J.M.: Leveraging bitmap indexing for subgraph searching. In: International Conference on Extending Database Technology, EDBT, pp. 49–60. OpenProceedings.org, Lisbon, Portugal (2019)

  • Luo, C., Zhao, P., Qiao, B., Wu, Y., Zhang, H., Wu, W., Lu, W., Dang, Y., Rajmohan, S., Lin, Q., Zhang, D.: NTAM: neighborhood-temporal attention model for disk failure prediction in cloud platforms. In: The Web Conference, WWW, pp. 1181–1191. ACM / IW3C2, Virtual Event / Ljubljana, Slovenia (2021)

  • Luo, C., Lou, J., Lin, Q., Fu, Q., Ding, R., Zhang, D., Wang, Z.: Correlating events with time series for incident diagnosis. In: International Conference on Knowledge Discovery and Data Mining, KDD, pp. 1583–1592. ACM, New York (2014)

    Google Scholar 

  • Luo, Q., Fang, X., Sun, Y., Ai, J., Yang, C.: Self-learning hot data prediction: Where echo state network meets NAND flash memories. IEEE Trans. Circuits Syst. I Regul. Pap. 67(I(3)), 939–950 (2020)

    Article  Google Scholar 

  • Lykouris, T., Vassilvitskii, S.: Competitive caching with machine learned advice. In: International Conference on Machine Learning, ICML, Proceedings of Machine Learning Research, vol. 80, pp. 3302–3311. PMLR, Stockholmsmässan (2018)

    Google Scholar 

  • Ma, M., Xu, J., Wang, Y., Chen, P., Zhang, Z., Wang, P.: Automap: Diagnose your microservice-based web applications automatically. In: The Web Conference, WWW, pp. 246–258. ACM / IW3C2, Taipei, Taiwan (2020)

  • Ma, M., Zhang, S., Chen, J., Xu, J., Li, H., Lin, Y., Nie, X., Zhou, B., Wang, Y., Pei, D.: Jump-starting multivariate time series anomaly detection for online service systems. In: Annual Technical Conference, ATC, pp. 413–426. USENIX Association, Virtual Event (2021)

  • Ma, M., Lin, W., Pan, D., Wang, P.: Ms-rank: Multi-metric and self-adaptive root cause diagnosis for microservice applications. In: International Conference on Web Services, ICWS, pp. 60–67. IEEE, Milan (2019)

    Google Scholar 

  • Maas, M., Andersen, D.G., Isard, M., Javanmard, M.M., McKinley, K.S., Raffel, C.: Learning-based memory allocation for C++ server workloads. In: Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp. 541–556. ACM, Lausanne (2020)

    Google Scholar 

  • Mahdisoltani, F., Stefanovici, I.A., Schroeder, B.: Proactive error prediction to improve storage system reliability. In: Silva, D.D., Ford, B. (eds.) Annual Technical Conference, ATC, pp. 391–402. USENIX Association, Santa Clara (2017)

    Google Scholar 

  • Mailthody, V.S., Qureshi, Z., Liang, W., Feng, Z., Gonzalo, S.G.D., Li, Y., Franke, H., Xiong, J., Huang, J., Hwu, W.: Deepstore: In-storage acceleration for intelligent queries. In: International Symposium on Microarchitecture, MICRO, pp. 224–238. ACM, Columbus (2019)

    Google Scholar 

  • Marcus, R., Kipf, A., van Renen, A., Stoian, M., Misra, S., Kemper, A., Neumann, T., Kraska, T.: Benchmarking learned indexes. Proc. VLDB Endow. 14(1), 1–13 (2020)

    Article  Google Scholar 

  • Meng, Y., Zhang, S., Sun, Y., Zhang, R., Hu, Z., Zhang, Y., Jia, C., Wang, Z., Pei, D.: Localizing failure root causes in a microservice through causality inference. In: International Symposium on Quality of Service, IWQoS, pp. 1–10. IEEE, Hangzhou (2020)

    Google Scholar 

  • Mishra, M., Singhal, R.: RUSLI: real-time updatable spline learned index. In: Bordawekar, R., Amsterdamer, Y., Shmueli, O., Tatbul, N. (eds.) Workshop in Exploiting AI Techniques for Data Management, aiDM, pp. 1–8. ACM, Virtual Event, China (2021)

  • Monjalet, F., Leibovici, T.: Predicting file lifetimes with machine learning. In: High Performance Computing - ISC High Performance 2019 International Workshops, Lecture Notes in Computer Science, vol. 11887, pp. 288–299. Springer, Frankfurt (2019)

    Google Scholar 

  • Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D.S., Karakonstantis, G.: Workload-aware DRAM error prediction using machine learning. In: International Symposium on Workload Characterization, IISWC, pp. 106–118. IEEE, Orlando (2019)

    Google Scholar 

  • Murray, J.F., Hughes, G.F., Kreutz-Delgado, K.: Hard drive failure prediction using non-parametric statistical methods. In: ICANN/ICONIP (2003)

  • Murray, J.F., Hughes, G.F., Kreutz-Delgado, K.: Machine learning methods for predicting failures in hard drives: a multiple-instance application. J. Mach. Learn. Res. 6, 783–816 (2005)

    MathSciNet  MATH  Google Scholar 

  • Narayanan, I., Wang, D., Jeon, M., Sharma, B., Caulfield, L., Sivasubramaniam, A., Cutler, B., Liu, J., Khessib, B.M., Vaid, K.: SSD failures in datacenters: What, when and why? In: SIGMETRICS, pp. 407–408. ACM, Antibes Juan-Les-Pins, France (2016)

  • Narayanan, A., Verma, S., Ramadan, E., Babaie, P., Zhang, Z.: Deepcache: A deep learning based framework for content caching. In: Workshop on Network Meets AI & ML, NetAI@SIGCOMM, pp. 48–53. ACM, Budapest (2018)

    Google Scholar 

  • Nathan, V., Ding, J., Alizadeh, M., Kraska, T.: Learning multi-dimensional indexes. In: International Conference on Management of Data, SIGMOD, pp. 985–1000. ACM, Portland (2020)

    Google Scholar 

  • Neubert, R., Görlitz, O., Benn, W.: Towards content-related indexing in databases. In: Datenbanksysteme in Büro, Technik und Wissenschaft (BTW), Informatik Aktuell, pp. 305–321. Springer, GI-Fachtagung (2001)

    Chapter  Google Scholar 

  • Ni, J., Cheng, W., Zhang, K., Song, D., Yan, T., Chen, H., Zhang, X.: Ranking causal anomalies by modeling local propagations on networked systems. In: International Conference on Data Mining, ICDM, pp. 1003–1008. IEEE Computer Society, New Orleans (2017)

    Google Scholar 

  • Pang, S., Jia, Y., Stones, R.J., Wang, G., Liu, X.: A combined bayesian network method for predicting drive failure times from SMART attributes. In: International Joint Conference on Neural Networks, IJCNN, pp. 4850–4856. IEEE, Vancouver (2016)

    Google Scholar 

  • Park, N., Ahmad, I., Lilja, D.J.: Romano: autonomous storage management using performance prediction in multi-tenant datacenters. In: Symposium on Cloud Computing, SOCC, p. 21. ACM, San Jose, CA, USA (2012)

  • Park, J.K., Kim, J.: A method for reducing garbage collection overhead of SSD using machine learning algorithms. In: International Conference on Information and Communication Technology Convergence, ICTC, pp. 775–777. IEEE, Jeju Island (2017)

    Google Scholar 

  • Park, S., Kim, D., Bang, K., Lee, H., Yoo, S., Chung, E.: An adaptive idle-time exploiting method for low latency NAND flash-based storage devices. IEEE Trans. Comput. 63(5), 1085–1096 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  • Paschos, G.S., Destounis, A., Vigneri, L., Iosifidis, G.: Learning to cache with no regrets. In: Conference on Computer Communications, INFOCOM, pp. 235–243. IEEE, Paris (2019)

    Google Scholar 

  • Peled, L., Mannor, S., Weiser, U.C., Etsion, Y.: Semantic locality and context-based prefetching using reinforcement learning. In: International Symposium on Computer Architecture, ISCA, pp. 285–297. ACM, Portland (2015)

    Google Scholar 

  • Peled, L., Weiser, U.C., Etsion, Y.: A neural network prefetcher for arbitrary memory access patterns. ACM Trans. Archit. Code Optim. 16(4), 37–13727 (2020)

    Google Scholar 

  • Pereira, F.L.F., dos,: Santos Lima, F.D., de Moura Leite, L.G., Gomes, J.P.P., de Castro Machado, J.: Transfer learning for bayesian networks with application on hard disk drives failure prediction. In: Brazilian Conference on Intelligent Systems, BRACIS, pp. 228–233. IEEE Computer Society, Uberlândia, Brazil (2017)

  • Pereira, F., Teixeira, D., Gomes, J.P., Machado, J.C.: Evaluating one-class classifiers for fault detection in hard disk drives. In: Brazilian Conference on Intelligent Systems, BRACIS, pp. 586–591. IEEE, Salvador (2019)

    Google Scholar 

  • Pham, C., Wang, L., Tak, B., Baset, S., Tang, C., Kalbarczyk, Z.T., Iyer, R.K.: Failure diagnosis for distributed systems using targeted fault injection. IEEE Trans. Parallel Distrib. Syst. 28(2), 503–516 (2017)

    Google Scholar 

  • Pitakrat, T., van Hoorn, A., Grunske, L.: A comparison of machine learning algorithms for proactive hard disk drive failure detection. In: International ACM Sigsoft Symposium on Architecting Critical Systems, ISARCS, pp. 1–10. ACM, Vancouver (2013)

    Google Scholar 

  • Poppe, O., Amuneke, T., Banda, D., De, A., Green, A., Knoertzer, M., Nosakhare, E., Rajendran, K., Shankargouda, D., Wang, M., Au, A., Curino, C., Guo, Q., Jindal, A., Kalhan, A., Oslake, M., Parchani, S., Ramani, V., Sellappan, R., Sen, S., Shrotri, S., Srinivasan, S., Xia, P., Xu, S., Yang, A., Zhu, Y.: Seagull: An infrastructure for load prediction and optimized resource allocation. Proc. VLDB Endow. 14(2), 154–162 (2020)

    Article  Google Scholar 

  • Prats, D.B., Portella, F.A., Costa, C.H.A., Berral, J.L.: You only run once: Spark auto-tuning from a single run. IEEE Trans. Netw. Serv. Manag. 17(4), 2039–2051 (2020)

    Article  Google Scholar 

  • Qiu, J., Du, Q., Yin, K., Zhang, S.-L., Qian, C.: A causality mining and knowledge graph based method of root cause diagnosis for performance anomaly in cloud applications. Appl. Sci. 10(6), 2166 (2020)

    Article  Google Scholar 

  • Queiroz, L.P., Gomes, J.P.P., Rodrigues, F.C.M., Brito, F.T., Chaves, I.C., de Moura Leite, L.G., Machado, J.C.: Fault detection in hard disk drives based on a semi parametric model and statistical estimators. New Gen. Comput 36(1), 5–19 (2018)

    Article  Google Scholar 

  • Rahman, S., Burtscher, M., Zong, Z., Qasem, A.: Maximizing hardware prefetch effectiveness with machine learning. In: International Conference on High Performance Computing and Communications, HPCC, International Symposium on Cyberspace Safety and Security, CSS, International Conference on Embedded Software and Systems, ICESS, pp. 383–389. IEEE, New York (2015)

    Google Scholar 

  • Ravandi, B., Papapanagiotou, I.: A self-organized resource provisioning for cloud block storage. Future Gen. Comput. Syst. 89, 765–776 (2018)

    Article  Google Scholar 

  • Ren, J., Chen, X., Liu, D., Tan, Y., Duan, M., Li, R., Liang, L.: A machine learning assisted data placement mechanism for hybrid storage systems. J. Syst. Archit. 120, 102295 (2021)

    Article  Google Scholar 

  • Rodriguez, L.V., Yusuf, F.B., Lyons, S., Paz, E., Rangaswami, R., Liu, J., Zhao, M., Narasimhan, G.: Learning cache replacement with CACHEUS. In: Conference on File and Storage Technologies, FAST, pp. 341–354. USENIX Association, Virtual Event (2021)

  • Sethumurugan, S., Yin, J., Sartori, J.: Designing a cost-effective cache replacement policy using machine learning. In: International Symposium on High-Performance Computer Architecture, HPCA, pp. 291–303. IEEE, Seoul (2021)

    Google Scholar 

  • Shen, J., Wan, J., Lim, S., Yu, L.: Random-forest-based failure prediction for hard disk drives. Int. J. Distrib. Sens. Netw. 14(11) (2018)

  • Shi, H., Arumugam, R.V., Foh, C.H., Khaing, K.K.: Optimal disk storage allocation for multi-tier storage system. In: 2012 Digest APMRC, pp. 1–7 (2012)

  • Shi, W., Cheng, P., Zhu, C., Chen, Z.: An intelligent data placement strategy for hierarchical storage systems. In: International Conference on Computer and Communications (ICCC), pp. 2023–2027 (2020). IEEE

  • Shi, Z., Jain, A., Swersky, K., Hashemi, M., Ranganathan, P., Lin, C.: A hierarchical neural model of data prefetching. In: International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp. 861–873. ACM, Virtual Event, USA (2021)

  • Shi, Z., Huang, X., Jain, A., Lin, C.: Applying deep learning to the cache replacement problem. In: International Symposium on Microarchitecture, MICRO, pp. 413–425. ACM, Columbus (2019)

    Google Scholar 

  • Song, Z., Berger, D.S., Li, K., Lloyd, W.: Learning relaxed belady for content distribution network caching. In: Symposium on Networked Systems Design and Implementation, NSDI, pp. 529–544. USENIX Association, Santa Clara (2020)

    Google Scholar 

  • Spector, B., Kipf, A., Vaidya, K., Wang, C., Minhas, U.F., Kraska, T.: Bounding the last mile: Efficient learned string indexing. CoRR abs/2111.14905 (2021)

  • Srivastava, A., Lazaris, A., Brooks, B., Kannan, R., Prasanna, V.K.: Predicting memory accesses: the road to compact ml-driven prefetcher. In: International Symposium on Memory Systems, MEMSYS, pp. 461–470. ACM, Washington (2019)

    Chapter  Google Scholar 

  • Stoian, M., Kipf, A., Marcus, R., Kraska, T.: Plex: Towards practical learned indexing. (2021) arXiv preprint arXiv:2108.05117

  • Subedi, P., Davis, P.E., Duan, S., Klasky, S., Kolla, H., Parashar, M.: Stacker: an autonomic data movement engine for extreme-scale data staging-based in-situ workflows. In: International Conference for High Performance Computing, Networking, Storage, and Analysis, SC, pp. 73–17311. IEEE / ACM, Dallas (2018)

    Google Scholar 

  • Sun, X., Chakrabarty, K., Huang, R., Chen, Y., Zhao, B., Cao, H., Han, Y., Liang, X., Jiang, L.: System-level hardware failure prediction using deep learning. In: Design Automation Conference, DAC, p. 20. ACM, Las Vegas, NV, USA (2019)

  • Sun, Q., Jin, T., Romanus, M., Bui, H., Zhang, F., Yu, H., Kolla, H., Klasky, S., Chen, J., Parashar, M.: Adaptive data placement for staging-based coupled scientific workflows. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, pp. 65–16512. ACM, Austin (2015)

    Google Scholar 

  • Sun, Y., Zhao, Y., Su, Y., Liu, D., Nie, X., Meng, Y., Cheng, S., Pei, D., Zhang, S., Qu, X., Guo, X.: Hotspot: Anomaly localization for additive kpis with multi-dimensional attributes. IEEE Access 6, 10909–10923 (2018)

    Article  Google Scholar 

  • Tang, C., Dong, Z., Wang, M., Wang, Z., Chen, H.: Learned indexes for dynamic workloads. CoRR abs/1902.00655 (2019) 1902.00655

  • Tang, C., Wang, Y., Dong, Z., Hu, G., Wang, Z., Wang, M., Chen, H.: Xindex: a scalable learned index for multicore data storage. In: Gupta, R., Shen, X. (eds.) SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP, pp. 308–320. ACM, San Diego (2020)

    Google Scholar 

  • Teran, E., Wang, Z., Jiménez, D.A.: Perceptron learning for reuse prediction. In: International Symposium on Microarchitecture, MICRO, pp. 2–1212. IEEE Computer Society, Taipei (2016)

    Google Scholar 

  • Thomas, L., Gougeaud, S., Rubini, S., Deniel, P., Boukhobza, J.: Predicting file lifetimes for data placement in multi-tiered storage systems for HPC. ACM SIGOPS Oper. Syst. Rev. 55(1), 99–107 (2021)

    Article  Google Scholar 

  • Tomes, E., Rush, E.N., Altiparmak, N.: Towards adaptive parallel storage systems. IEEE Trans. Comput. 67(12), 1840–1848 (2018)

    Article  MathSciNet  MATH  Google Scholar 

  • Tsai, L., Franke, H., Li, C., Liao, W.: Learning-based memory allocation optimization for delay-sensitive big data processing. IEEE Trans. Parallel Distrib.Syst. 29(6), 1332–1341 (2018)

    Article  Google Scholar 

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Annual Conference on Neural Information Processing Systems, NIPS, Long Beach, CA, USA, pp. 5998–6008 (2017)

  • Vietri, G., Rodriguez, L.V., Martinez, W.A., Lyons, S., Liu, J., Rangaswami, R., Zhao, M., Narasimhan, G.: Driving cache replacement with ml-based lecar. In: Workshop on Hot Topics in Storage and File Systems, HotStorage. USENIX Association, Boston, MA, USA (2018)

  • Wang, H., He, H., Alizadeh, M., Mao, H.: Learning caching policies with subsampling. In: NeurIPS Machine Learning for Systems Workshop (2019)

  • Wang, X., Li, Y., Chen, Y., Wang, S., Du,: Y., He, C., Zhang, Y., Chen, P., Li, X., Song, W., Xu, Q., Jiang, L.: On workload-aware DRAM failure prediction in large-scale data centers. In: VLSI Test Symposium, VTS, pp. 1–6. IEEE, San Diego, CA, USA (2021)

  • Wang, P., Xu, J., Ma, M., Lin, W., Pan, D., Wang, Y., Chen, P.: Cloudranger: Root cause identification for cloud native systems. In: International Symposium on Cluster, Cloud and Grid Computing, CCGRID, pp. 492–502. IEEE Computer Society, Washington (2018)

    Google Scholar 

  • Wang, H., Yi, X., Huang, P., Cheng, B., Zhou, K.: Efficient SSD caching by avoiding unnecessary writes using machine learning. In: International Conference on Parallel Processing, ICPP, pp. 82–18210. ACM, Eugene (2018)

    Google Scholar 

  • Wang, H., Nguyen, P., Li, J., Köprü, S., Zhang, G., Katariya, S., Ben-Romdhane, S.: GRANO: interactive graph-based root cause analysis for cloud-native distributed data platform. Proc. VLDB Endow. 12(12), 1942–1945 (2019)

    Article  Google Scholar 

  • Wang, H., Yang, Y., Huang, P., Zhang, Y., Zhou, K., Tao, M., Cheng, B.: S-CDA: A smart cloud disk allocation approach in cloud block storage system. In: Design Automation Conference, DAC, pp. 1–6. IEEE, San Francisco (2020)

    Google Scholar 

  • Wang, H., Zhang, J., Huang, P., Yi, X., Cheng, B., Zhou, K.: Cache what you need to cache: Reducing write traffic in cloud cache via “one-time-access-exclusion’’ policy. ACM Trans. Storage 16(3), 18–11824 (2020)

    Article  Google Scholar 

  • Wang, Y., Tang, C., Wang, Z., Chen, H.: Sindex: a scalable learned index for string keys. In: SIGOPS Asia-Pacific Workshop on Systems, APSys, pp. 17–24. ACM, Tsukuba (2020)

    Chapter  Google Scholar 

  • Wang, Y., Song, J., Zhou, K., Liu, Y.: Unsupervised deep hashing with node representation for image retrieval. Pattern Recognit. 112, 107785 (2021)

    Article  Google Scholar 

  • Wei, X., Chen, R., Chen, H.: Fast rdma-based ordered key-value store using remote learned cache. In: Symposium on Operating Systems Design and Implementation, OSDI, pp. 117–135. USENIX Association, Virtual Event (2020)

  • Wei, X., Chen, R., Chen, H., Zang, B.: Xstore: Fast rdma-based ordered key-value store using remote learned cache. ACM Trans. Storage 17(3), 18–11832 (2021)

    Article  Google Scholar 

  • Weng, J., Wang, J.H., Yang, J., Yang, Y.: Root cause analysis of anomalies of multitier services in public clouds. IEEE/ACM Trans. Netw. 26(4), 1646–1659 (2018)

    Article  Google Scholar 

  • Wilkening, M., Gupta, U., Hsia, S., Trippel, C., Wu, C., Brooks, D., Wei, G.: Recssd: near data processing for solid state drive based recommendation inference. In: International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp. 717–729. ACM, Virtual Event, USA (2021)

  • Wu, Z., Xu, H., Pang, G., Yu, F., Wang, Y., Jian, S., Wang, Y.: DRAM failure prediction in aiops: Empiricalevaluation, challenges and opportunities. CoRR abs/2104.15052 (2021)

  • Wu, C., Ji, C., Xue, C.J.: Reinforcement learning based background segment cleaning for log-structured file system on mobile devices. In: International Conference on Embedded Software and Systems, ICESS, pp. 1–8. IEEE, Las Vegas (2019)

    Google Scholar 

  • Wu, L., Tordsson, J., Elmroth, E., Kao, O.: Microrca: Root cause localization of performance issues in microservices. In: Network Operations and Management Symposium, NOMS, pp. 1–9. IEEE, Budapest (2020)

    Google Scholar 

  • Wu, J., Zhang, Y., Chen, S., Chen, Y., Wang, J., Xing, C.: Updatable learned index with precise positions. Proc. VLDB Endow. 14(8), 1276–1288 (2021)

    Article  Google Scholar 

  • Xiao, J., Xiong, Z., Wu, S., Yi, Y., Jin, H., Hu, K.: Disk failure prediction in data centers via online learning. In: International Conference on Parallel Processing, ICPP, pp. 35–13510. ACM, Eugene (2018)

    Google Scholar 

  • Xie, D., Chandramouli, B., Li, Y., Kossmann, D.: Fishstore: Faster ingestion with subset hashing. In: International Conference on Management of Data, SIGMOD, pp. 1711–1728. ACM, Amsterdam (2019)

    Google Scholar 

  • Xu, C., Wang, G., Liu, X., Guo, D., Liu, T.: Health status assessment and failure prediction for hard drives with recurrent neural networks. IEEE Trans. Comput. 65(11), 3502–3508 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  • Xu, Y., Sui, K., Yao, R., Zhang, H., Lin, Q., Dang, Y., Li, P., Jiang, K., Zhang, W., Lou, J., Chintalapati, M., Zhang, D.: Improving service availability of cloud systems by predicting disk error. In: Annual Technical Conference, ATC, pp. 481–494. USENIX Association, Boston (2018)

    Google Scholar 

  • Xu, R., Jin, X., Tao, L., Guo, S., Xiang, Z., Tian, T.: An efficient resource-optimized learning prefetcher for solid state drives. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 273–276. IEEE, Dresden (2018)

    Google Scholar 

  • Xu, F., Han, S., Lee, P.P.C., Liu, Y., He, C., Liu, J.: General feature selection for failure prediction in large-scale SSD deployment. In: International Conference on Dependable Systems and Networks, DSN, pp. 263–270. IEEE, Taipei (2021)

    Google Scholar 

  • Yan, G., Li, J.: Rl-bélády: A unified learning framework for content caching. In: Chen, C.W., Cucchiara, R., Hua, X., Qi, G., Ricci, E., Zhang, Z., Zimmermann, R. (eds.) International Conference on Multimedia, MM, pp. 1009–1017. ACM, Virtual Event/Seattle (2020)

    Google Scholar 

  • Yang, P., Xue, N., Zhang, Y., Zhou, Y., Sun, L., Chen, W., Chen, Z., Xia, W., Li, J., Kwon, K.: Reducing garbage collection overhead in SSD based on workload prediction. In: Workshop on Hot Topics in Storage and File Systems, HotStorage. USENIX Association, Renton, WA, USA (2019)

  • Yang, W., Hu, D., Liu, Y., Wang, S., Jiang, T.: Hard drive failure prediction using big data. In: Symposium on Reliable Distributed Systems Workshop, SRDS, pp. 13–18. IEEE Computer Society, Montreal (2015)

    Google Scholar 

  • Yang, Y., Misra, V., Rubenstein, D.: On the optimality of greedy garbage collection for ssds. SIGMETRICS Perform. Eval. Rev. 43(2), 63–65 (2015)

    Article  Google Scholar 

  • Yang, L., Wang, F., Tan, Z., Feng, D., Qian, J., Tu, S.: ARS: reducing F2FS fragmentation for smartphones using decision trees. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 1061–1066. IEEE, Grenoble (2020)

    Google Scholar 

  • Yang, L., Tan, Z., Wang, F., Tu, S., Shao, J.: M2H: optimizing F2FS via multi-log delayed writing and modified segment cleaning based on dynamically identified hotness. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 808–811. IEEE, Grenoble (2021)

    Google Scholar 

  • Ye, J., Li, Z., Wang, Z., Zheng, Z., Hu, H., Zhu, W.: Joint cache size scaling and replacement adaptation for small content providers. In: Conference on Computer Communications, INFOCOM, pp. 1–10. IEEE, Vancouver (2021)

    Google Scholar 

  • Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., Yan, S.: Metaformer is actually what you need for vision. CoRR abs/2111.11418 (2021)

  • Yuan, D., Yang, Y., Liu, X., Chen, J.: A data placement strategy in scientific cloud workflows. Fut. Gen. Comput. Syst. 26(8), 1200–1214 (2010)

    Article  Google Scholar 

  • Zeng, Y., Guo, X.: Long short term memory based hardware prefetcher: a case study. In: International Symposium on Memory Systems, MEMSYS, pp. 305–311. ACM, Alexandria (2017)

    Chapter  Google Scholar 

  • Zhang, M., He, Y.: Zoom: Multi-view vector search for optimizing accuracy, latency and memory. Technical Report MSR-TR-2018-25 (August 2018). https://www.microsoft.com/en-us/research/publication/zoom-multi-view-vector-search-for-optimizing-accuracy-latency-and-memory/

  • Zhang, J., Huang, P., Zhou, K., Xie, M., Schelter, S.: Hddse: Enabling high-dimensional disk state embedding for generic failure detection system of heterogeneous disks in large data centers. In: Annual Technical Conference, ATC, pp. 111–126. USENIX Association, Virtual Event (2020)

  • Zhang, X., Wu, H., Chang, Z., Jin, S., Tan, J., Li, F., Zhang, T., Cui, B.: Restune: Resource oriented tuning boosted by meta-learning for cloud databases. In: International Conference on Management of Data, SIGMOD, pp. 2102–2114. ACM, Virtual Event, China (2021)

  • Zhang, J., Liu, Y., Zhou, K., Li, G., Xiao, Z., Cheng, B., Xing, J., Wang, Y., Cheng, T., Liu, L., Ran, M., Li, Z.: An end-to-end automatic cloud database tuning system using deep reinforcement learning. In: International Conference on Management of Data, SIGMOD, pp. 415–432. ACM, Amsterdam (2019)

    Google Scholar 

  • Zhang, C., Song, D., Chen, Y., Feng, X., Lumezanu, C., Cheng, W., Ni, J., Zong, B., Chen, H., Chawla, N.V.: A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. In: Conference on Artificial Intelligence, AAAI, pp. 1409–1416. AAAI Press, Honolulu (2019)

    Google Scholar 

  • Zhang, S., Roy, R., Rumancik, L., Wang, A.A.: The composite-file file system: decoupling one-to-one mapping of files and metadata for better performance. ACM Trans. Storage 16(1), 5–1518 (2020)

    Article  Google Scholar 

  • Zhang, J., Zhou, K., Huang, P., He, X., Xie, M., Cheng, B., Ji, Y., Wang, Y.: Minority disk failure prediction based on transfer learning in large data centers of heterogeneous disk systems. IEEE Trans. Parallel Distrib. Syst. 31(9), 2155–2169 (2020)

    Article  Google Scholar 

  • Zhang, Y., Huang, P., Zhou, K., Wang, H., Hu, J., Ji, Y., Cheng, B.: OSCA: an online-model based cache allocation scheme in cloud block storage systems. In: Gavrilovska, A., Zadok, E. (eds.) Annual Technical Conference, ATC, pp. 785–798. USENIX Association, Virtual Event (2020)

    Google Scholar 

  • Zhang, J., Wang, Y., Wang, Y., Zhou, K., Schelter, S., Huang, P., Cheng, B., Ji, Y.: Tier-scrubbing: An adaptive and tiered disk scrubbing scheme with improved MTTD and reduced cost. In: Design Automation Conference, DAC, pp. 1–6. IEEE, San Francisco (2020)

    Google Scholar 

  • Zhang, Y., Zhou, K., Huang, P., Wang, H., Hu, J., Wang, Y., Ji, Y., Cheng, B.: A machine learning based write policy for SSD cache in cloud block storage. In: Design, Automation & Test in Europe Conference & Exhibition, DATE, pp. 1279–1282. IEEE, Grenoble (2020)

    Google Scholar 

  • Zhao, Y., Liu, X., Gan, S., Zheng, W.: Predicting disk failures with HMM- and hsmm-based approaches. In: International Conference on Data Mining, ICDM, Lecture Notes in Computer Science, vol. 6171, pp. 390–404. Springer, Berlin (2010)

    Google Scholar 

  • Zheng, Y., Guo, Q., Tung, A.K.H., Wu, S.: Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. In: International Conference on Management of Data, SIGMOD, pp. 2023–2037. ACM, San Francisco (2016)

    Google Scholar 

  • Zhou, K., Liu, Y., Song, J., Yan, L., Zou, F., Shen, F.: Deep self-taught hashing for image retrieval. In: Conference on Multimedia Conference, MM, pp. 1215–1218. ACM, Brisbane (2015)

    Google Scholar 

  • Zhou, J., Guo, Q., Jagadish, H.V., Krcál, L., Liu, S., Luan, W., Tung, A.K.H., Yang, Y., Zheng, Y.: A generic inverted index framework for similarity search on the GPU. In: International Conference on Data Engineering, ICDE, pp. 893–904. IEEE Computer Society, Paris (2018)

    Google Scholar 

  • Zhou, X., Peng, X., Xie, T., Sun, J., Ji, C., Liu, D., Xiang, Q., He, C.: Latent error prediction and fault localization for microservice applications by learning from system trace logs. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/SIGSOFT FSE, pp. 683–694. ACM, Tallinna (2019)

    Google Scholar 

  • Zhu, Y., Liu, J.: Classytune: A performance auto-tuner for systems in the cloud. CoRR abs/1910.05482 (2019)

  • Zhu, B., Wang, G., Liu, X., Hu, D., Lin, S., Ma, J.: Proactive drive failure prediction for large scale storage systems. In: Symposium on Mass Storage Systems and Technologies, MSST, pp. 1–5. IEEE Computer Society, Long Beach (2013)

    Google Scholar 

  • Zhu, Y., Liu, J., Guo, M., Bao, Y., Ma, W., Liu, Z., Song, K., Yang, Y.: Bestconfig: tapping the performance potential of systems via automatic configuration tuning. In: Symposium on Cloud Computing, SoCC, pp. 338–350. ACM, Santa Clara (2017)

    Google Scholar 

  • Züfle, M., Krupitzer, C., Erhard, F., Grohmann, J., Kounev, S.: To fail or not to fail: Predicting hard disk drive failure time windows. In: Measurement, Modelling and Evaluation of Computing Systems MMB, Lecture Notes in Computer Science, vol. 12040, pp. 19–36. Springer, Saarbrücken (2020)

    Google Scholar 

Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China No.61902135 and No.62172180, and the Joint Founds of ShanDong Natural Science Funds (Grant No.ZR2019LZH003). Thanks to Professor Hong Jiang for his advice on classification issues. Thanks to all students in Intelligent Data Storage and Management Laboratory\(^{4}\).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hua Wang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, Y., Wang, H., Zhou, K. et al. A survey on AI for storage. CCF Trans. HPC 4, 233–264 (2022). https://doi.org/10.1007/s42514-022-00101-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s42514-022-00101-3

Keywords

Navigation