DOI:10.1145/3267809.3267827
research-article

Netco: Cache and I/O Management for Analytics over Disaggregated Stores

Published: 11 October 2018

Abstract

We consider a common setting where storage is disaggregated from the compute in data-parallel systems. Colocating caching tiers with the compute machines can reduce load on the interconnect, but doing so leads to new resource management challenges. We design a system, Netco, that prefetches data into the cache (based on workload predictability) and appropriately divides the cache space and network bandwidth between the prefetches and serving ongoing jobs. Netco makes various decisions (what content to cache, when to cache it, and how to apportion bandwidth) to support end-to-end optimization goals such as maximizing the number of jobs that meet their service-level objectives (e.g., deadlines). Our implementation of these ideas is available within the open-source Apache HDFS project. Experiments on a public cloud, with production-trace inspired workloads, show that Netco uses up to 5x less remote I/O compared to existing techniques and increases the number of jobs that meet their deadlines by up to 80%.
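To make the three decisions named in the abstract concrete (what to cache, when to cache it, and how to share bandwidth), the following is a minimal illustrative sketch. It is not Netco's algorithm, which the abstract does not describe in detail. It assumes a single shared link to the remote store, prefetches that run one after another at full link rate, no eviction within the planning window, and job run times and deadlines that are known in advance. The names `Job` and `plan_prefetch` and all parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    input_gb: float    # data the job reads from the remote store
    deadline_s: float  # time by which the job must finish
    compute_s: float   # assumed run time once the input is cached locally


def plan_prefetch(jobs, cache_gb, link_gb_per_s):
    """Greedy sketch: pick inputs to prefetch so jobs can meet deadlines.

    Jobs are considered in order of the latest time their input must be
    fully cached (deadline minus compute time). A job is admitted only
    if its input fits in the remaining cache space and its transfer
    finishes in time. Returns (job name, transfer start, transfer end).
    """
    plan = []
    clock = 0.0          # time at which the link next becomes free
    free_gb = cache_gb   # cache space not yet promised to a prefetch
    for job in sorted(jobs, key=lambda j: j.deadline_s - j.compute_s):
        ready_by = job.deadline_s - job.compute_s
        finish = clock + job.input_gb / link_gb_per_s
        if job.input_gb <= free_gb and finish <= ready_by:
            plan.append((job.name, clock, finish))
            clock = finish
            free_gb -= job.input_gb
    return plan


if __name__ == "__main__":
    jobs = [
        Job("a", input_gb=10, deadline_s=30, compute_s=10),
        Job("b", input_gb=40, deadline_s=40, compute_s=10),
        Job("c", input_gb=10, deadline_s=100, compute_s=20),
    ]
    for name, start, end in plan_prefetch(jobs, cache_gb=30, link_gb_per_s=1.0):
        print(f"prefetch {name}: {start:.0f}s to {end:.0f}s")
```

In this toy run, job "b" is skipped because its 40 GB input does not fit in the cache space left after "a" is admitted, so it would have to read from the remote store directly. A real system would also need to reserve bandwidth for such ongoing jobs, which this sketch leaves out.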

Published In

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing
October 2018
546 pages
ISBN:9781450360111
DOI:10.1145/3267809
Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. Disaggregated architectures
  2. caching
  3. cloud computing
  4. data analytics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '18: ACM Symposium on Cloud Computing
October 11 - 13, 2018
Carlsbad, CA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
