ABSTRACT
We consider a common setting in data-parallel systems where storage is disaggregated from compute. Colocating caching tiers with the compute machines can reduce load on the interconnect, but doing so leads to new resource management challenges. We design a system, Netco, that prefetches data into the cache (based on workload predictability) and appropriately divides the cache space and network bandwidth between prefetches and serving ongoing jobs. Netco makes various decisions (what content to cache, when to cache it, and how to apportion bandwidth) to support end-to-end optimization goals such as maximizing the number of jobs that meet their service-level objectives (e.g., deadlines). Our implementation of these ideas is available within the open-source Apache HDFS project. Experiments on a public cloud, with production-trace-inspired workloads, show that Netco uses up to 5x less remote I/O than existing techniques and increases the number of jobs that meet their deadlines by up to 80%.
Index Terms
- Netco: Cache and I/O Management for Analytics over Disaggregated Stores