TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System

Published: 01 February 2013

Abstract

We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100TB of input data spread across 832 disks in 52 nodes at a rate of 0.938TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort is 66% better in absolute performance and has over six times the per-node throughput of the previous record holder. When evaluated against the 100TB Indy JouleSort benchmark, TritonSort sorted 9703 records/Joule. In this article, we describe the hardware and software architecture necessary to operate TritonSort at this level of efficiency. Through careful management of system resources to ensure cross-resource balance, we are able to sort data at approximately 80% of the disks’ aggregate sequential write speed.

We believe the work holds a number of lessons for balanced system design and for scale-out architectures in general. While many interesting systems are able to scale linearly with additional servers, per-server performance can lag behind per-server capacity by more than an order of magnitude. Bridging the gap between high scalability and high performance would either enable significantly less expensive systems to do the same work or make it possible to address significantly larger problem sets with the same infrastructure.
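
The figures reported in the abstract can be turned into per-node and per-disk rates with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: decimal units (1 TB = 10^12 bytes) and the sort benchmark's 100-byte record size. The function name `derived_figures` and the constant names are hypothetical.

```python
# Back-of-the-envelope figures derived from the numbers in the abstract.
# Assumptions (not stated in the abstract): decimal units (1 TB = 10**12 bytes)
# and 100-byte records, as used by the sort benchmark.

TB = 10**12
INPUT_TB = 100               # largest evaluated input, in TB
RATE_TB_PER_MIN = 0.938      # reported end-to-end sort rate
NODES = 52
DISKS = 832
RECORDS_PER_JOULE = 9703     # reported 100TB Indy JouleSort result
RECORD_BYTES = 100           # assumed record size


def derived_figures():
    """Return rough per-node, per-disk, time, and energy figures."""
    bytes_per_sec = RATE_TB_PER_MIN * TB / 60
    records = INPUT_TB * TB / RECORD_BYTES
    return {
        "disks_per_node": DISKS // NODES,
        "sort_minutes": INPUT_TB / RATE_TB_PER_MIN,
        # End-to-end sorted-data rate, not the raw device I/O rate.
        "mb_per_sec_per_node": bytes_per_sec / NODES / 1e6,
        "mb_per_sec_per_disk": bytes_per_sec / DISKS / 1e6,
        "energy_megajoules": records / RECORDS_PER_JOULE / 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, a 100TB sort takes roughly 107 minutes, each node sustains about 300 MB/s of sorted data, and the JouleSort figure corresponds to roughly 103 MJ for the full input. The per-disk value is the end-to-end rate divided by the disk count; it does not account for how many times each byte is read or written during the sort.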

      • Published in

        ACM Transactions on Computer Systems, Volume 31, Issue 1
        February 2013
        96 pages
        ISSN: 0734-2071
        EISSN: 1557-7333
        DOI: 10.1145/2427631

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 February 2013
        • Accepted: 1 August 2012
        • Revised: 1 April 2012
        • Received: 1 May 2011

        Qualifiers

        • research-article
        • Research
        • Refereed
