DOI: 10.1145/2486159.2486194
research-article

Recursive design of hardware priority queues

Published: 23 July 2013

Abstract

We present a fast, recursive construction of an n-element priority queue from exponentially smaller hardware priority queues and a RAM of size n. All priority queue implementations to date require either O(log n) instructions per operation, space exponential in the key size, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence it is critical to construct a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k-1 primitive priority queues of size n^(1/k) (k = 2, 3, ...) and a RAM of size n, where the throughput of the construct beats that of a single primitive hardware priority queue of size n. For example, an n-element PQ can be constructed from either three primitive hardware priority queues of size √n or five of size ∛n.
Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable, perfect line-rate fair queuing scheme for millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.
As a by-product of our technique, we present an O(n)-time sorting algorithm for a system equipped with an O(w√n)-entry TCAM, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
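The sizing stated in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration of the resource counts (2k-1 primitive queues, each holding about the k-th root of n elements, plus an n-entry RAM); it does not implement the PPQ itself. The names `ceil_kth_root` and `ppq_budget` are hypothetical, and rounding the root up to an integer is an assumption made here for non-perfect powers.

```python
def ceil_kth_root(n: int, k: int) -> int:
    """Return the smallest integer r such that r**k >= n."""
    # Start from a floating-point guess, then correct it with exact integer checks.
    r = max(1, round(n ** (1.0 / k)))
    while r ** k < n:
        r += 1
    while r > 1 and (r - 1) ** k >= n:
        r -= 1
    return r


def ppq_budget(n: int, k: int) -> dict:
    """Resource counts for an n-element PPQ with parameter k (k >= 2).

    Follows the abstract: 2k-1 primitive queues of size n^(1/k) and a RAM of size n.
    Rounding the primitive size up is an assumption of this sketch.
    """
    if k < 2:
        raise ValueError("the abstract states k = 2, 3, ...")
    return {
        "primitive_queues": 2 * k - 1,
        "primitive_size": ceil_kth_root(n, k),
        "ram_entries": n,
    }


if __name__ == "__main__":
    for k in (2, 3):
        print(k, ppq_budget(1_000_000, k))
```

For a queue of one million elements, this gives three primitive queues of 1,000 entries for k = 2 and five primitive queues of 100 entries for k = 3, which matches the "three √n or five ∛n" example in the abstract.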

Cited By

  • (2024) A Fast Scalable Hardware Priority Queue and Optimizations for Multi-Pushes. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 134-140. DOI: 10.1109/IPDPSW63119.2024.00038. Online publication date: 27-May-2024
  • (2018) Encoding Short Ranges in TCAM Without Expansion. IEEE/ACM Transactions on Networking 26:2, pp. 835-850. DOI: 10.1109/TNET.2018.2797690. Online publication date: 1-Apr-2018
  • (2016) Encoding Short Ranges in TCAM Without Expansion. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 35-46. DOI: 10.1145/2935764.2935769. Online publication date: 11-Jul-2016

Published In

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
July 2013
348 pages
ISBN:9781450315722
DOI:10.1145/2486159
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. priority queue
  2. sorting
  3. tcam
  4. wfq

Qualifiers

  • Research-article

Conference

SPAA '13

Acceptance Rates

SPAA '13 Paper Acceptance Rate 31 of 130 submissions, 24%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%
