Abstract
This paper deals with the optimization of iterative algorithms with matrix operations or nested loops for hardware implementation in Field Programmable Gate Arrays (FPGAs), using Integer Linear Programming (ILP). The method is demonstrated on an implementation of the Finite Interval Constant Modulus Algorithm, an equalization algorithm suitable for modern communication systems (4G and beyond). For the floating-point calculations required by the algorithm, two arithmetic libraries were used in the FPGA implementation: one based on the logarithmic number system, the other using the floating-point number system in the standard IEEE format. Both libraries use pipelined modules. Traditional approaches to the scheduling of nested loops lead to relatively large code, which is unsuitable for FPGA implementation. This paper presents a new high-level synthesis methodology that models both iterative loops and imperfectly nested loops by means of a system of linear inequalities. Moreover, memory access is considered as an additional resource constraint. Since solving ILP-formulated problems is known to be computationally intensive, an important part of the article is devoted to reducing the problem size.
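To make the idea of modeling an iterative loop by linear inequalities concrete, the following minimal sketch uses the precedence form common in the cyclic scheduling literature, s_j - s_i >= length_ij - period * height_ij, where height_ij counts the iterations spanned by a loop-carried dependence. It is an illustration only, not the formulation of the paper: the three-task graph, the function names, and the assumption of a single pipelined unit that accepts one new operation per clock cycle are all hypothetical, and a brute-force search stands in for an ILP solver.

```python
from itertools import product

def precedence_ok(start, edges, period):
    """Check s_j - s_i >= length - period * height for every edge."""
    return all(start[j] - start[i] >= length - period * height
               for i, j, length, height in edges)

def unit_ok(start, period):
    """Assumed resource model: one pipelined unit, one new operation
    per clock cycle, so start times must differ modulo the period."""
    slots = [s % period for s in start.values()]
    return len(set(slots)) == len(slots)

def smallest_period(tasks, edges, max_period=10, horizon=10):
    """Brute-force stand-in for the ILP: try periods in increasing
    order and return the first feasible (period, start times) pair."""
    for period in range(1, max_period + 1):
        for times in product(range(horizon), repeat=len(tasks)):
            start = dict(zip(tasks, times))
            if precedence_ok(start, edges, period) and unit_ok(start, period):
                return period, start
    return None

# Hypothetical loop body: a -> b -> c, with c feeding a in the next
# iteration (height 1). Edge format: (from, to, length, height).
TASKS = ["a", "b", "c"]
EDGES = [("a", "b", 2, 0), ("b", "c", 2, 0), ("c", "a", 1, 1)]

result = smallest_period(TASKS, EDGES)
```

Summing the three inequalities around the cycle gives 0 >= 5 - period, so no period shorter than 5 clock cycles can be feasible for this example. An ILP formulation expresses the same constraints with integer variables so that a solver, rather than enumeration, finds the shortest period.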
About this article
Cite this article
Šůcha, P., Hanzálek, Z., Heřmánek, A. et al. Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design—Implementation of Finite Interval Constant Modulus Algorithm. J VLSI Sign Process Syst Sign Image Video Technol 46, 35–53 (2007). https://doi.org/10.1007/s11265-006-0004-y