Communication-aware Heterogeneous Multiprocessor Mapping for Real-time Streaming Systems

Journal of Signal Processing Systems

Abstract

Real-time streaming signal processing systems typically require high throughput and low latency. Many such systems can be modeled as synchronous data flow (SDF) graphs. In this paper, we address the problem of multi-objective mapping of SDF graphs onto heterogeneous multiprocessor platforms, where we account for the overhead of bus-based inter-processor communication. The primary contributions include (1) an integer linear programming (ILP) model that globally optimizes throughput, latency and cost; and (2) low-complexity two-stage heuristics that combine an evolutionary algorithm with an ILP to generate either a single sub-optimal mapping solution or a Pareto front for design space optimization. In our simulations, the proposed heuristic shows up to 12x run-time efficiency compared to the global ILP while maintaining a \(10^{-6}\) optimality gap in throughput.


Notes

  1. The ratio of weights between throughput and cost is kept approximately the same as in the global ILP.


Author information

Corresponding author

Correspondence to Jing Lin.

Additional information

This research was supported by an equipment gift from Intel.

The work in this paper was presented in part at the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing.

Appendix: Scheduling ILP


Given a partition, the execution times of conventional SDF actors and the communication delays of the send and receive actors become fixed parameters. For conciseness, we slightly abuse notation by defining the parameters

$$\begin{array}{rll} D_i&=\sum\limits_j{a^*_{ij}D_{ij}}, \\ DS_k&=ipc^*_k\cdot\sum\limits_l{bs^*_{kl}DC_{kl}}, \\ DR_k&=ipc^*_k\cdot ch^*_k\cdot \sum\limits_l{br^*_{kl}DC_{kl}}, \end{array} $$
(23)

where \(v^*\) denotes a specific value of a variable \(v\). Let us define the index set \(\mathcal{T}_p=[T_{\rm max}-Period^*,T_{\rm max}]\), in which \(Period^*\) is the minimum period determined by the partition. Also denote \(\mathcal{K}_s\stackrel{\triangle}{=}\{k|ipc^*_k=1\}\) and \(\mathcal{K}_r\stackrel{\triangle}{=}\{k|ipc^*_k\cdot ch^*_k=1\}\), i.e. the sets of edges with an active send actor and an active receive actor, respectively.
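As an illustration only, the parameters in (23) and the sets \(\mathcal{K}_s\) and \(\mathcal{K}_r\) can be evaluated from a fixed partition with a few list comprehensions. The sketch below is not taken from the paper: the function name `fixed_parameters`, the nested-list data layout and the three-actor example are assumptions, and the reading of each symbol in the comments is inferred from (23) and the set definitions above.

```python
def fixed_parameters(a, D, ipc, ch, bs, br, DC):
    """Evaluate Eq. (23) and the sets K_s, K_r for one fixed partition.

    Assumed layout (hypothetical, for illustration):
      a[i][j]  : a*_ij, 1 if actor i is placed on processor j, else 0
      D[i][j]  : D_ij, delay of actor i on processor j
      ipc[k]   : ipc*_k, 1 if edge k has an active send actor
      ch[k]    : ch*_k, flag that together with ipc*_k activates the receive actor
      bs[k][l] : bs*_kl, 1 if the send actor of edge k uses bus l
      br[k][l] : br*_kl, 1 if the receive actor of edge k uses bus l
      DC[k][l] : DC_kl, communication delay of edge k on bus l
    """
    n_actors, n_edges = len(a), len(ipc)
    D_i = [sum(a[i][j] * D[i][j] for j in range(len(a[i])))
           for i in range(n_actors)]
    DS = [ipc[k] * sum(bs[k][l] * DC[k][l] for l in range(len(DC[k])))
          for k in range(n_edges)]
    DR = [ipc[k] * ch[k] * sum(br[k][l] * DC[k][l] for l in range(len(DC[k])))
          for k in range(n_edges)]
    K_s = [k for k in range(n_edges) if ipc[k] == 1]
    K_r = [k for k in range(n_edges) if ipc[k] * ch[k] == 1]
    return D_i, DS, DR, K_s, K_r


# Hypothetical example: actors 0 and 1 on processor 0, actor 2 on processor 1.
# Edge 0 stays on one processor; edge 1 crosses processors over the single bus.
a = [[1, 0], [1, 0], [0, 1]]
D = [[3, 5], [2, 6], [4, 2]]
ipc, ch = [0, 1], [0, 1]
bs, br = [[0], [1]], [[0], [1]]
DC = [[2], [2]]
print(fixed_parameters(a, D, ipc, ch, bs, br, DC))
```

For this example the local edge gets zero send and receive delays and is excluded from both sets, so only edge 1 contributes send and receive actors to the scheduling problem.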

The scheduling sub-problem can then be formalized as an ILP that minimizes

$$\begin{array}{rll} Latency=&\sum\limits_{t\in \mathcal{T}_p}t[s_I(t)-s_I(t-1)]+D_I \\ &-\sum\limits_{t\in\mathcal{T}_p}t[s_1(t)-s_1(t-1)]\ \\ &+(s_1(T_{\rm max})-s_I(T_{\rm max}))Period^*, \end{array} $$
(24)

subject to

Precedence

$$\begin{array}{rll} s_{src(k)}(t-D_{src(k)})P_k-ss_k(t) &\geq 0, \\[3pt] ss_k(t-DS_k)-sr_k(t) &\geq 0, \\[3pt] sr_k(t-DR_k)-s_{dst(k)}(t)C_k+O_k &\geq 0, \forall k. \end{array} $$
(25)
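The three inequalities in (25) can be checked directly on a candidate schedule. The sketch below is a hedged illustration, not the authors' implementation. It assumes that \(s\), \(ss_k\) and \(sr_k\) are cumulative counts of firings started at or before time \(t\), that \(P_k\), \(C_k\) and \(O_k\) are the production rate, consumption rate and initial tokens of edge \(k\), and that the delay in the first inequality is the execution time of the source actor. The helper names `count` and `precedence_ok` and the start-time lists are hypothetical.

```python
def count(starts, t):
    """Cumulative number of firings started at or before time t."""
    return sum(1 for s in starts if s <= t)


def precedence_ok(src, send, recv, dst, D_src, DS, DR, P, C, O, t_max):
    """Check the three inequalities of Eq. (25) for one edge at t = 0..t_max.

    src, send, recv, dst are lists of integer start times of the source actor,
    send actor, receive actor and destination actor of the edge.
    """
    for t in range(t_max + 1):
        # Tokens produced by completed source firings cover all started sends.
        if count(src, t - D_src) * P - count(send, t) < 0:
            return False
        # Completed sends cover all started receives.
        if count(send, t - DS) - count(recv, t) < 0:
            return False
        # Completed receives plus initial tokens cover all started consumptions.
        if count(recv, t - DR) - count(dst, t) * C + O < 0:
            return False
    return True


# Hypothetical single-token chain: source 0..2, send 2..3, receive 3..4, sink at 4.
print(precedence_ok([0], [2], [3], [4], D_src=2, DS=1, DR=1, P=1, C=1, O=0, t_max=6))
# Starting the sink one step early violates the third inequality unless O >= 1.
print(precedence_ok([0], [2], [3], [3], D_src=2, DS=1, DR=1, P=1, C=1, O=0, t_max=6))
```

The first call reports a feasible chain and the second an infeasible one, since the destination would start before the receive has completed.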

Resource sharing

$$\begin{array}{rll} & \sum\limits_i{[s_i(t)-s_i(t-D_i)]a^*_{ij}} \\ &+\sum\limits_{k\in\mathcal{K}_s}{[ss_k(t)-ss_k(t-DS_k)]a^*_{src(k)j}} \\ &+\sum\limits_{k\in\mathcal{K}_r}{[sr_k(t)-sr_k(t-DR_k)]a^*_{dst(k)j}}\leq 1, \forall j;\\ \end{array} $$
(26)
$$\begin{array}{rll} &\sum\limits_{k\in\mathcal{K}_s}{[ss_k(t)-ss_k(t-DS_k)]bs^*_{kl}} \\ &+\sum\limits_{k\in\mathcal{K}_r}{[sr_k(t)-sr_k(t-DR_k)]br^*_{kl}} \leq 1, \forall l. \end{array} $$
(27)
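Each bracketed difference in (26) and (27) counts the firings of one actor that are in progress at time \(t\), so the constraints state that at most one firing occupies a processor or a bus at any time. A minimal sketch of that check follows, assuming cumulative start counts and integer time steps. The helper names `count`, `busy` and `resource_ok` are hypothetical, and the caller is assumed to have already selected the actors, send actors and receive actors that the partition places on the resource.

```python
def count(starts, t):
    """Cumulative number of firings started at or before time t."""
    return sum(1 for s in starts if s <= t)


def busy(starts, delay, t):
    """Firings in progress at time t, i.e. s(t) - s(t - delay)."""
    return count(starts, t) - count(starts, t - delay)


def resource_ok(tasks, t_max):
    """Check that at most one firing is active on a shared resource.

    tasks is a list of (start_times, delay) pairs, one pair per actor,
    send actor or receive actor mapped to the same processor or bus.
    """
    return all(
        sum(busy(starts, delay, t) for starts, delay in tasks) <= 1
        for t in range(t_max + 1)
    )


# Hypothetical processor: an actor busy in 0..2 and 5..7, a send busy in 3..4.
print(resource_ok([([0, 5], 3), ([3], 2)], t_max=10))
# Moving the send to t = 2 overlaps with the first actor firing.
print(resource_ok([([0, 5], 3), ([2], 2)], t_max=10))
```

A task with zero delay never counts as busy, which matches inactive send or receive actors dropping out of the sums.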

Periodicity

$$\begin{array}{rll} &\forall i, t\in[T_{\rm max}-D_i, T_{\rm max}] \\ &\quad s_i(t)-s_i(t-Period^*) = N_i; \\ \end{array} $$
(28)
$$\begin{array}{rll} &\forall k, t\in[T_{\rm max}-DS_k, T_{\rm max}], \\ &\quad ss_k(t)-ss_k(t-Period^*) = N_{src(k)}P_k; \\ \end{array} $$
(29)
$$\begin{array}{rll} &\forall k, t\in[T_{\rm max}-DR_k, T_{\rm max}], \\ &\quad sr_k(t)-sr_k(t-Period^*) = N_{dst(k)}C_k. \end{array} $$
(30)
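The periodicity constraints (28) to (30) and the latency objective (24) can both be evaluated from cumulative start counts. The sketch below is an illustration under stated assumptions rather than the paper's code. It assumes integer time, that the first and last actors each fire once per period, and that exactly one start of each falls inside \(\mathcal{T}_p\). The names `count`, `periodic_ok` and `latency` and the example schedule are hypothetical.

```python
def count(starts, t):
    """Cumulative number of firings started at or before time t."""
    return sum(1 for s in starts if s <= t)


def periodic_ok(starts, delay, firings_per_period, period, t_max):
    """Pattern of Eq. (28)-(30): s(t) - s(t - period) equals the per-period
    firing count for every t in [t_max - delay, t_max]."""
    return all(
        count(starts, t) - count(starts, t - period) == firings_per_period
        for t in range(t_max - delay, t_max + 1)
    )


def latency(first, last, d_last, period, t_max):
    """Evaluate Eq. (24) for start-time lists of actor 1 and actor I."""
    window = range(t_max - period, t_max + 1)  # the index set T_p

    def start_in_window(starts):
        # sum of t * [s(t) - s(t-1)] picks out the start time inside the window
        return sum(t * (count(starts, t) - count(starts, t - 1)) for t in window)

    iteration_offset = count(first, t_max) - count(last, t_max)
    return (start_in_window(last) + d_last
            - start_in_window(first)
            + iteration_offset * period)


# Hypothetical schedule with period 4: actor 1 starts at 1, 5, 9 and the last
# actor (delay 1) starts at 7, 11, one iteration behind actor 1.
first, last = [1, 5, 9], [7, 11]
print(periodic_ok(first, 2, 1, period=4, t_max=12))
print(periodic_ok(last, 1, 1, period=4, t_max=12))
print(latency(first, last, d_last=1, period=4, t_max=12))
```

In this example the first iteration enters at time 1 and leaves when the last actor finishes at time 8, so the evaluated latency is 7; the term multiplied by the period accounts for the last actor being one iteration behind the first inside the window.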


About this article

Cite this article

Lin, J., Gerstlauer, A. & Evans, B.L. Communication-aware Heterogeneous Multiprocessor Mapping for Real-time Streaming Systems. J Sign Process Syst 69, 279–291 (2012). https://doi.org/10.1007/s11265-012-0674-6


  • DOI: https://doi.org/10.1007/s11265-012-0674-6
