Abstract
The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level and data-level parallelism. Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used for this purpose. Simultaneous multi-threading (SMT) has proved a valuable technique for doing so in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it exploits neither data-level parallel hardware nor the information available about the threads.
We propose software-SMT (SW-SMT), a technique that exploits task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique compiles multiple threads simultaneously at design time and selects the most efficient thread mixes at run time.
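The run-time selection step can be illustrated with a small sketch. It assumes, hypothetically, that design-time compilation has produced a table of mixes, each one a code version that executes a particular set of tasks together and is annotated with a profiled energy cost, and that every task also has a stand-alone version. Given the tasks that are ready at run time, the sketch picks the mixes that cover each ready task exactly once at minimum total energy. The `Mix` type, the `select_mixes` function, the task names and the energy figures are illustrative assumptions, not taken from the paper, and the exhaustive search stands in for whatever selection policy SW-SMT actually uses.

```python
from functools import lru_cache
from typing import NamedTuple


class Mix(NamedTuple):
    # Hypothetical design-time artifact: one compiled code version that runs
    # `tasks` together, annotated with an assumed profiled energy cost.
    name: str
    tasks: frozenset
    energy: float


def select_mixes(ready, mixes):
    """Cover every ready task exactly once with minimum total energy.

    Returns (total_energy, names_of_chosen_mixes), or (inf, ()) if some
    ready task appears in no mix. This is an exhaustive memoized search over
    task subsets, adequate only for the few tasks of a small workload.
    """

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0.0, ()
        # Branch on one fixed task so each cover is enumerated only once.
        pivot = min(remaining)
        result = (float("inf"), ())
        for m in mixes:
            # A mix is usable if it runs the pivot and only tasks still pending.
            if pivot in m.tasks and m.tasks <= remaining:
                cost, names = best(remaining - m.tasks)
                if m.energy + cost < result[0]:
                    result = (m.energy + cost, (m.name,) + names)
        return result

    return best(frozenset(ready))


# Illustrative energy figures (arbitrary units), not measurements.
EXAMPLE_MIXES = [
    Mix("fft", frozenset({"fft"}), 10.0),
    Mix("viterbi", frozenset({"viterbi"}), 12.0),
    Mix("fir", frozenset({"fir"}), 6.0),
    Mix("fft+viterbi", frozenset({"fft", "viterbi"}), 15.0),
    Mix("fft+fir", frozenset({"fft", "fir"}), 13.0),
    Mix("viterbi+fir", frozenset({"viterbi", "fir"}), 14.0),
]

if __name__ == "__main__":
    print(select_mixes({"fft", "viterbi", "fir"}, EXAMPLE_MIXES))
```

With all three hypothetical tasks ready, the sketch chooses the `fft+viterbi` mix plus the stand-alone `fir` version (21 units) instead of running each task alone (28 units). Memoizing on the set of remaining tasks keeps the search cheap when only a handful of tasks exist; a real run-time selector would more likely consult a small precomputed table.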
We have applied the technique to two major blocks of a software-defined radio (SDR) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scarpazza, D.P., Raghavan, P., Novo, D., Catthoor, F., Verkest, D. (2006). Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism. In: Vounckx, J., Azemard, N., Maurine, P. (eds) Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation. PATMOS 2006. Lecture Notes in Computer Science, vol 4148. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847083_2
DOI: https://doi.org/10.1007/11847083_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39094-7
Online ISBN: 978-3-540-39097-8
eBook Packages: Computer Science, Computer Science (R0)