Exploring Domain-Specific Architectures for Energy-Efficient Wearable Computing

Journal of Signal Processing Systems

Abstract

This paper explores domain-specific architectures for energy-efficient and flexible computing across a variety of workloads, including signal processing applications, in wearable devices. As wearable devices grow in popularity and consumer demands increase, these devices are expected to run a wide range of increasingly complex workloads. A general-purpose solution for wearable computing (e.g., microcontrollers and microprocessors) affords high flexibility, in that a wide range of applications can be run, but offers mediocre performance and may incur high energy and area overheads. At the other end of the computing flexibility spectrum, application-specific integrated circuits (or accelerators) can optimize a specific algorithm, but result in inflexible computing and under-utilization of computing resources. Domain-specific architectures (DSAs) provide a middle ground in computing flexibility. DSAs focus on doing a few things extremely well, i.e., satisfying the computing requirements of a set of domain workloads with execution similarities. As such, DSAs maximize resource usage and achieve substantial performance and energy benefits for a variety of applications. In this work, we first analyze wearable workloads to identify their execution patterns, data movement characteristics, execution bottlenecks, and similarities. Thereafter, we explore various DSA design schemes to meet the increasing processing requirements of wearable workloads within the typically stringent design constraints of wearable devices. We analyze the performance, energy, and area tradeoffs of the different DSA design schemes in comparison to multiple state-of-the-art architectures, and show, through experimental results, that DSAs are a promising approach to flexible, low-overhead, and energy-efficient wearable computing.

Author information

Corresponding author

Correspondence to Dhruv Gajaria.

About this article

Cite this article

Gajaria, D., Adegbija, T. Exploring Domain-Specific Architectures for Energy-Efficient Wearable Computing. J Sign Process Syst 94, 559–577 (2022). https://doi.org/10.1007/s11265-021-01682-y

  • DOI: https://doi.org/10.1007/s11265-021-01682-y
