Abstract
This paper explores the use of domain-specific architectures for energy-efficient and flexible execution of a variety of workloads, including signal processing applications, in wearable devices. As wearable devices become more popular and consumer demands grow, these devices are expected to run a wide range of increasingly complex workloads. A general-purpose solution for wearable computing (e.g., microcontrollers and microprocessors) affords high flexibility, in that a wide range of applications can be run, but offers mediocre performance and may incur high energy and area overheads. At the other end of the computing flexibility spectrum, application-specific integrated circuits (or accelerators) may optimize a specific algorithm, but at the cost of inflexible computing and under-utilization of computing resources. Domain-specific architectures (DSAs) provide a middle ground of computing flexibility. DSAs focus on doing a few things extremely well; that is, they satisfy the computing requirements of a set of domain workloads with execution similarities. As such, DSAs maximize resource usage and achieve substantial performance and energy benefits for a variety of applications. In this work, we first analyze wearable workloads to identify their execution patterns, data movement characteristics, execution bottlenecks, and similarities. Thereafter, we explore various DSA design schemes to meet the increasing processing requirements of wearable workloads within the typically stringent design constraints of wearable devices. We analyze the performance, energy, and area tradeoffs of the different DSA design schemes in comparison to multiple state-of-the-art architectures, and show, through experimental results, that DSAs are a promising approach to flexible, low-overhead, and energy-efficient wearable computing.
Cite this article
Gajaria, D., Adegbija, T. Exploring Domain-Specific Architectures for Energy-Efficient Wearable Computing. J Sign Process Syst 94, 559–577 (2022). https://doi.org/10.1007/s11265-021-01682-y
DOI: https://doi.org/10.1007/s11265-021-01682-y