Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems

Tapio Pahikkala, Antti Airola, Thomas Canhao Xu, Pasi Liljeberg, Hannu Tenhunen, Tapio Salakoski
Copyright: © 2012 | Volume: 3 | Issue: 2 | Pages: 19
ISSN: 1947-3176 | EISSN: 1947-3184 | EISBN13: 9781466612006 | DOI: 10.4018/jertcs.2012040104
MLA

Pahikkala, Tapio, et al. "Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems." IJERTCS, vol. 3, no. 2, 2012, pp. 73-91. http://doi.org/10.4018/jertcs.2012040104

APA

Pahikkala, T., Airola, A., Xu, T. C., Liljeberg, P., Tenhunen, H., & Salakoski, T. (2012). Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems. International Journal of Embedded and Real-Time Communication Systems (IJERTCS), 3(2), 73-91. http://doi.org/10.4018/jertcs.2012040104

Chicago

Pahikkala, Tapio, et al. "Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems." International Journal of Embedded and Real-Time Communication Systems (IJERTCS) 3, no. 2 (2012): 73-91. http://doi.org/10.4018/jertcs.2012040104

Abstract

The authors introduce a machine learning approach, based on a parallel online regularized least-squares learning algorithm, for parallel embedded hardware platforms. The system is suitable for use in real-time adaptive systems. First, the system can learn in an online fashion, a property required in real-life applications of embedded machine learning systems. Second, to guarantee real-time response on embedded multi-core computer architectures, the learning system is parallelized and able to operate with a limited amount of computational and memory resources. Third, the system can predict several labels simultaneously. The authors evaluate the performance of the algorithm from three different perspectives. The prediction performance is evaluated on a hand-written digit recognition task. The computational speed is measured with 1 to 4 threads on a quad-core platform. A Network-on-Chip (NoC) platform is also studied as a promising unconventional multi-core architecture for the algorithm. The authors construct a NoC consisting of a 4x4 mesh and implement the machine learning algorithm on this platform with up to 16 threads. It is shown that the memory consumption and cache efficiency can be considerably improved by optimizing the cache behavior of the system. The authors' results provide a guideline for designing future embedded multi-core machine learning devices.
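
As a rough illustration of what online regularized least-squares learning with several labels can look like, the sketch below uses the textbook recursive update based on the Sherman-Morrison formula. It is not taken from the article: the class name OnlineRLS, the parameter regparam, and the single-threaded pure-Python layout are all assumptions made for this example, and the authors' parallel implementation may differ substantially.

```python
# Hypothetical sketch of online regularized least-squares (ridge regression)
# with several labels per example. Names and structure are assumptions for
# illustration only, not the authors' implementation.

class OnlineRLS:
    def __init__(self, n_features, n_labels, regparam=1.0):
        # P holds the inverse of (X^T X + regparam * I); with no data seen
        # yet, this is simply I / regparam.
        self.P = [[(1.0 / regparam if i == j else 0.0)
                   for j in range(n_features)]
                  for i in range(n_features)]
        # W stores one column of weights per label; all labels share P.
        self.W = [[0.0] * n_labels for _ in range(n_features)]

    def predict(self, x):
        # Returns one prediction per label: x^T W.
        n_labels = len(self.W[0])
        return [sum(x[i] * self.W[i][k] for i in range(len(x)))
                for k in range(n_labels)]

    def update(self, x, y):
        n = len(x)
        # Px = P x, computed once and reused for both P and W.
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
        # Prediction errors use the weights from before this update.
        err = [yk - pk for yk, pk in zip(y, self.predict(x))]
        # Sherman-Morrison rank-one update: P <- P - (Px)(Px)^T / denom.
        # Each row i is independent of the others, so rows could in
        # principle be divided among threads.
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= Px[i] * Px[j] / denom
        # W <- W + (Px / denom) err^T, one column per label.
        for i in range(n):
            gain = Px[i] / denom
            for k in range(len(err)):
                self.W[i][k] += gain * err[k]


# Usage: two features, two labels, examples arriving one at a time.
model = OnlineRLS(n_features=2, n_labels=2, regparam=1e-6)
for x, y in [([1.0, 0.0], [2.0, 1.0]),
             ([0.0, 1.0], [-1.0, 1.0]),
             ([1.0, 1.0], [1.0, 2.0])]:
    model.update(x, y)
print(model.predict([3.0, 2.0]))
```

In this sketch every label shares the same matrix P, so predicting an additional label only adds one column to W, while the memory needed for P grows with the square of the number of features.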
