Abstract
We discuss hardware/software co-processing on a hybrid processor for k-means clustering, a compute- and data-intensive multispectral imaging algorithm. The experiments are performed on two models of the Altera Excalibur board: the first uses the 32-bit NIOS 1.1 RISC processor as a soft IP core, and the second uses an ARM processor as a hard IP core. We compare the performance of the sequential k-means algorithm with three different accelerated versions, and we consider the granularity and synchronization issues that arise when mapping an algorithm to a hybrid processor. Our results show that a speedup of 11.8X is achieved by migrating computation to the Excalibur ARM hardware/software system, compared to a software-only implementation on a 1 GHz Pentium III. Speedup on the Excalibur NIOS is limited by the communication cost of transferring data from external memory through the processor to the customized circuits. This limitation is overcome on the Excalibur ARM, where dual-port memories accessible to both the processor and the configurable logic have the biggest performance impact of all the techniques studied.
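For readers unfamiliar with the baseline, the following is a minimal sketch of a generic sequential k-means loop (Lloyd-style) over multispectral pixels. It is not the authors' code. It assumes each pixel is a tuple with one value per spectral band, uses squared Euclidean distance, and takes the initial centers as input; the names `kmeans`, `squared_distance`, and `Pixel` are hypothetical. The assignment step, which compares every pixel against every center, performs the bulk of the arithmetic, so it is the natural candidate for migration to configurable logic. This is a general observation and does not describe the partitioning used in the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one float per spectral band.
Pixel = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels: List[Pixel], centers: List[Pixel], max_iter: int = 100):
    """Sequential k-means: alternate assignment and center-update steps.

    Returns (centers, labels). Stops when no pixel changes cluster
    or after max_iter passes over the data.
    """
    centers = [tuple(c) for c in centers]
    labels = [-1] * len(pixels)
    nbands = len(pixels[0])
    for _ in range(max_iter):
        changed = False
        # Assignment step: label each pixel with its nearest center.
        # Cost is O(pixels * centers * bands) per pass.
        for i, p in enumerate(pixels):
            best = min(range(len(centers)),
                       key=lambda k: squared_distance(p, centers[k]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
        # Update step: move each center to the mean of its pixels.
        sums = [[0.0] * nbands for _ in centers]
        counts = [0] * len(centers)
        for p, k in zip(pixels, labels):
            counts[k] += 1
            for b in range(nbands):
                sums[k][b] += p[b]
        # An empty cluster keeps its previous center.
        centers = [
            tuple(s / counts[k] for s in sums[k]) if counts[k] else centers[k]
            for k in range(len(centers))
        ]
    return centers, labels
```

For example, clustering the two-band pixels `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` from initial centers `(0, 0)` and `(10, 10)` converges after one update to centers `(0.0, 0.5)` and `(10.0, 10.5)` with labels `[0, 0, 1, 1]`.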
Cite this article
Gokhale, M., Frigo, J., McCabe, K. et al. Experience with a Hybrid Processor: K-Means Clustering. The Journal of Supercomputing 26, 131–148 (2003). https://doi.org/10.1023/A:1024495400663