research-article

Fitting FFT onto an energy efficient massively parallel architecture

Authors:
István Lőrentz, Transylvania University of Braşov, Romania
Mihaela Maliţa, Saint Anselm College, Manchester, NH
Răzvan Andonie, Central Washington University, Ellensburg, WA
IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, June 2010, Article No. 8, pages 1–11. https://doi.org/10.1145/1882453.1882464

Published: 19 June 2010

ABSTRACT

We present novel implementations of the Fast Fourier Transform (FFT) on the massively parallel Connex Array™ (CA) circuit. The estimated performance is 19 GFLOPS (BenchFFT metric) when computing 64 FFTs of size 1024 in parallel, with a power dissipation of 5 W. We compare the performance of the CA with that of NVIDIA's GTX 285 GPU. The CA is not a direct competitor of NVIDIA's GPUs, as it targets a different application area. Given its low power dissipation, the CA is well suited to low-cost mobile computing equipment, video processing, and multi-channel, high-sample-rate audio processing.
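As a rough check on what these figures imply, the sketch below converts them using the flop-count convention of BenchFFT, which credits a complex FFT of size N with 5 N log2 N floating-point operations regardless of the algorithm actually used. It assumes that the 64 transforms are complex and that 19 GFLOPS is the aggregate throughput over the whole batch. The variable names are illustrative, and the resulting times and GFLOPS/W value are derived arithmetic, not measurements reported by the authors.

```python
import math


def benchfft_flops(n: int) -> float:
    """Nominal flop count BenchFFT credits to one complex FFT of size n: 5 n log2(n)."""
    return 5.0 * n * math.log2(n)


# Figures quoted in the abstract.
N = 1024        # FFT size
BATCH = 64      # number of FFTs computed in parallel
GFLOPS = 19.0   # estimated performance, BenchFFT metric
WATTS = 5.0     # power dissipation

# Assumption: complex transforms, and GFLOPS is aggregate throughput for the batch.
flops_per_fft = benchfft_flops(N)            # nominal flops for one transform
flops_batch = BATCH * flops_per_fft          # nominal flops for the whole batch
batch_time_us = flops_batch / (GFLOPS * 1e3)  # GFLOPS * 1e3 = flops per microsecond
per_fft_us = batch_time_us / BATCH           # amortized time per transform
gflops_per_watt = GFLOPS / WATTS             # energy efficiency

print(f"nominal flops per FFT : {flops_per_fft:,.0f}")
print(f"implied batch time    : {batch_time_us:.1f} us")
print(f"implied time per FFT  : {per_fft_us:.2f} us")
print(f"energy efficiency     : {gflops_per_watt:.1f} GFLOPS/W")
```

Under these assumptions, the batch of 64 transforms corresponds to about 3.3 million nominal flops, roughly 172 µs of compute, and about 3.8 GFLOPS/W.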

References

R. Andonie and M. Maliţa. The Connex Array™ as a neural network accelerator. In CI '07: Proceedings of the Third IASTED International Conference on Computational Intelligence, pages 163--167, Anaheim, CA, USA, 2007. ACTA Press.
D. J. Brown and C. Reams. Toward energy-efficient computing. Queue, 8(2):30--43, 2010.
ClearSpeed Technology Ltd. ClearSpeed CSX 700. http://www.clearspeed.com/products/csx700.php.
J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297--301, 1965.
D. E. Culler, R. M. Karp, D. Patterson, A. Sahay, E. E. Santos, K. E. Schauser, R. Subramonian, and T. von Eicken. LogP: a practical model of parallel computation. Commun. ACM, 39(11):78--85, 1996.
E. Chu and A. George. Inside the FFT black box: serial and parallel fast Fourier transform algorithms. CRC Press, 1999.
M. Frigo and S. G. Johnson. BenchFFT. http://www.fftw.org/benchfft/.
W. M. Gentleman and G. Sande. Fast Fourier transforms: for fun and profit. In 1966 Fall Joint Computer Conference, AFIPS Proceedings, volume 29, pages 563--578, 1966.
N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli. High performance discrete Fourier transforms on graphics processors. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1--12. IEEE Press, 2008.
P. Kabal and B. Sayar. Performance of fixed-point FFT's: rounding and scaling considerations. IEEE ICASSP, 1:221--224, 1986.
S. Kyo and S. Okazaki. IMAPCAR: A 100 GOPS in-vehicle vision processor based on 128 ring connected four-way VLIW processing elements. Journal of Signal Processing Systems, published online 6 November 2008.
M. Maliţa, G. Ştefan, and D. Thiébaut. Not multi-, but many-core: designing integral parallel architectures for embedded computation. SIGARCH Comput. Archit. News, 35(5):32--38, 2007.
M. Maliţa. Vector-C library. http://www.anselm.edu/homepage/mmalita/ResearchS07/WebsiteS07/.
M. Maliţa and G. Ştefan. Integral parallel architecture & Berkeley's Motifs. In ASAP '09: Proceedings of the 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, pages 191--194. IEEE Computer Society, 2009.
K. Moreland and E. Angel. The FFT on a GPU. In HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 112--119, 2003.
G. Ştefan. The CA1024: SoC with integral parallel architecture for HDTV processing. In 4th International System-on-Chip (SoC) Conference & Exhibit, November 1-2, Radisson Hotel Newport Beach, California, 2006.
G. Ştefan, A. Sheel, B. Mitu, T. Thomson, and D. Tomescu. The CA1024: a fully programmable system-on-chip for cost-effective HDTV media processing. In Hot Chips: A Symposium on High Performance Chips, August 20-22, Memorial Auditorium, Stanford University, 2006.
NVIDIA Corporation. NVIDIA GeForce GTX 285. http://www.nvidia.com/object/product_geforce_gtx_285_us.html.
J. C. Schatzman. Accuracy of the discrete Fourier transform and the fast Fourier transform. SIAM J. Sci. Comput., 17:1150--1166, 1996.
D. Thiebaut, G. Ştefan, and M. Maliţa. DNA search and the Connex technology. In Proceedings of the International Multi-Conference on Computing in the Global Information Technology (ICCGI'06), Bucharest, Romania, 2006.
D. Thiebaut and M. Maliţa. Fast polynomial computation on Connex Array. Technical Report 303, Smith College, November 2006.
D. Thiebaut and M. Maliţa. Real-time packet filtering with the Connex Array. In Proceedings of the International Conference on Complex Systems, pages 501--506, Boston, MA, 2006.
M. Thiebaut and G. Ştefan. Memory engine for the inspection and manipulation of data. U.S. Patent No. 6,760,821, July 2004.
M. Thiebaut and G. Ştefan. Ziv-Lempel compression with the Connex Engine. Tech. Rep. 077, Dept. Computer Science, Smith College, Northampton, MA, 01063, January 2002.
M. Thiebaut and G. Ştefan. Local alignment of DNA sequences with the Connex engine. In The First Workshop on Algorithms in BioInformatics WABI 2001, BRICS, Univ. of Aarhus, Denmark, August 2001.
V. Volkov and B. Kazian. Fitting FFT onto the G80 architecture. UC Berkeley CS258 Project Report, May 2008.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computation of transforms

Published in
IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
June 2010
91 pages
ISBN:9781450300087
DOI:10.1145/1882453
General Chairs: Hisham El-Shishiny (World-Wide Leader of IBM Centers for Advanced Studies), Erven Rohou (INRIA Rennes, France)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Article Metrics
- 1 total citation
- 151 total downloads
- Downloads (last 12 months): 1
- Downloads (last 6 weeks): 0
