DOI: 10.1145/1882453.1882464
research-article

Fitting FFT onto an energy efficient massively parallel architecture

Published: 19 June 2010

ABSTRACT

We present novel implementations of the Fast Fourier Transform (FFT) on the massively parallel Connex Array™ (CA) circuit. The estimated performance is 19 GFlops (BenchFFT metric) when computing 64 FFTs of size 1024 in parallel, with a power dissipation of 5 Watts. We compare the performance of the CA with that of NVIDIA's GTX 285 GPU. The CA is not a direct competitor of NVIDIA's GPUs, since it targets a different application area. Given its low power dissipation, the CA is a good solution for low-cost mobile computing equipment, video processing, and multi-channel, high-sampling-rate audio processing.
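To make the headline numbers concrete, the sketch below does two things. First, it computes the BenchFFT-style scaled operation count, which (to our understanding of the BenchFFT convention) credits each complex transform of size N with 5 N log2 N floating-point operations regardless of the algorithm actually executed. Second, it shows a generic batched radix-2 Cooley-Tukey FFT in which all signals in the batch advance through the same butterfly stages in lockstep, loosely mirroring a data-parallel schedule. This is an illustrative assumption written in plain Python, not the authors' Connex Array implementation; the function names `batched_fft`, `benchfft_flops`, and `implied_time_us` are hypothetical.

```python
import cmath
import math


def bit_reverse_indices(n):
    """Bit-reversal permutation of range(n); n must be a power of two."""
    bits = n.bit_length() - 1
    rev = []
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        rev.append(r)
    return rev


def batched_fft(batch):
    """Iterative radix-2 decimation-in-time FFT applied to every signal in `batch`.

    Every signal goes through the same log2(n) stages with the same twiddle
    factors, which is the property a SIMD-style machine can exploit.
    (Illustrative sketch only; not the Connex Array code.)
    """
    n = len(batch[0])
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two >= 2")
    rev = bit_reverse_indices(n)
    # Load each signal in bit-reversed order so the butterflies can run in place.
    data = [[complex(sig[r]) for r in rev] for sig in batch]
    size = 2
    while size <= n:
        half = size // 2
        # Twiddle factors for this stage, shared by all signals in the batch.
        twiddles = [cmath.exp(-2j * cmath.pi * k / size) for k in range(half)]
        for x in data:
            for start in range(0, n, size):
                for k in range(half):
                    t = twiddles[k] * x[start + k + half]
                    u = x[start + k]
                    x[start + k] = u + t
                    x[start + k + half] = u - t
        size *= 2
    return data


def dft(x):
    """Naive O(n^2) DFT, used only to check the FFT on small inputs."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def benchfft_flops(n, count=1):
    """Scaled flop count for `count` complex transforms of size n: 5 n log2 n each."""
    return count * 5 * n * math.log2(n)


def implied_time_us(n, count, flops_per_second):
    """Time per batch (microseconds) implied by a scaled-flop rate."""
    return benchfft_flops(n, count) / flops_per_second * 1e6


if __name__ == "__main__":
    # Figures quoted in the abstract: 64 transforms of size 1024, 19 GFlops, 5 W.
    n, count, rate, watts = 1024, 64, 19e9, 5.0
    print(f"scaled flop count: {benchfft_flops(n, count):.0f}")
    print(f"implied time per batch: {implied_time_us(n, count, rate):.1f} us")
    print(f"energy efficiency: {rate / 1e9 / watts:.1f} GFlops/W")
```

Run as a script, it prints a scaled count of 3276800 operations, an implied 172.5 µs per batch of 64 transforms, and 3.8 GFlops/W. These values are derived arithmetically from the abstract's figures, not measured. Because the BenchFFT metric fixes the operation count at 5 N log2 N, the GFlops figure is effectively an inverse-time measure, which is what makes it comparable across architectures such as the CA and a GPU.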


Published in

ACM Other conferences
IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
June 2010
91 pages
ISBN: 9781450300087
DOI: 10.1145/1882453

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
