DOI: 10.1145/2830556.2830557
research-article

STAC-A2™ benchmark on POWER8

Published: 15 November 2015

Abstract

The STAC-A2™ benchmark is an emerging standard designed to evaluate the speed, scalability and quality of computational platforms for performing financial risk analytics in the capital markets industry. The problem posed by the benchmark is the computation of several types of Greeks for an exotic option under an American exercise model. We recently reported record-setting performance for a STAC-A2 benchmark solution developed for an IBM® POWER8® S824 server. We explain the high performance of our solution in terms of the architecture, scalability and high memory bandwidth provided by POWER8-based systems. Developing the benchmark application also led us to investigate and refine several techniques that are generally applicable to the simulation of complex options and their sensitivities. We describe several of these techniques in detail, along with the performance impacts we observed when compared with other approaches. We focus on two areas in particular: cache-efficient data management for Monte Carlo simulation of American-exercise options, and a parallel implementation of the Longstaff-Schwartz algorithm.
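
For readers unfamiliar with the Longstaff-Schwartz algorithm named in the abstract, the following is a minimal serial sketch of its backward regression pass. It is not the authors' implementation. It makes several simplifying assumptions that are not in the paper: a single asset following geometric Brownian motion (the benchmark itself uses a Heston-type model and an exotic payoff), a plain American put, a quadratic polynomial basis, and a regression solved through the normal equations. All function names and parameters here (`american_put_lsm`, `fit_quadratic`, `solve_linear`) are hypothetical and chosen only for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    return solve_linear(ata, aty)


def american_put_lsm(s0, strike, rate, sigma, maturity, steps, paths, seed=0):
    """Estimate an American put price with least-squares Monte Carlo (illustrative only)."""
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-rate * dt)
    drift = (rate - 0.5 * sigma * sigma) * dt
    vol = sigma * math.sqrt(dt)

    # Forward pass: spots[t][p] is the asset price of path p at time step t.
    spots = [[s0] * paths]
    for _ in range(steps):
        prev = spots[-1]
        spots.append([s * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for s in prev])

    # Cash flow of each path if the option is held to maturity.
    cash = [max(strike - s, 0.0) for s in spots[steps]]

    # Backward pass: at each earlier exercise date, regress the discounted
    # future cash flows on the current price, using in-the-money paths only.
    for t in range(steps - 1, 0, -1):
        cash = [c * disc for c in cash]  # discount one step back to time t
        itm = [p for p in range(paths) if strike > spots[t][p]]
        if len(itm) < 3:
            continue  # too few points to fit three coefficients
        xs = [spots[t][p] / strike for p in itm]  # scaled to improve conditioning
        ys = [cash[p] for p in itm]
        c0, c1, c2 = fit_quadratic(xs, ys)
        for p, x in zip(itm, xs):
            continuation = c0 + c1 * x + c2 * x * x
            exercise = strike - spots[t][p]
            if exercise > continuation:
                cash[p] = exercise  # exercise now, replacing later cash flows

    return disc * sum(cash) / paths


if __name__ == "__main__":
    print(american_put_lsm(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, paths=5000))
```

With the parameters in the usage line (spot 36, strike 40, rate 6%, volatility 20%, one year, 50 exercise dates), Longstaff and Schwartz report a value of about 4.47, so this sketch should land in that neighborhood, subject to Monte Carlo noise. The layout `spots[t][p]` stores each time step contiguously, which suits the per-step regression; a path-major layout would suit the forward simulation instead, and that tension is the kind of data-management question the abstract refers to.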


Cited By

  • (2016) Performance Analysis of Spark/GraphX on POWER8 Cluster. High Performance Computing, pages 268-285. DOI: 10.1007/978-3-319-46079-6_19. Online publication date: 6-Oct-2016

Published In

WHPCF '15: Proceedings of the 8th Workshop on High Performance Computational Finance
November 2015
61 pages
ISBN: 9781450340151
DOI: 10.1145/2830556
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Heston model
  2. Longstaff-Schwartz
  3. OpenPOWER
  4. POWER8
  5. STAC-A2
  6. matrix transpose
  7. parallel SVD


Conference

SC15

Acceptance Rates

WHPCF '15 Paper Acceptance Rate 8 of 10 submissions, 80%
Overall Acceptance Rate 8 of 10 submissions, 80%
