ABSTRACT
The finite element method (FEM) is a popular approach to solving differential equations [5]. Among its many attractive features is its ability to handle complex geometries: the domain is discretised using simple elements whose local contributions are assembled into a global system of equations. This is in contrast to the finite difference method (FDM), which can typically handle only regular geometries. Before a solution is possible, however, the FEM system of equations has to be assembled, a procedure that can contribute significantly to the computational cost of an FEM solver, particularly when coupled with highly parallel execution [3]. In this work we outline a new algorithm for a highly parallel assembly routine compatible with Intel® Xeon Phi and GPU architectures. We also present a performance comparison and analysis of our algorithm and the globalNZ algorithm outlined by Cecka et al. [2], both implemented on the Intel® Xeon Phi architecture, and compare these with the serial implementation of Hughes [5].
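To make the assembly step concrete, the following is a minimal serial sketch of assembling local element contributions into a global system. It is not the algorithm proposed in this work, nor the globalNZ algorithm of Cecka et al. It assumes a 1D problem with linear two-node elements, and the names `local_stiffness` and `assemble` are hypothetical, chosen only for illustration.

```python
def local_stiffness(x_a, x_b):
    """2x2 stiffness matrix of a linear 1D element for -u'' = f (assumed model problem)."""
    h = x_b - x_a
    return [[1.0 / h, -1.0 / h],
            [-1.0 / h, 1.0 / h]]


def assemble(nodes, elements):
    """Serially scatter-add every element matrix into a sparse global matrix.

    The global matrix is stored as a dict mapping (row, col) -> value.
    """
    K = {}
    for elem in elements:  # elem holds the global indices of the element's two nodes
        ke = local_stiffness(nodes[elem[0]], nodes[elem[1]])
        for i_local, i_global in enumerate(elem):
            for j_local, j_global in enumerate(elem):
                key = (i_global, j_global)
                # Neighbouring elements share nodes, so the same global entry
                # is updated more than once. In a parallel loop these
                # additions would conflict unless they are coordinated.
                K[key] = K.get(key, 0.0) + ke[i_local][j_local]
    return K


nodes = [0.0, 0.5, 1.0]      # three nodes on the interval [0, 1]
elements = [(0, 1), (1, 2)]  # two elements sharing node 1
K = assemble(nodes, elements)
```

In this sketch the diagonal entry for the shared node 1 receives a contribution from both elements. Shared entries of this kind are what a parallel assembly routine has to coordinate when elements are processed concurrently.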
- L. Buatois, G. Caumon, and B. Lévy. Concurrent number cruncher: a GPU implementation of a general sparse linear solver. International Journal of Parallel, Emergent and Distributed Systems, 24(3): 205-223, 2009.
- C. Cecka, A. J. Lew, and E. Darve. Assembly of finite element methods on graphics processors. International Journal for Numerical Methods in Engineering, 85(5): 640-669, 2011.
- C. Cecka, A. J. Lew, and E. Darve. GPU Computing Gems Jade Edition, chapter Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics. Applications of GPU Computing Series. Elsevier Science, 2011.
- R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2): 416-429, 1969.
- T. Hughes. The finite element method: linear static and dynamic finite element analysis. Dover Civil and Mechanical Engineering Series. Dover Publications, 2000.
- J. Jeffers and J. Reinders. Intel Xeon Phi Coprocessor High Performance Programming. Elsevier Science, 2013.
- S. Rao. The Finite Element Method in Engineering. Elsevier Science, 2010.
- N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 351-362, New York, NY, USA, 2010. ACM.
- N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey. Fast sort on CPUs, GPUs and Intel MIC architectures. Technical report, Intel Labs, 2010.
- E. Saule, K. Kaya, and U. V. Catalyurek. Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi. arXiv e-prints, Feb. 2013.
- I. Smith and D. Griffiths. Programming the Finite Element Method. Wiley, 2004.
- M. Wang, H. Klie, M. Parashar, and H. Sudan. Solving sparse linear systems on NVIDIA Tesla GPUs. In G. Allen, J. Nabrzyski, E. Seidel, G. Albada, J. Dongarra, and P. Sloot, editors, Computational Science - ICCS 2009, volume 5544 of Lecture Notes in Computer Science, pages 864-873. Springer Berlin Heidelberg, 2009.