Efficient vectorised kernels for unstructured high-order finite element fluid solvers on GPU architectures in two dimensions

Jan Eichstädt, Joaquim Peiró, David Moxey*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



We develop efficient kernels for the elemental operators of matrix-free solvers of the Helmholtz equation, which are the core operations of incompressible Navier-Stokes solvers, for use on graphics processing units (GPUs). Our primary concern in this work is the extension of matrix-free routines to evaluate this elliptic operator efficiently on regular and curvilinear triangular elements in a tensor-product manner. We investigate two types of efficient CUDA kernels for a range of polynomial orders, and thus of varying arithmetic intensity: the first maps each elemental operation to a CUDA thread for a completely vectorised kernel, whilst the second maps each element to a CUDA block for nested parallelism. Our results show that the first option is beneficial for elements of low polynomial order, whereas the second is beneficial for elements of higher order. On the hardware used in this study, the crossover point between the two schemes lies at polynomial orders of around $P = 4$ to $5$, depending on element type. For both options, we highlight the importance of the data-structure layout, which for vectorised kernels necessitates interleaved elemental data, and analyse the effect of selecting different memory spaces on the GPU. As the kernels considered are predominantly memory-bandwidth bound, we develop kernels for curved elements that trade memory bandwidth for additional arithmetic operations, and demonstrate improved throughput in selected cases. We further compare our optimised CUDA kernels against optimised OpenACC kernels to contrast the performance of a native and a portable programming model for GPUs.
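To make the data-layout point concrete, the following is a minimal host-side C++ sketch, not the authors' CUDA code. It assumes a hypothetical dense elemental operator `B` (with `nq` rows and `nm` columns, standing in for, not reproducing, the sum-factorised Helmholtz operator) applied to every element, and stores the element data either element-major ("blocked") or interleaved. The outer loop over `e` plays the role of the thread index in a thread-per-element kernel. All names (`blockedIndex`, `interleavedIndex`, `applyElementalOp`, `layoutsAgree`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-major ("blocked") layout: the n entries of element e are contiguous.
// Natural for a one-block-per-element mapping, where the threads of one
// block cooperatively read a single element's contiguous data.
std::size_t blockedIndex(std::size_t e, std::size_t i, std::size_t n) {
    return e * n + i;
}

// Interleaved layout: entry i of every element is stored contiguously.
// With one thread per element, neighbouring threads (e, e+1) touch
// neighbouring addresses, which is what allows coalesced loads on a GPU.
std::size_t interleavedIndex(std::size_t e, std::size_t i, std::size_t nElmt) {
    return i * nElmt + e;
}

// Apply one shared dense operator B (nq x nm, row-major) to every element:
// out_e = B * in_e. The loop over e stands in for the GPU thread index;
// inIdx/outIdx select the memory layout of the input and output arrays.
template <typename InIdx, typename OutIdx>
void applyElementalOp(const std::vector<double>& B, const std::vector<double>& in,
                      std::vector<double>& out, std::size_t nElmt, std::size_t nm,
                      std::size_t nq, InIdx inIdx, OutIdx outIdx) {
    assert(B.size() == nq * nm && in.size() == nElmt * nm && out.size() == nElmt * nq);
    for (std::size_t e = 0; e < nElmt; ++e) {          // one "thread" per element
        for (std::size_t q = 0; q < nq; ++q) {
            double sum = 0.0;
            for (std::size_t m = 0; m < nm; ++m) sum += B[q * nm + m] * in[inIdx(e, m)];
            out[outIdx(e, q)] = sum;
        }
    }
}

// Run the operator in both layouts on the same data and check that the
// results agree entry by entry: only the storage order should differ.
bool layoutsAgree(std::size_t nElmt, std::size_t nm, std::size_t nq) {
    std::vector<double> B(nq * nm);
    for (std::size_t k = 0; k < B.size(); ++k) B[k] = 0.5 + 0.1 * static_cast<double>(k % 7);

    std::vector<double> inBlk(nElmt * nm), inIlv(nElmt * nm);
    for (std::size_t e = 0; e < nElmt; ++e)
        for (std::size_t m = 0; m < nm; ++m) {
            double v = static_cast<double>(e) + 0.01 * static_cast<double>(m);
            inBlk[blockedIndex(e, m, nm)] = v;
            inIlv[interleavedIndex(e, m, nElmt)] = v;
        }

    std::vector<double> outBlk(nElmt * nq), outIlv(nElmt * nq);
    applyElementalOp(B, inBlk, outBlk, nElmt, nm, nq,
                     [=](std::size_t e, std::size_t m) { return blockedIndex(e, m, nm); },
                     [=](std::size_t e, std::size_t q) { return blockedIndex(e, q, nq); });
    applyElementalOp(B, inIlv, outIlv, nElmt, nm, nq,
                     [=](std::size_t e, std::size_t m) { return interleavedIndex(e, m, nElmt); },
                     [=](std::size_t e, std::size_t q) { return interleavedIndex(e, q, nElmt); });

    for (std::size_t e = 0; e < nElmt; ++e)
        for (std::size_t q = 0; q < nq; ++q)
            if (std::fabs(outBlk[blockedIndex(e, q, nq)] -
                          outIlv[interleavedIndex(e, q, nElmt)]) > 1e-12)
                return false;
    return true;
}
```

The arithmetic is identical in both layouts; what changes is the access pattern. If the loop over `e` became a thread index, a warp of 32 threads reading mode `m` of 32 consecutive elements would access 32 consecutive doubles in the interleaved layout, but addresses `nm` doubles apart in the blocked layout.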
Original language: English
Publication status: Accepted/In press, 30 Nov 2022

