Numerical resolution of conservation laws with OpenCL

ESAIM: Proceedings and Surveys, Jul 2013

We present several numerical simulations of conservation laws on recent multicore processors, such as GPUs, using the OpenCL programming framework. Depending on the chosen numerical method, different implementation strategies have to be considered, for achieving the best performance. We explain how to program efficiently three methods: a finite volume approach on a structured grid, a high order Discontinuous Galerkin (DG) method on an unstructured grid and a Particle-In-Cell (PIC) method. The three methods are respectively applied to a two-fluid computation, a Maxwell simulation and a Vlasov-Maxwell simulation.

Article PDF cannot be displayed. You can download it here:

https://www.esaim-proc.org/articles/proc/pdf/2013/02/proc134004.pdf

Numerical resolution of conservation laws with OpenCL

ESAIM: PROCEEDINGS, Juillet 2013, Vol. 40, p. 51-62 C. Bourdarias, S. Gerbi, Editors NUMERICAL RESOLUTION OF CONSERVATION LAWS WITH OPENCL A. Crestetto 1 , P. Helluy 2 and J. Jung 3 Abstract. We present several numerical simulations of conservation laws on recent multicore processors, such as GPUs, using the OpenCL programming framework. Depending on the chosen numerical method, different implementation strategies have to be considered, for achieving the best performance. We explain how to program efficiently three methods: a finite volume approach on a structured grid, a high order Discontinuous Galerkin (DG) method on an unstructured grid and a Particle-In-Cell (PIC) method. The three methods are respectively applied to a two-fluid computation, a Maxwell simulation and a Vlasov-Maxwell simulation. 1. Introduction Recent parallel CPU and GPU architectures are very efficient for scientific computing. Several programming frameworks, such as OpenMP, CUDA or OpenCL are developed for several years in order to help the implementation of classical algorithms on GPUs or multicore CPUs. The implementation is generally not so obvious and one has to care of some aspects in order to reach efficient simulations. The first point is that, because of the massive parallelism of some devices, the computations may become negligible compared to data memory transfers. The data organization into memory becomes the most important point of the algorithms. The second point is that simple operations in sequential program, such as computing an integral or a maximum on the computational domain, become non trivial on parallel computers because they imply memory write conflicts. Such operations have thus to be reorganized in order to perform efficiently in parallel. In this paper we review three classical methods for solving hyperbolic systems of conservation laws: the structured Finite Volume method (FV), the high order Discontinuous Galerkin (DG) method and the Particle-In-Cell (PIC) method. We explain how to program them efficiently on GPUs, using the OpenCL framework. We also apply our simulations to realistic 2D test cases coming from physics 2. Multicore architectures and OpenCL Several computer parallel architectures exist. The Single Instruction Multiple Data (SIMD) model is often interesting in scientific computing because it allows to perform the same operation on many data in parallel. The processors share the same instruction unit, which also allows excellent performance/watt ratio (green computing). Recent GPU devices are of this type, with several hundreds of processors sharing a few instruction units. OpenCL is a programming framework for driving such devices. OpenCL is practically available since september 2009 [11]. The specification is managed by the Khronos Group, which is also responsible of the OpenGL API design and evolutions. OpenCL means “Open Computing Language”. Many different architectures exist, but OpenCL proposes a unified model for programing generic SIMD machines, called “devices” in the OpenCL terminology. The generic device can be in practice a GPU, a multicore CPU or even a computer made of several multicore CPUs. In this model, a GPU is considered as a device plugged into a computer, called a “host”. See Figure 1. A device is made of (the typical hardware characteristics below are given for a NVIDIA GeForce GTX 280 GPU) • Global memory (typically 1 Gb) 1 CALVI, Inria Nancy-Grand Est & IRMA, Université de Strasbourg 2 TONUS, Inria Nancy-Grand Est & IRMA, Université de Strasbourg, 3 IRMA, Université de Strasbourg c EDP Sciences, SMAI 2013 Article published online by EDP Sciences and available at http://www.esaim-proc.org or http://dx.doi.org/10.1051/proc/201340004 52 ESAIM: PROCEEDINGS PE 1 Global mem. PE 2 Local mem. GPU PE 3 PE 4 Local mem. CU 1 CU 2 Host Figure 1. A (virtual) GPU with 2 Compute Units and 4 Processing Elements • Compute units (typically 27). Each compute unit is made of: • Processing elements (typically 8). • Local (or cache) memory (typically 16 kb) The same program (a “kernel”) can be executed on all the processing elements at the same time, with the following rules: • All the processing elements have access to the global memory. • The processing elements have only access to the local memory of their compute unit. • If several processing elements write at the same location at the same time, only one write is successful. • The access to the global memory is slow while the access to the local memory is fast. We can achieve faster reads/writes to the global memory if neighboring processors access neighboring memory locations. OpenCL includes: • A library of C functions, called from the host, in order to drive the GPU (or the multicore CPU); • A C-like language for writing the kernels that will be executed on the processing elements. Virtually, it allows having as many compute units and processing elements as needed. The virtual OpenCL compute units are called “work-groups” and the virtual processing elements are called “work-items”. The workgroups and work-items are distributed on the real compute units and processing elements of the GPU thanks to a mechanism of command queues. The user does not have to bother of that mechanism because it is managed automatically by the OpenCL driver. The main advantage of OpenCL is its portability. The same program can run on multicore CPUs or GPUs of different brands. It is also possible to drive several devices at the same time for better efficiency. For instance, in a single PC, a part of the computations can be executed on the CPU while the other part is run on one or several GPUs. This allows very important computation power at a very low cost. Recent evolutions indicate that in the future, the OpenCL devices will share common memory buffers. This will greatly improve the efficiency of CPU-GPU cooperation by avoiding slow memory copies. Many resources are available on the web for learning OpenCL. For a tutorial and simple examples, see for instance [8]. 53 ESAIM: PROCEEDINGS Liquid post-shock Liquid pre-shock Gas r Shock y x Figure 2. Liquid-gas shock-bubble interaction test. Description of the initial conditions. 3. A structured FV approximation of a compressible two-fluid flow 3.1. Shock-bubble interaction Our first example is devoted to solving a 2D shock-bubble interaction in a compressible medium. The bubble is made of air inside liquid water. The liquid water is shocked at the left boundary and the planar shock wave will impinge the bubble. The initial condition of the problem is depicted on Figure 2 with the following parameters Lx = 2 m, Ly = 1 m, Ls = 0.04 m, X1 = 0.5 m, Y1 = 0.5 m, r = 0.4 m. The test case is also studied in [13, 17]. We first write the mathematical model, which is a hyperbolic system of conservation laws. We consider the vector of conservative variables W = (ρ, ρu, ρv, ρE, ρϕ)T depending on the space variable X ∈ R2 and time t, where ρ is the densi (...truncated)


This is a preview of a remote PDF: https://www.esaim-proc.org/articles/proc/pdf/2013/02/proc134004.pdf
Article home page: https://www.esaim-proc.org/articles/proc/abs/2013/02/proc134004/proc134004.html

A. Crestetto, P. Helluy, J. Jung. Numerical resolution of conservation laws with OpenCL, ESAIM: Proceedings and Surveys, 2013, pp. 51-62, 40, DOI: 10.1051/proc/201340004