Numerical resolution of conservation laws with OpenCL
ESAIM: PROCEEDINGS, Juillet 2013, Vol. 40, p. 51-62
C. Bourdarias, S. Gerbi, Editors
NUMERICAL RESOLUTION OF CONSERVATION LAWS WITH OPENCL
A. Crestetto 1 , P. Helluy 2 and J. Jung 3
Abstract. We present several numerical simulations of conservation laws on recent multicore processors, such as GPUs, using the OpenCL programming framework. Depending on the chosen numerical
method, different implementation strategies have to be considered, for achieving the best performance.
We explain how to program efficiently three methods: a finite volume approach on a structured grid, a
high order Discontinuous Galerkin (DG) method on an unstructured grid and a Particle-In-Cell (PIC)
method. The three methods are respectively applied to a two-fluid computation, a Maxwell simulation
and a Vlasov-Maxwell simulation.
1. Introduction
Recent parallel CPU and GPU architectures are very efficient for scientific computing. Several programming
frameworks, such as OpenMP, CUDA or OpenCL are developed for several years in order to help the implementation of classical algorithms on GPUs or multicore CPUs. The implementation is generally not so obvious
and one has to care of some aspects in order to reach efficient simulations. The first point is that, because of
the massive parallelism of some devices, the computations may become negligible compared to data memory
transfers. The data organization into memory becomes the most important point of the algorithms. The second
point is that simple operations in sequential program, such as computing an integral or a maximum on the
computational domain, become non trivial on parallel computers because they imply memory write conflicts.
Such operations have thus to be reorganized in order to perform efficiently in parallel.
In this paper we review three classical methods for solving hyperbolic systems of conservation laws: the structured Finite Volume method (FV), the high order Discontinuous Galerkin (DG) method and the Particle-In-Cell
(PIC) method. We explain how to program them efficiently on GPUs, using the OpenCL framework. We also
apply our simulations to realistic 2D test cases coming from physics
2. Multicore architectures and OpenCL
Several computer parallel architectures exist. The Single Instruction Multiple Data (SIMD) model is often
interesting in scientific computing because it allows to perform the same operation on many data in parallel.
The processors share the same instruction unit, which also allows excellent performance/watt ratio (green
computing). Recent GPU devices are of this type, with several hundreds of processors sharing a few instruction
units.
OpenCL is a programming framework for driving such devices. OpenCL is practically available since september
2009 [11]. The specification is managed by the Khronos Group, which is also responsible of the OpenGL API
design and evolutions. OpenCL means “Open Computing Language”.
Many different architectures exist, but OpenCL proposes a unified model for programing generic SIMD machines,
called “devices” in the OpenCL terminology. The generic device can be in practice a GPU, a multicore CPU or
even a computer made of several multicore CPUs. In this model, a GPU is considered as a device plugged into
a computer, called a “host”. See Figure 1. A device is made of (the typical hardware characteristics below are
given for a NVIDIA GeForce GTX 280 GPU)
• Global memory (typically 1 Gb)
1 CALVI, Inria Nancy-Grand Est & IRMA, Université de Strasbourg
2 TONUS, Inria Nancy-Grand Est & IRMA, Université de Strasbourg,
3 IRMA, Université de Strasbourg
c EDP Sciences, SMAI 2013
Article published online by EDP Sciences and available at http://www.esaim-proc.org or http://dx.doi.org/10.1051/proc/201340004
52
ESAIM: PROCEEDINGS
PE 1
Global mem.
PE 2
Local mem.
GPU
PE 3
PE 4
Local mem.
CU 1
CU 2
Host
Figure 1. A (virtual) GPU with 2 Compute Units and 4 Processing Elements
• Compute units (typically 27).
Each compute unit is made of:
• Processing elements (typically 8).
• Local (or cache) memory (typically 16 kb)
The same program (a “kernel”) can be executed on all the processing elements at the same time, with the
following rules:
• All the processing elements have access to the global memory.
• The processing elements have only access to the local memory of their compute unit.
• If several processing elements write at the same location at the same time, only one write is successful.
• The access to the global memory is slow while the access to the local memory is fast. We can achieve
faster reads/writes to the global memory if neighboring processors access neighboring memory locations.
OpenCL includes:
• A library of C functions, called from the host, in order to drive the GPU (or the multicore CPU);
• A C-like language for writing the kernels that will be executed on the processing elements.
Virtually, it allows having as many compute units and processing elements as needed. The virtual OpenCL
compute units are called “work-groups” and the virtual processing elements are called “work-items”. The workgroups and work-items are distributed on the real compute units and processing elements of the GPU thanks to
a mechanism of command queues. The user does not have to bother of that mechanism because it is managed
automatically by the OpenCL driver.
The main advantage of OpenCL is its portability. The same program can run on multicore CPUs or GPUs of
different brands. It is also possible to drive several devices at the same time for better efficiency. For instance,
in a single PC, a part of the computations can be executed on the CPU while the other part is run on one or
several GPUs. This allows very important computation power at a very low cost. Recent evolutions indicate
that in the future, the OpenCL devices will share common memory buffers. This will greatly improve the
efficiency of CPU-GPU cooperation by avoiding slow memory copies. Many resources are available on the web
for learning OpenCL. For a tutorial and simple examples, see for instance [8].
53
ESAIM: PROCEEDINGS
Liquid
post-shock
Liquid
pre-shock
Gas
r
Shock
y
x
Figure 2. Liquid-gas shock-bubble interaction test. Description of the initial conditions.
3. A structured FV approximation of a compressible two-fluid flow
3.1. Shock-bubble interaction
Our first example is devoted to solving a 2D shock-bubble interaction in a compressible medium. The bubble is
made of air inside liquid water. The liquid water is shocked at the left boundary and the planar shock wave will
impinge the bubble. The initial condition of the problem is depicted on Figure 2 with the following parameters
Lx = 2 m, Ly = 1 m, Ls = 0.04 m, X1 = 0.5 m, Y1 = 0.5 m, r = 0.4 m.
The test case is also studied in [13, 17].
We first write the mathematical model, which is a hyperbolic system of conservation laws. We consider the
vector of conservative variables W = (ρ, ρu, ρv, ρE, ρϕ)T depending on the space variable X ∈ R2 and time t,
where ρ is the densi (...truncated)