Acceleration of cardiac tissue simulation with graphic processing units
Daisuke Sato, Yuanfang Xie, James N. Weiss, Zhilin Qu, Alan Garfinkel, Allen R. Sanderson
A. R. Sanderson: Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
A. Garfinkel (corresponding author): Cardiovascular Research Laboratory, Departments of Medicine (Cardiology) and Physiological Science, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
D. Sato, Y. Xie, J. N. Weiss, Z. Qu: Cardiovascular Research Laboratory, Departments of Medicine (Cardiology), David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
In this technical note we show the promise of using graphic processing units (GPUs) to accelerate simulations of electrical wave propagation in cardiac tissue, one of the more demanding computational problems in cardiology. We have found that the computational speed of two-dimensional (2D) tissue simulations with a single commercially available GPU is about 30 times faster than with a single 2.0 GHz Advanced Micro Devices (AMD) Opteron processor. We have also simulated wave conduction in the three-dimensional (3D) anatomic heart with GPUs where we found the computational speed with a single GPU is 1.6 times slower than with a 32-central processing unit (CPU) Opteron cluster. However, a cluster with two or four GPUs is faster than the CPU-based cluster. These results demonstrate that a commodity personal computer is able to perform a whole heart simulation of electrical wave conduction within times that enable the investigators to interact more easily with their simulations.
1 Introduction
In the last few decades, computer simulation has become an
important tool for investigating various phenomena in cardiac
biology, including single ion channel properties [9], action
potentials of the myocyte [3, 5], dynamics of action
potential propagation in tissue [2], and subcellular calcium
dynamics [7]. Despite advances in computational technology,
the simulation of action potential waves in
three-dimensional (3D) cardiac tissue with a realistic
geometry is still considered a large-scale simulation.
General-purpose computing on GPUs (GPGPU) is a
recently emerging technology [1, 4, 8], which uses GPUs,
instead of CPUs, to compute large simulations in parallel.
GPUs are massively parallel single-instruction, multiple-data
processing units. Each GPU may contain 128–240 stream
processors, whereas today's CPUs contain 2, 4, or 8 cores. In
this paper, we demonstrate that the GPU is about 30–40
times faster than the CPU, enabling it to perform whole heart
electrophysiology simulations within practical time.
In this study, we chose the simulation of the propagation
of the action potential in cardiac tissue, which is modeled
as the propagation of a wave in an excitable medium.
Therefore, this technique can be applied to a number of
phenomena in physics, chemistry, and biology.
2 Methods
We used two test models. The first was a 2D homogeneous
sheet, and the second was an anatomic rabbit ventricular
model with fiber rotation [10], that is, an anisotropy that
varies from point to point in the heart. Each model was
simulated using both the GPUs and CPUs.
The GPU simulations were performed with a single
NVIDIA GeForce 8800 GT with 1 GB of graphics random-access
memory (RAM) and an NVIDIA GeForce 9800 GX2 with 1 GB of
graphics RAM. These graphics cards were installed in a
system with a dual-core 2.0 GHz AMD Opteron processor
and 4 GB of error-correcting code (ECC) RAM. The operating
system was OpenSUSE 10.2. Our programs were written in
C++. We used the GNU C++ compiler version 4.1.2 and
NVIDIA CUDA version 1.1.
The CPU simulations were performed on an 8-node
high-performance computing (HPC) cluster. Each node
had two dual-core 2.0 GHz AMD Opteron processors
(i.e., 4 cores per node) and 4 GB of ECC RAM. The
operating system was Fedora Core 5. We used the Intel
C++ compiler version 10.1. To parallelize the code on this
cluster, we used the Message Passing Interface (MPI) 1.0. The
FORTRAN version of this code was used in some of our
previous studies [10].
All 2D simulations, and all 3D simulations with one
GPU, were performed with the NVIDIA GeForce 8800 GT.
3D simulations with two or four GPUs were performed
with the NVIDIA GeForce 9800 GX2.
Because these GPUs support only single precision, all
floating-point calculations were done using single precision
across both GPU and CPU simulations.
The code that runs on the GPU is called a kernel.
Executing a GPU kernel resembles a CPU-based parallel
implementation: the work is carried out by a
series of threads, with each thread running independently
in parallel. As in a CPU implementation, it was
necessary to synchronize all threads after each ordinary
differential equation (ODE) or partial differential
equation (PDE) kernel execution. We refer to these threads
as intra-GPU threads because they control the processing
within a single GPU.
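The launch-then-synchronize pattern described here can be illustrated on the CPU. The following is a minimal sketch, not the authors' kernel code: it uses `std::thread` workers in place of GPU threads, a 1D cable in place of the tissue, and a placeholder decay in place of the ionic model. The names `launchAndSync` and `stepCable` are hypothetical. The point it shows is that the PDE stage reads neighbouring cells, so it may start only after every thread of the ODE stage has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel launch: run body(i) for every cell i in
// [0, n), split across nThreads workers. Joining all workers before returning
// plays the role of the synchronization performed after each kernel.
void launchAndSync(std::size_t n, unsigned nThreads,
                   const std::function<void(std::size_t)>& body) {
    assert(nThreads > 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + nThreads - 1) / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& w : workers) w.join();  // all threads done past this line
}

// One step on a 1D cable: an "ODE kernel" followed by a "PDE kernel".
// ASSUMPTION: the reaction is a placeholder decay, not a cardiac cell model.
std::vector<float> stepCable(std::vector<float> v, unsigned nThreads) {
    const std::size_t n = v.size();
    // ODE stage: each cell is updated independently of the others.
    launchAndSync(n, nThreads, [&v](std::size_t i) { v[i] *= 0.5f; });
    // PDE stage: reads neighbours, so it relies on the ODE stage being complete.
    std::vector<float> next(n);
    launchAndSync(n, nThreads, [&v, &next, n](std::size_t i) {
        const float left = v[i > 0 ? i - 1 : i];        // no-flux edge
        const float right = v[i + 1 < n ? i + 1 : i];   // no-flux edge
        next[i] = v[i] + 0.25f * (left + right - 2.0f * v[i]);
    });
    return next;
}
```

Because each stage is fully joined before the next begins, the result does not depend on how many worker threads are used.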
In addition to managing the intra-GPU threads, it
was also necessary to have inter-GPU threads to control
each GPU. For instance, the NVIDIA GeForce 9800 GX2
graphics card has two GPUs on one card. To utilize
each GPU, a corresponding thread must be created
from the main program.
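The one-host-thread-per-GPU arrangement can be sketched as follows. This is a CPU-only mock written under stated assumptions: `MockDevice`, `driveDevice`, and `runOnDevices` are hypothetical names, the "device" is just a private vector, and the per-device work is a simple scaling. In a real CUDA program the controlling thread would select its device (for example with `cudaSetDevice`) and launch kernels at the point marked in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one GPU: it owns a private slab of the tissue.
struct MockDevice {
    int id;
    std::vector<float> slab;
};

// Work done by the host thread that controls one device. A real program
// would select the device and launch kernels here; this mock only scales
// the slab so that the effect is easy to check.
void driveDevice(MockDevice& dev, float factor) {
    for (float& v : dev.slab) v *= factor;
}

// Split the cells across nDevices mock devices, create one controlling
// thread per device from the main program, and wait for all of them.
std::vector<MockDevice> runOnDevices(const std::vector<float>& cells,
                                     int nDevices, float factor) {
    assert(nDevices > 0);
    std::vector<MockDevice> devices;
    const std::size_t n = cells.size();
    for (int d = 0; d < nDevices; ++d) {
        const std::size_t begin = n * d / nDevices;
        const std::size_t end = n * (d + 1) / nDevices;
        devices.push_back({d, std::vector<float>(cells.begin() + begin,
                                                 cells.begin() + end)});
    }
    std::vector<std::thread> hostThreads;
    for (MockDevice& dev : devices)
        hostThreads.emplace_back(driveDevice, std::ref(dev), factor);
    for (std::thread& t : hostThreads) t.join();
    return devices;
}
```

With two devices, as on the dual-GPU card mentioned in the text, each controlling thread works on half of the cells.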
As with a CPU cluster with distributed memory, it is
also necessary to manage the distributed GPU memory.
However, unlike CPUs, where data can be moved directly from
one CPU to another, GPUs can communicate only
with the CPU memory; that is, data is transferred from
one GPU to another via the main RAM
(GPU1 ↔ RAM ↔ GPU2).
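The staging of data through main RAM can be sketched for the simplest case, two neighbouring slabs that each need the other's boundary value. This is a CPU-only mock under stated assumptions: `MockGpu`, `copyToHost`, `copyToDevice`, and `exchangeHalo` are hypothetical names, each slab is 1D with one ghost cell at either end, and the copy helpers stand in for the device-to-host and host-to-device transfers (in CUDA, `cudaMemcpy` calls).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock device memory: a 1D slab laid out as
// [left ghost cell, interior cells..., right ghost cell].
struct MockGpu {
    std::vector<float> mem;
};

// Stand-ins for device-to-host and host-to-device copies of one value.
float copyToHost(const MockGpu& gpu, std::size_t index) {
    return gpu.mem[index];
}
void copyToDevice(MockGpu& gpu, std::size_t index, float value) {
    gpu.mem[index] = value;
}

// Exchange boundary values between two neighbouring slabs. Neither mock GPU
// touches the other's memory; every value passes through a host variable,
// mirroring the GPU1 -> RAM -> GPU2 path described in the text.
void exchangeHalo(MockGpu& left, MockGpu& right) {
    assert(left.mem.size() >= 3 && right.mem.size() >= 3);
    // Stage both boundary values in host RAM first.
    const float hostFromLeft = copyToHost(left, left.mem.size() - 2);  // last interior cell
    const float hostFromRight = copyToHost(right, 1);                  // first interior cell
    // Then write each into the neighbour's ghost cell.
    copyToDevice(right, 0, hostFromLeft);
    copyToDevice(left, left.mem.size() - 1, hostFromRight);
}
```

Only the boundary cells cross the host, so the volume of data staged per exchange scales with the interface between slabs rather than with the slab size.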
The cardiac tissue was modeled using the following
partial differential equation:
$$\frac{\partial V}{\partial t} = -\frac{I}{C_m} + \nabla \cdot (D \nabla V)$$
where V is the transmembrane voltage, I is the total ionic
current, C_m is the transmembrane capacitance, and D is
the diffusion tensor. The cell model used in this study was
phase I of the Luo–Rudy action potential model [3]. We
solved this reaction-diffusion equation with the forward
Euler method, using the technique of operator splitting
[6]. The time step was adaptively varied between 0.01
and 0.1 ms and the space step was 0.015 cm. Details of
the modeling of cardiac tissue are described in our
previous study [10]. For each time step, the ODE part was
solved once and the PDE part was solved four times for
the 2D simulation and six times for the 3D simulation
(Fig. 1).
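The splitting of each time step into one ODE solve and several PDE solves can be sketched in C++ as below. This is an illustration under stated assumptions, not the authors' code: the ionic current is a placeholder cubic rather than the cell model used in the study, the diffusion coefficient is a scalar rather than a tensor, the edges are no-flux, the time step is fixed rather than adaptive, and each PDE solve is assumed to advance an equal fraction `dt / nPde` of the step (the text does not state how the sub-steps divide the step). The names `Sheet`, `odeStep`, `pdeStep`, and `timeStep` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D sheet of transmembrane voltages, stored row by row in
// single precision (the precision used throughout the study).
struct Sheet {
    std::size_t nx, ny;
    std::vector<float> v;
    Sheet(std::size_t nx_, std::size_t ny_, float v0)
        : nx(nx_), ny(ny_), v(nx_ * ny_, v0) {}
    float& at(std::size_t i, std::size_t j) { return v[j * nx + i]; }
    float at(std::size_t i, std::size_t j) const { return v[j * nx + i]; }
};

// ASSUMPTION: placeholder cubic current, NOT the cell model from the paper.
inline float ionicCurrent(float v) { return v * (v - 0.1f) * (v - 1.0f); }

// ODE part: forward Euler on dV/dt = -I/Cm, cell by cell.
void odeStep(Sheet& s, float dt, float cm) {
    for (float& v : s.v) v -= dt * ionicCurrent(v) / cm;
}

// PDE part: forward Euler on dV/dt = D * laplacian(V), 5-point stencil.
// ASSUMPTION: scalar D and no-flux edges (an edge cell reuses its own
// value in place of the missing neighbour).
void pdeStep(Sheet& s, float dt, float d, float dx) {
    const float k = d * dt / (dx * dx);
    assert(k <= 0.25f);  // explicit 2D diffusion is unstable above this
    std::vector<float> next(s.v.size());
    for (std::size_t j = 0; j < s.ny; ++j) {
        for (std::size_t i = 0; i < s.nx; ++i) {
            const std::size_t im = i > 0 ? i - 1 : i;
            const std::size_t ip = i + 1 < s.nx ? i + 1 : i;
            const std::size_t jm = j > 0 ? j - 1 : j;
            const std::size_t jp = j + 1 < s.ny ? j + 1 : j;
            const float c = s.at(i, j);
            next[j * s.nx + i] = c + k * (s.at(im, j) + s.at(ip, j) +
                                          s.at(i, jm) + s.at(i, jp) - 4.0f * c);
        }
    }
    s.v.swap(next);
}

// One split time step: the ODE part once, then the PDE part nPde times.
void timeStep(Sheet& s, float dt, int nPde, float cm, float d, float dx) {
    assert(nPde > 0);
    odeStep(s, dt, cm);
    for (int n = 0; n < nPde; ++n) pdeStep(s, dt / nPde, d, dx);
}
```

A call such as `timeStep(sheet, 0.1f, 4, 1.0f, 0.001f, 0.015f)` would mirror the 2D case in the text (four PDE solves, 0.015 cm spacing); the capacitance and diffusion values in that call are illustrative and not taken from the paper.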
To test the GPU code, we induced spiral waves in 2D
and 3D tissue using cross-field stimulation, that is, two
successive perpendicular rectilinear wave fronts. In each
case, we simulated 1 s of real world cardiac time
(Fig. 2).
For the 2D tissue simulations, the benchmark protocol
involved pacing the tissue from the corner for 3 s of
simulated time at a pacing cycle length of 150 ms. Tissue size
was varied from 100 × 100 (1.5 cm × 1.5 cm) to
800 × 800 (12 cm × 12 cm). For the 3D tissue
simulations, the benchmark protocol consisted of pacing the
whole heart from the apex for 3 s of simulated time, at a
pacing cycle length of 150 ms.
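The figures in the 2D benchmark follow from simple arithmetic, which the sketch below makes explicit. It assumes the 0.015 cm space step and the 0.01 to 0.1 ms time-step range given earlier in the Methods, and it assumes that the first stimulus is delivered at t = 0 with none at the final instant; the struct and function names are hypothetical. Under these assumptions, 3 s of pacing at a 150 ms cycle length gives 20 stimuli, and the number of time steps lies between 30,000 and 300,000.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical description of one 2D benchmark run.
struct Benchmark {
    int cellsPerSide;   // grid points along one edge of the square sheet
    double dxCm;        // space step in cm
    int durationMs;     // simulated time in ms
    int cycleLengthMs;  // pacing cycle length in ms
};

// Physical edge length of the sheet in cm.
double sideLengthCm(const Benchmark& b) { return b.cellsPerSide * b.dxCm; }

// Number of cells updated in every ODE and PDE solve.
long cellCount2D(const Benchmark& b) {
    return static_cast<long>(b.cellsPerSide) * b.cellsPerSide;
}

// ASSUMPTION: stimuli at t = 0, cycle, 2 * cycle, ... strictly before the end.
int stimulusCount(const Benchmark& b) {
    assert(b.cycleLengthMs > 0);
    return (b.durationMs + b.cycleLengthMs - 1) / b.cycleLengthMs;
}

// Number of time steps if the step stayed at dtMs for the whole run.
long stepCount(const Benchmark& b, double dtMs) {
    assert(dtMs > 0.0);
    return std::lround(b.durationMs / dtMs);
}
```

For the largest sheet, `cellCount2D` gives 640,000 cells, 64 times the 10,000 cells of the smallest one.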
Finally, we investigated where the computational
bottlenecks occurred. We split th (...truncated)