FEMPAR: An Object-Oriented Parallel Finite Element Framework
Santiago Badia, Alberto F. Martín & Javier Principe

CIMNE — Centre Internacional de Mètodes Numèrics en Enginyeria, Parc Mediterrani de la Tecnologia, UPC, Esteve Terradas 5, 08860 Castelldefels, Spain
Department of Civil and Environmental Engineering, Universitat Politècnica de Catalunya, Jordi Girona 1-3, Edifici C1, 08034 Barcelona, Spain
Department of Fluid Mechanics, Universitat Politècnica de Catalunya, Eduard Maristany 10-14, 08019 Barcelona, Spain
We describe the abstractions used in the discretization module and the related geometrical module. We also describe the main ingredients of the assembly of linear systems arising from finite element discretizations; the software design of complex scalable multilevel solvers is postponed to a subsequent work.
1 Introduction
Even though the origins of the FE method trace back to the 1950s, the field has evolved drastically during the last six decades, leading to increasingly complex algorithms that improve accuracy, stability, and performance. The exponential convergence of the p-version of the FE method makes high-order approximations an excellent option in many applications [1]. Adaptive mesh refinement driven by a posteriori error estimates, i.e., h-adaptivity, is an essential ingredient to reduce computational cost automatically [2]. For smooth solutions, p-adaptivity or hybrid hp-adaptivity can further reduce the computational cost required for a target level of accuracy [3]. Originally, FE methods were restricted to nodal Lagrangian bases for structural problems. The extension of FE methods to other applications, such as porous media flow or electromagnetism, motivated the design of more complex bases that require different mappings from the reference to the physical space, complicating the implementation of these techniques in standard FE codes. Saddle-point problems also require particular mixed FE discretizations for stability purposes [4, 5]. More recently, novel FE formulations have been proposed within the frame of exterior calculus, e.g., for mixed linear elasticity problems [6]. Physics-compatible discretizations are also gaining attention, e.g., in the field of incompressible fluid mechanics. Divergence-free mixed FEs satisfy mass conservation up to machine precision, but their implementation is certainly challenging [7]. During the last decade, a large part of the computational mechanics community has embraced isogeometric analysis techniques [8], in which the discretization spaces are defined in terms of NURBS (or simply splines), leading to smoother global spaces. In the opposite direction, discontinuous Galerkin (DG) methods have also been actively developed, and novel approaches, such as hybridizable DG and Petrov-Galerkin DG methods, have been proposed [9, 10]. As discretization methods become more complex, so does their efficient implementation. This also poses a challenge in the design of scientific software libraries, which should be extensible and provide a framework for the (easy) implementation of novel techniques, so as to remain resilient to new algorithmic trends.
The hardware on which scientific codes run evolves even faster. For 40 years, core performance increased steadily, as predicted by Moore's law. In some years, supercomputers will reach 1 exaflop/s, a dramatic improvement in computational power that will not only affect extreme-scale machines but radically transform the whole range of platforms, from desktops to high performance computing (HPC) clouds. The ability to efficiently exploit the forthcoming 100x boost in computational performance will have a tremendous impact on scientific discoveries and economic benefits based on computational science, reaching almost every field of research. However, all the foreseen exascale growth in computational power will be delivered by increasing hardware parallelism (in distinct forms), and the efficient exploitation of these resources will not be a simple task. HPC architectures will combine general-purpose fat cores, fine-grain many-core accelerators (GPUs, DSPs, FPGAs, Intel MIC, etc.), and multiple levels of disruptive-technology memories, with high non-uniformity as a common denominator [11]. This (inevitable) trend challenges algorithm and software design. Traditional bulk-synchronous message passing interface (MPI) approaches are likely to face significant performance obstacles. Significant progress is already being made by MPI+X [12] (with X = OpenMP, CUDA, OpenCL, OmpSs, Kokkos, etc.) hybrid execution models. Going a step further, asynchronous many-task execution models (e.g., Charm++ [13], Legion [14], or HPX [15]) and their supporting run-time systems hold great promise [16].
Traditionally, researchers in the field of scientific
computing used to develop codes with a very (...truncated)