Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes

PLOS ONE, Jun 2017

To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation.

Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes

RESEARCH ARTICLE Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for twodimensional panel codes Lukas Einkemmer* Department of Mathematics, University of Innsbruck, Innsbruck, Austria * a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 OPEN ACCESS Citation: Einkemmer L (2017) Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes. PLoS ONE 12(6): e0178156. https://doi.org/ 10.1371/journal.pone.0178156 Editor: Kay Hamacher, Technische Universitat Darmstadt, GERMANY Abstract To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation. Received: August 20, 2016 Accepted: May 8, 2017 Published: June 5, 2017 Copyright: © 2017 Lukas Einkemmer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability Statement: All relevant data are part of the tables in the paper. Funding: This work is supported by the Austrian Science Fund (FWF) - project id: P25346 (http:// www.fwf.ac.at/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. 1 Introduction Numerical simulations are routinely used in applications to predict the properties of fluid flow over a solid geometry. Such applications range from the design and analysis of aircrafts to constructing more efficient wind turbines. In this context, a large number of different models and numerical methods have been developed to efficiently compute aerodynamic quantities such as lift and drag. It is generally believed that the compressible Navier—Stokes system is able to represent the physics that is encountered in such systems faithfully. However, even for moderate Reynolds numbers, turbulent motion is only dissipated at very small spatial scales. This forces an extremely fine space discretization and renders the numerical solution of the time dependent Navier—Stokes system intractable in all but a very selective class of applications (this approach is usually referred to as DNS or direct numerical simulation). Consequently a hierarchy of reduced models has been developed that are computationally more efficient. Even though methods such as RANS (Reynolds-averaged Navier—Stokes) and LES (Large eddy PLOS ONE | https://doi.org/10.1371/journal.pone.0178156 June 5, 2017 1 / 16 Intel Xeon Phi and NVIDIA K80 for panel codes simulations) are routinely employed to perform aerodynamics simulations, these simulations can still require days or even weeks to complete. In this work the goal is to develop a computer program that is able find an ideal airfoil geometry given a target function (for example, this target could be to maximize the lift-to-drag ratio). This is a nonlinear optimization problem as the geometry is the parameter under consideration. In addition, the large number of maxima found in these problems renders traditional optimization algorithms ineffective. In recent years, genetic algorithms have enjoyed some success (see, for example, [1–3]). However, their application yields a new computational challenge as they require the computation of thousands or even hundred thousands of different airfoil configuration. Consequently, even RANS or LES methods are computationally prohibitive as the inner solver in such an optimization algorithm. In this paper we will restrict our attention to low-speed aerodynamics. That is, we assume that the flow under consideration is slow compared to the speed of sound. These conditions are present in a wide range of applications (for example, unmanned aerial vehicles and wind turbines). Since the flow is slow compared to the speed of sound it is justified to neglect compressible effects. In addition, we make the assumption that the flow is irrotational. In this case the Navier—Stokes equations reduce to Laplace’s equation. One should note that a direct solution of Laplace’s equation would result in a body with zero lift. However, by imposing an additional constraint, the so-called Kutta condition, this simple model yields very accurate results in its regime of validity (even for lifting bodies such as airfoils, rotor blades, or fins). In addition, many phenomenological corrections have been developed that are able to extend the range of validity of this simplified model considerably. In principle, any numerical method can be used to solve Laplace’s equation together with the Kutta condition. However, since we are usually interested in the fluid flow outside of a solid body, so-called panel methods (or boundary element methods) have become the standard approach. The advantage of such a method is that only the boundary has to be discretized. This implies that for a two-dimensional flow only a linear system in a single dimension has to be solved (although the corresponding matrix is no longer sparse). In addition, no error is made by introducing an artificial boundary faraway from the dynamics of interest. On modern computers a good implementation is able to compute, for example, the flow over an airfoil in less than a few tens of milliseconds (although this has not always been true in the past). Especially in the early days of computational fluid dynamics, performing such simulations was the only way to obtain results in a reasonable time. As a consequence, sophisticated software packages (such as Xfoil [4]) have been developed that are still used in current aerodynamics research (see, for example, [2, 5–7]). The main advantage of panel methods is that they are computationally cheap and that fact makes them ideally suited as the inner solver in an optimization algorithm. In addition, they are (...truncated)


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0178156&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0178156

Lukas Einkemmer. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes, PLOS ONE, 2017, Volume 12, Issue 6, DOI: 10.1371/journal.pone.0178156