Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking

Biomimetics 2019, 4, 28; doi:10.3390/biomimetics4010028 (www.mdpi.com/journal/biomimetics)

Article

Chujun Liu 1, Andrew G. Lonsberry 1, Mark J. Nandor 1, Musa L. Audu 2, Alexander J. Lonsberry 1 and Roger D. Quinn 1,*

1 Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA (C.L.; A.G.L.; M.J.N.; A.J.L.)
2 Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
* Corresponding author

Received: 12 November 2018; Accepted: 11 March 2019; Published: 22 March 2019

Abstract: A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was built based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a deep deterministic policy gradient (DDPG) neural network that was trained in GAZEBO, a physics simulator, to predict the ideal foot placement to maintain stable walking despite external disturbances. The complexity of the DDPG network was decreased through carefully selected state variables and a distributed control system. Additional controllers for the hip joints during their stance phases and the ankle joint during toe-off phase help to stabilize the biped during walking. The simulated biped can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking with a 10 kg backpack or a 25 kg front pack. The controller was trained on a 1.8 m tall model, but also stabilizes models 1.4–2.3 m tall with no changes.

Keywords: biped; DDPG neural network; gait; stability

1. Introduction

Spinal cord injuries (SCI) can cause paralysis, resulting in minimal motor control and rendering standing and walking impossible. Exoskeletons can help patients regain their ability to stand and walk on their own. It has been established previously that combining functional neuromuscular stimulation (FNS) with a powered, lower limb exoskeleton can restore locomotion to such individuals [1–3]. There remain many challenges in realizing such systems, given that each patient’s body is unique. One of the primary problems needing more work is the generation of adaptive control systems for stable walking and fall prevention. While much research has been invested in such control for legged robots, there have been few applications of these methods to exoskeletons.

The design of algorithms for control of bipedal robot locomotion is a topic of intense research interest [4–8] and several different methods have been developed. Some of these focus on the concept of finding a zero-moment point (ZMP) about which to step. Kim and Oh [9] reported on a controller using a ZMP-based technique with feedback from inertial sensor measurements. This controller has three subcomponents: one that adjusts a pre-defined walking pattern, a second one for balance in real-time using information from sensory feedback, and a third one for motion control based on previous experience. The controller is successfully implemented on a robot that walks without the need for support and without any extensive tuning of the controller parameters. A similar method [10], again featuring a controller comprised of three modules, demonstrates bipedal stability using a ZMP method. The three modules perform body inclination control, ZMP control, and foot adjustment control. The last component is primarily invoked when the system encounters uneven terrain. The ZMP reported by Yokoi et al. [10] is computed using torque sensors at the ankles and controlled via an adjustment of the orientation of the trunk.
In implementation, the robot can walk stably with a step length of 0.2 m/step and a step period of 0.8 s/step. A key difference between these two works is in how the foot position is chosen. In Kim and Oh [9], foot placement is defined as part of the desired motion pattern, while, in Yokoi et al. [10], it is computed by inverse kinematics from the desired joint angle trajectories.

In this paper, based on the concept of the ZMP, a simplified, but robust, algorithm for biped locomotion is presented as the basis for control of an exoskeleton used to stabilize individuals with SCI. As has been established in robotic biped locomotion, foot placement is a critical component. Each step must be carefully planned based on feedback from the current robotic state space [11]. By choosing the next step carefully, the biped has been shown to keep its ZMP inside the support polygon while ensuring that the center of mass (COM) does not diverge from the ZMP [12]. This strategy, and those similar to it, depend on having a linear inverted pendulum model to find the desired ZMP and COM trajectories. In ideal situations, the inverted pendulum model can describe the real biped system well enough to predict the correct next step. However, our approach does not depend on having a known, fixed dynamics model. Instead, the model is obtained through a learning process where the data is used to train a neural network. The advantage of this approach is that it does not need any prior knowledge about the system. Furthermore, the use of a neural network is superior to a linearized dynamic model, as it can capture nonlinearities and make approximate or simplified models unnecessary [13].

In future work, the methods presented here will be applied to control the user’s muscles and powered lower limb exoskeleton based on an adaptable, reinforcement learning approach. To make the system robust for any user, the control approach must be adaptable [14].
It should thus function with limited a priori information about the individual. To accomplish this, we employ an exploratory reinforcement learning approach based on deep Q-networks (DQN) [15]. Reinforcement learning (RL) is a type of machine learning in which a controller learns through trial and error. Over each trial-and-error episode, the controller is graded by a reward function that indicates how well it is performing [16]. The goal is to maximize the total reward, and thereby produce a controller that accomplishes some given task [17]. Because control of a biped is defined over a continuous state–action space, DQN, which natively handles only discrete action spaces, is not directly applicable. A variation of DQN called deep deterministic policy gradient (DDPG) [18] is used here instead.

Our system is composed of three separate controllers designed to operate together to produce stable walking control. One of the three controllers is a trained DDPG network; the other two are a conventional proportional–integral–derivative (PID) feedback controller and an open-loop controller. The use of three separa (...truncated)
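
The distinction drawn above between DQN and DDPG can be illustrated with a minimal sketch. It is not the controller described in this paper: the state size, the number of discrete actions, the action bound, and the random linear maps standing in for trained networks are all hypothetical, and plain Gaussian noise is used for exploration as a simplification. The sketch only shows why a DQN-style argmax needs a finite action set, whereas a DDPG-style deterministic actor returns a continuous action directly, and how the total (discounted) reward of an episode can be accumulated.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4        # hypothetical state size, not the paper's state vector
N_DISCRETE = 5       # hypothetical number of discrete actions (DQN case)
ACTION_LIMIT = 0.5   # hypothetical bound on the continuous action (DDPG case)

# Random linear maps stand in for trained networks (assumption for illustration).
W_q = rng.normal(size=(N_DISCRETE, STATE_DIM))  # one Q-value per discrete action
W_mu = rng.normal(size=(1, STATE_DIM))          # actor: state -> one continuous action


def dqn_action(state):
    """DQN-style selection: score every discrete action, return the best index."""
    q_values = W_q @ state
    return int(np.argmax(q_values))


def ddpg_action(state, noise_std=0.0):
    """DDPG-style selection: a deterministic actor outputs a bounded continuous action.

    Optional Gaussian noise models exploration during training (a simplification).
    """
    action = ACTION_LIMIT * np.tanh(W_mu @ state)
    if noise_std > 0.0:
        action = action + rng.normal(scale=noise_std, size=action.shape)
    return np.clip(action, -ACTION_LIMIT, ACTION_LIMIT)


def discounted_return(rewards, gamma=0.99):
    """Total discounted reward of one episode, the quantity RL tries to maximize."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


state = rng.normal(size=STATE_DIM)
discrete_choice = dqn_action(state)      # an index into a finite action set
continuous_choice = ddpg_action(state)   # a real-valued action within the bound
```

In this sketch the DQN branch can only choose among `N_DISCRETE` predefined actions, so a continuous quantity such as a foot-placement target would have to be discretized first, while the actor in the DDPG branch produces it as a real number.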