Vision and force/torque integration for realtime estimation of fast-moving object under intermittent contacts

ROBOMECH Journal, Jul 2016

This paper considers the fast and accurate estimation of the motion variables of a rigid body object whose movement results from intermittent contacts with coordinating manipulators in nonprehensile manipulation tasks. The estimator operates on multiple sensory data, including visual, joint torque, joint position and/or tactile measurements, which are combined at the low level to compensate for the latency and the slow sampling of the visual data. The estimator is real-time in the sense that it provides the motion data of the target object at the same fast sample rate as the servo controller, without delay. The basic formulation is a multi-rate Kalman filter with the contact force vector as its process input and the visual observation, a down-sampled and delayed version of the configuration of the target object, as its measurement output. Experimental tests are conducted for planar object manipulation as well as non-centroidal rotation under gravity using a robotic hand, and the results demonstrate the validity of the proposed estimation scheme.



Hyunki Bae, Soo Jeon, Jan P. Huissoon
Department of Mechanical and Mechatronics Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada

Keywords: Coordinated manipulation; Kalman filter; Sensors and sensing systems

Background

Coordinating multiple manipulators for handling an object has the potential to substantially enhance dexterity and versatility to a level that is not possible with traditional stand-alone manipulators [1–3]. Although we expect this to bring numerous benefits, a number of technical issues must be resolved before coordinated manipulation with human-like dexterity becomes practical. From a hardware perspective, there has been much progress in recent years in new designs of complex manipulation systems, such as anthropomorphic robotic hands capable of highly coordinated manipulation tasks with a large number of degrees of freedom [4–7]. On the other hand, there is a consensus (see, for example, A Roadmap for US Robotics [1]) that we are still far behind in key technology areas such as perception, robust high-fidelity sensing, and planning and control for highly dexterous object manipulation. As for sensing in particular, it is recognized that processing sensory data with multiple modalities (e.g. vision, joint angle, tactile and force) will play a key role in synthesizing dexterous manipulation skills [1], as is clearly the case for human dexterity, for which there is ample evidence from neuroscience [8, 9].

This paper considers the situation where we want to keep track of the movement of an object in real time while the object is being manipulated by intermittent contact forces from coordinating manipulators. Such scenarios occur in so-called nonprehensile manipulation [10], a class of dexterous manipulation tasks that make explicit use of dynamic forces (e.g. inertial, gravitational, or centrifugal) to allow extra motion freedoms to the object. Typical examples of nonprehensile manipulation include pushing, throwing, caging, flipping and twirling.

In conventional approaches to coordinated (or dexterous) manipulation [11], the object is held stably through a force-closure condition during manipulation. The force-closure condition ensures that the object and the manipulating bodies remain kinematically related throughout the manipulation process, so the movement of the object can easily be tracked using the associated kinematic relations. In nonprehensile situations, on the contrary, the movement of the target object cannot be inferred from the kinematic relations because the object frequently loses contact with some or all of its manipulating bodies, so the contact forces are applied intermittently. An additional sensor, such as a vision camera, is therefore often necessary to provide information on the object motion in nonprehensile cases. The visual data, however, may not provide all the information in a timely manner. The view may be blocked from a certain direction (occlusion) [12]. Also, the movement data obtained from visual images are often slow and delayed (latency) [13, 14], which restricts the speed and accuracy of achievable object manipulation tasks, especially when the object is moving relatively fast, as is often the case in the nonprehensile tasks mentioned above.

The main idea of this paper is to design a real-time rigid body estimator by augmenting the visual data with other sensory inputs available from the manipulating bodies, including joint force/torque sensors or tactile sensors mounted on the surface of the end-effector. The estimator is termed "real-time" in the sense that the visual feedback and the other sensory data are updated and utilized to guide the manipulators at the same rate as their motion (or servo) controller, which is usually much faster than the update rate of the vision data. The information fusion therefore occurs at the very low level of sensory data processing. This issue has been addressed either by algorithmic approaches (see, for example, [13] and references therein) or by sensor-based approaches [14]. In this paper, we combine different types of sensors but make explicit use of the physical model of the rigid body dynamics, which requires the determination of the contact forces as a necessary step toward good estimation performance.

Combining vision with force/torque or tactile data has been attempted by many researchers in recent years [15–19], mainly for task-oriented (i.e. high-level) operations of robotic manipulators such as opening doors [18] or grasping a stationary object [15, 19]. Because the time scale of these tasks is much larger than that of the controller, the timing issue (i.e. the latency of the visual data) is not critical in those applications. The nonprehensile manipulations mentioned above, on the other hand, often involve much faster time scales associated with the object motion. Compensation of the visual latency is thus important for accurate and fast estimation of the object motion, especially when the estimated variables are used for real-time control. In fact, some pioneering works on nonprehensile manipulation share this motivation with our study. Specifically, the work in [20] considered a concept called dynamics matching to close the visual feedback loop at the same rate (i.e. 1 ms) as the other sensory inputs using a specially designed high-speed vision system.

The remainder of this paper is organized as follows.
“Preliminaries” section presents the mathematical description of general rigid body motion and the determination of contact forces using contact kinematics. The main formulation of the multirate rigid body estimator is described in “Estimator design” section, the experimental results are presented in “Experimental results” section, and a summary of the results follows in “Conclusions and discussions” section.

Preliminaries

Description of rigid body motion

Figure 1 shows the schematic diagram of a rigid body in 3D space under contact forces and the gravitational force. S and B denote the spatial and the body coordinate frame, respectively. The total mass and the inertia tensor of the object, evaluated at the center of mass, are denoted by m and I_0, respectively, where I_0 is a diagonal matrix with moments of inertia J_1, J_2 and J_3, i.e. I_0 = diag(J_1, J_2, J_3) when its bases lie along the principal axes. Then, the general form of the rigid body dynamics can be written as [11]

    M \dot{V}^b + \mathrm{diag}([\omega^b \times], [\omega^b \times]) M V^b = F^b    (1)

where the superscript b indicates that the corresponding quantities are written with respect to the body coordinate frame attached to the object mass center (shown as B in Fig. 1). M = diag(m I_3, I_0) ∈ R^{6×6} is the generalized inertia matrix (with I_3 ∈ R^{3×3} denoting the identity matrix). The notation [•×] ∈ R^{3×3} is the skew-symmetric matrix form of the cross product, i.e. [a×]b = a × b. The vector V^b = col(v^b, ω^b) ∈ R^6 is the twist form of the generalized body velocity, where v ∈ R^3 is the translational velocity and ω ∈ R^3 is the angular velocity. F^b ∈ R^6 is the resultant wrench applied at the center of mass, which can be written in terms of the wrenches of the individual manipulators. Namely, denoting by n_p the total number of end-effectors, F^b = \sum_{i=1}^{n_p} F_i^b = \sum_{i=1}^{n_p} col(f_i^b, \tau_i^b), where f_i^b ∈ R^3 and τ_i^b ∈ R^3 (i = 1, 2, ..., n_p) are the force and torque vectors, respectively, for the i-th contact point (denoted by p_{c_i} in Fig. 1). Note that f_i can be either a contact force or the gravitational force. The point of application p_{c_i} of the gravitational force is located at the center of mass (see Fig. 1). We consider the situation where the object moves freely without any environmental or disturbance forces (such as friction) except for gravity.

If we describe v of Eq. (1) in the spatial coordinate frame (indicated by the superscript s) and separate the dynamics into a translational part and a rotational one, we have

    m \dot{v}^s = \sum_{i=1}^{n_p} f_i^s = R \sum_{i=1}^{n_p} f_i^b    (2a)
    I_0 \dot{\omega}^b + \omega^b \times I_0 \omega^b = \sum_{i=1}^{n_p} \tau_i^b    (2b)

where R ∈ SO(3) denotes the rotation matrix for the configuration of the body coordinate frame. Recall that the rigid body kinematics is given by

    \dot{p}^s = v^s = R v^b    (3a)
    \dot{R} = R[\omega^b \times]    (3b)

where p^s is the position of the mass center (with respect to the spatial frame). To use the state-space approach, we convert Eq. (3b) into a vector form. Euler angles can be used to represent Eq. (3b) in state-space form but introduce singularities. One singularity-free coordination is the unit quaternion q = [q_0 e^T]^T ∈ R^4, where q_0 ∈ R is the scalar part and e = [q_1 q_2 q_3]^T ∈ R^3 is the vector part, with ||q||^2 = 1. At the cost of an increased dimension, the quaternion can be used to replace Eq. (3b) with

    \dot{q} = H_q(\omega^b) q = \frac{1}{2} \begin{bmatrix} 0 & -\omega_1^b & -\omega_2^b & -\omega_3^b \\ \omega_1^b & 0 & \omega_3^b & -\omega_2^b \\ \omega_2^b & -\omega_3^b & 0 & \omega_1^b \\ \omega_3^b & \omega_2^b & -\omega_1^b & 0 \end{bmatrix} \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{bmatrix}.    (4)

The rotation matrix can be retrieved from the quaternion as R = (q_0^2 − e^T e) I_3 + 2 e e^T + 2 q_0 [e×] [22].
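
As an aside (not part of the original paper), the following short Python sketch illustrates Eqs. (3b) and (4): it builds H_q(ω^b) in the scalar-first quaternion convention used above, takes one Euler integration step of the quaternion kinematics, and recovers the rotation matrix. The sample time and angular velocity values are illustrative.

import numpy as np

def skew(a):
    """Skew-symmetric matrix [a x] such that [a x] b = a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def H_q(w):
    """Quaternion kinematics matrix of Eq. (4), scalar-first q = [q0, q1, q2, q3]."""
    w1, w2, w3 = w
    return 0.5 * np.array([[0.0, -w1, -w2, -w3],
                           [w1,  0.0,  w3, -w2],
                           [w2, -w3,  0.0,  w1],
                           [w3,  w2, -w1,  0.0]])

def quat_to_R(q):
    """Rotation matrix R = (q0^2 - e^T e) I3 + 2 e e^T + 2 q0 [e x]."""
    q0, e = q[0], q[1:]
    return (q0**2 - e @ e) * np.eye(3) + 2.0 * np.outer(e, e) + 2.0 * q0 * skew(e)

# Example: one Euler step of q_dot = H_q(w) q over the base sample time Ts
Ts = 0.0025                      # 2.5 ms, the base sample time used later in the experiments
q = np.array([1.0, 0.0, 0.0, 0.0])
w_b = np.array([0.0, 0.0, 2.0])  # body angular velocity [rad/s], illustrative value
q = q + Ts * (H_q(w_b) @ q)
q = q / np.linalg.norm(q)        # re-normalize to keep ||q|| = 1
R = quat_to_R(q)
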
Equations (2), (3a) and (4) realize the state-space representation of general rigid body motion in 3D, with 13 state variables in total. For planar motion (i.e. 2D), the formulation simplifies and the number of state variables reduces to 6 (see “Experimental results” section). In general, the dynamics of the object may behave somewhat differently if the contact forces are applied in a non-smooth or impactive way [25]. In practice, however, we may not encounter such an ideal impact because of the softness (or flexibility) of the finger tips and end-effectors.

Determination of contact force

Often, the contact force f_i in Fig. 1 cannot be measured directly but rather needs to be estimated using proprioceptive sensory data (e.g. joint angles, torques and/or tactile sensors) available from each manipulator. To explain specific ways of doing this, we briefly review the contact kinematics and the coordinate transformation of generalized forces in this subsection. More detailed treatments of the mathematical formulations of grasping and contact forces can be found in well-known textbooks such as [11] and [21]. To keep the analysis general, we consider the 3D case.

Figure 2 (coordinate frames and variables for rolling contact kinematics) shows the schematic diagram of the i-th manipulator (finger) contacting the object at the point p_{c_i}. D_i denotes the coordinate frame for the base of the i-th manipulator, which is fixed relative to S. E_i is the coordinate frame for the reference end-effector attached to the terminal link. We use m_i to denote the number of joints of the i-th manipulator. The joint space vector for the manipulator is denoted by θ_i = [θ_{i,1}, ..., θ_{i,m_i}]^T, which lies in the m_i-torus T^{m_i} = S^1 × ··· × S^1. For the sake of illustration, the robotic finger is depicted as a 3-DOF (degrees-of-freedom) elbow manipulator, i.e. m_i = 3, in Fig. 2.

First, the contact point p_{c_i} can be computed from the joint angles of the manipulator and the tactile sensor. Associated with the contact point p_{c_i} are two manifolds that describe the surface geometry of the object and of the end-effector, whose local coordinate charts are denoted by c_c and c_t, respectively (see Fig. 3). The local coordinate chart (e.g. c_c) takes a point on the surface (e.g. (u_c, v_c) ∈ U_c ⊂ R^2) and maps it to a point on the manifold (which is embedded in R^3). For a given contact point and associated coordinate charts, the contact force can be described with respect to the contact frame, which we denote by C_i. C_i is an instantaneous coordinate system formed by the two tangent vectors ∂c_c/∂u_c and ∂c_c/∂v_c and their outward normal n_c at p_{c_i} with respect to the object; see Fig. 3, where the components of C_i are illustrated by arrows with black triangular heads. We can also define the end-effector counterpart of the contact frame (denoted by T_i) with associated coordinates ∂c_t/∂u_t, ∂c_t/∂v_t and n_t, illustrated by arrows with white triangular heads in Fig. 3. Note that C_i and T_i are opposite in normal direction and their tangential pairs are rotated by the contact angle ψ. Namely, f_i^{c_i} ∈ R^3 (f_i written about C_i) is related to f_i^{t_i} ∈ R^3 (f_i written about T_i) as

    f_i^{c_i} = -R_{c_i t_i} f_i^{t_i},  \quad  R_{c_i t_i} = \begin{bmatrix} \cos\psi & \sin\psi & 0 \\ \sin\psi & -\cos\psi & 0 \\ 0 & 0 & -1 \end{bmatrix}.    (5)

Our objective here is to represent the body wrench F_i^b := col(f_i^b, τ_i^b) in terms of the joint torque vector ϒ_i = [ϒ_{i,1} ··· ϒ_{i,m_i}]^T ∈ R^{m_i} (e.g. ϒ_{i,2} is the joint torque for θ_{i,2}).
Note that F_i^b is related to f_i^{c_i} and its wrench form F_i^{c_i} as

    F_i^b = Ad^T_{g_{bc_i}^{-1}} F_i^{c_i} = Ad^T_{g_{bc_i}^{-1}} B_c f_i^{c_i}    (6)

where Ad^T_{g_{bc_i}^{-1}} ∈ R^{6×6} is the transpose of the adjoint transformation associated with g_{bc_i}^{-1}, and B_c = [I_3  0_{3×3}]^T is the wrench basis that encodes the force vector of a frictional point contact (which is in R^3) into the corresponding wrench form in the contact coordinate frame C_i (which is in R^6). The adjoint transformation Ad_g ∈ R^{6×6} (with g ∈ SE(3)) is a linear map that performs the coordinate transformation of a wrench vector between two different coordinate frames (from C_i to B in this case). Also, g_{bc_i}^{-1} is the inverse of the homogeneous transformation g_{bc_i} ∈ R^{4×4}, which belongs to SE(3) and maps a vector in C_i coordinates into B coordinates (see [11] for more on homogeneous and adjoint transformations). Similarly, f_i^{t_i} and its wrench form F_i^{t_i} = B_c f_i^{t_i} can be written with respect to the base frame D_i as

    F_i^{d_i} = Ad^T_{g_{t_i d_i}} F_i^{t_i} = Ad^T_{g_{d_i t_i}^{-1}} B_c f_i^{t_i} = Ad^T_{g_{d_i e_i}^{-1}} Ad^T_{g_{e_i t_i}^{-1}} B_c f_i^{t_i}    (7)

where the last equality follows from the composition rule of the adjoint transformation. Finally, from the standard result for kinematic-chain manipulators [11],

    ϒ_i = \left( J^s_{d_i t_i} \right)^T F_i^{d_i}    (8)

where J^s_{d_i t_i} : R^{m_i} → R^6 is the spatial manipulator Jacobian of the i-th manipulator evaluated with T_i as the terminal frame and D_i as the base frame. Combining Eqs. (7) and (8), we have

    f_i^{t_i} = J_{h_i}^{\dagger}(\theta_i, p_{c_i})\, ϒ_i = \left[ \left( J^s_{d_i e_i}(\theta_i) \right)^T Ad^T_{g_{d_i e_i}^{-1}}(\theta_i)\, Ad^T_{g_{e_i t_i}^{-1}}(p_{c_i})\, B_c \right]^{\dagger} ϒ_i    (9)

where J_{h_i} ∈ R^{m_i×3} is called the hand Jacobian of the i-th manipulator [11] and •^† stands for the pseudo-inverse of •. The body wrench F_i^b is then related to ϒ_i by Eqs. (5), (6) and (9). For m_i ≥ 3, J_{h_i} is either a square or a tall matrix, so its pseudo-inverse exists uniquely as long as it has full rank. If J_{h_i} is not full rank, the manipulator may be in a singular configuration and needs special treatment, which is beyond the scope of this paper.

From Eq. (9), it is clear that at least three joint torque sensors are needed to fully determine f_i^{t_i} in 3D. If tactile sensing is available to provide the contact pressure, the normal component f_{i,3}^{t_i} can be obtained directly from the tactile sensor. This allows us to use only two joint torque sensors to find the remaining tangential components f_{i,1}^{t_i} and f_{i,2}^{t_i}. If the manipulator is deficient in sensing, the contact force cannot be fully determined in general. In some special cases, however, the contact force may still be identified reasonably well with fewer sensors. Manipulation by an impactive force is one example, where the direction of the contact force is related to the pre-impact velocity. If the direction of the contact force vector is known (for example from the vision sensor), we only need to find its magnitude, which we may infer with just one joint torque sensor. An example of such a case is considered in the experimental tests.
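
As an illustration of Eqs. (5) and (9) (not code from the paper), the Python sketch below recovers the contact force from measured joint torques through the pseudo-inverse of the hand Jacobian and rotates it into the object contact frame. The hand Jacobian, torque values and contact angle are placeholders.

import numpy as np

def contact_force_from_torques(J_h, torques):
    """Eq. (9): f^{t_i} = pinv(J_{h_i}) * Upsilon_i, with J_{h_i} of size m_i x 3 (m_i >= 3)."""
    return np.linalg.pinv(J_h) @ torques

def to_object_contact_frame(f_t, psi):
    """Eq. (5): f^{c_i} = -R_{c_i t_i} f^{t_i}, with contact angle psi between the tangent pairs."""
    R_ct = np.array([[np.cos(psi),  np.sin(psi), 0.0],
                     [np.sin(psi), -np.cos(psi), 0.0],
                     [0.0,          0.0,        -1.0]])
    return -R_ct @ f_t

B_c = np.vstack([np.eye(3), np.zeros((3, 3))])   # wrench basis [I3; 0] for a frictional point contact

# Illustrative 3-joint finger; J_h and the torques would come from the kinematics and the sensors
J_h = np.array([[0.10, 0.02, 0.00],
                [0.03, 0.08, 0.01],
                [0.00, 0.01, 0.05]])             # placeholder hand Jacobian [m]
Upsilon = np.array([0.12, 0.08, 0.02])           # measured joint torques [N*m]
f_t = contact_force_from_torques(J_h, Upsilon)   # contact force in the T_i frame [N]
f_c = to_object_contact_frame(f_t, psi=np.deg2rad(15.0))
F_c = B_c @ f_c                                  # wrench form of the contact force in C_i, as used in Eq. (6)
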
Estimator design

In this section, we derive the rigid body estimator for the general 3D case, using the rigid body dynamics and the contact force determination presented in the previous sections. Multimodal sensory integration is put into perspective through the formulation of a new state estimator combining signals from the visual image with the joint torque sensor and/or the tactile data.

State space description

Denoting the state vectors for the translation and the rotation by ξ = col(p^s, v^s) ∈ R^6 and η = col(q, ω^b) ∈ R^7, respectively, we can write Eqs. (2), (3a) and (4) in the state-equation form

    \dot{\xi} = F_t \xi + G_t (u(t) + w(t))    (10a)
    \dot{\eta} = F_r(t) \eta + G_r(t) (u(t) + w(t))    (10b)

where the subscripts t and r denote translation and rotation, respectively, u(t) = col(f_1^b(t), ..., f_{n_p}^b(t)) ∈ R^{3n_p} is the force input, and w(t) ∈ R^{3n_p} represents the corresponding process noise. The system matrices are given by

    F_t = \begin{bmatrix} 0_{3\times3} & I_3 \\ 0_{3\times3} & 0_{3\times3} \end{bmatrix}, \quad G_t = \begin{bmatrix} 0_{3\times3n_p} \\ \frac{1}{m} R \,[\, I_3 \ \cdots \ I_3 \,] \end{bmatrix},
    F_r(t) = \begin{bmatrix} H_q(\omega^b(t)) & 0_{4\times3} \\ 0_{3\times4} & -I_0^{-1}[\omega^b(t)\times] I_0 \end{bmatrix}, \quad G_r(t) = \begin{bmatrix} 0_{4\times3n_p} \\ I_0^{-1}[p^b_{c_1}(t)\times] \ \cdots \ I_0^{-1}[p^b_{c_{n_p}}(t)\times] \end{bmatrix}.    (11)

If the i-th end-effector is not in contact with the object, both p^b_{c_i}(t) and f_i^b can simply be set to zero. Note that the translational dynamics depends on the rotation matrix R(t), which is to be estimated from the attitude dynamics. Similarly, the system matrix for the attitude dynamics, F_r(t), is a function of the state variable ω^b, and the attitude input matrix G_r is a function of [p^b_c(t)×], which needs to be estimated from the translational motion. The state dependency of these system matrices is handled by the pseudo-linear formulation [23, 24], which simply replaces the actual state variables (i.e. ω^b, R and [ω^b(t)×]) in the system matrices by their estimates.

Representing the time instant t = t_k := k T_s by the integer k, the discretized form of Eq. (10) is

    \xi(k+1) = \Phi_t \xi(k) + \Gamma_t(k) (u(k) + w(k))    (12a)
    \eta(k+1) = \Phi_r(k) \eta(k) + \Gamma_r(k) (u(k) + w(k))    (12b)

where the system matrices Φ_• and Γ_• are computed from F_• and G_•, respectively, by the zero-order hold (ZOH), assuming that signal values are almost constant within the base sample time T_s. The input signal u(t) is available from joint torque sensors and/or tactile sensors at the fast sample rate T_s. On the other hand, the position and orientation of the rigid body are measured through a stereo vision system at a much slower sample rate and with latency due to the vision processing time. The corresponding output equations can be described as

    y_t(k) = p^s(k - N_v) = C_t \xi(k - N_v) + v_t(k - N_v) = [\, I_3 \ \ 0_{3\times3} \,] \xi(k - N_v) + v_t(k - N_v)    (13a)
    y_r(k) = q(k - N_v) = C_r \eta(k - N_v) + v_r(k - N_v) = [\, I_4 \ \ 0_{4\times3} \,] \eta(k - N_v) + v_r(k - N_v)    (13b)

for k = N_v ℓ (ℓ = 0, 1, 2, ...) and zero otherwise, where N_v is the integer multiple of T_s that relates the vision time step T_v to the base sample time such that T_v = N_v T_s. Equations (12) and (13) complete the formulation of the system dynamics and the associated sensory data in state-space form.

Due to the multi-rate nature of Eqs. (12) and (13), the estimation process consists of two parts: a prediction step at every fast sample time T_s, and a correction step at every vision sample time T_v. The key point is that the correction is performed retrospectively so as to fully recover the measurement delay of the vision data, thereby ensuring that the estimates reflect the current (real-time) state information. In the remainder of the paper, the bar notation (e.g. ū) denotes values measured by the sensors, while the hat notation (e.g. ξ̂) denotes estimated values.
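
The following Python sketch (ours, not the authors' implementation) shows one way to realize the zero-order-hold discretization of Eq. (12) for the translational subsystem and the delayed indexing of the vision output in Eq. (13). The object mass and the assumption n_p = 1 are illustrative.

import numpy as np
from scipy.linalg import expm

def zoh_discretize(F, G, Ts):
    """Zero-order-hold discretization of x_dot = F x + G u:
    Phi = exp(F*Ts), Gamma = int_0^Ts exp(F*s) ds * G, via the augmented-matrix exponential."""
    n, m = G.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = F, G
    Md = expm(M * Ts)
    return Md[:n, :n], Md[:n, n:]          # Phi, Gamma

# Translational subsystem of Eq. (10a) for a single contact (n_p = 1)
m_obj = 0.5                                # object mass [kg], illustrative value
Ts, Nv = 0.0025, 20                        # base sample 2.5 ms, so Tv = Nv*Ts = 50 ms
R = np.eye(3)                              # current rotation estimate (identity here)
F_t = np.block([[np.zeros((3, 3)), np.eye(3)],
                [np.zeros((3, 3)), np.zeros((3, 3))]])
G_t = np.vstack([np.zeros((3, 3)), (1.0 / m_obj) * R])
Phi_t, Gamma_t = zoh_discretize(F_t, G_t, Ts)

C_t = np.hstack([np.eye(3), np.zeros((3, 3))])   # Eq. (13a): vision measures position only
k = 2 * Nv                                        # an example vision step (k = Nv * l)
delayed_state_index = k - Nv                      # y_t(k) reflects the state at step k - Nv
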
Multirate estimation combining vision and contact force

As will be shown later, the measurement delay can be handled in the last stage, so we first derive the estimator without considering the output measurement delay. Assume that we have the best estimates of ξ and η at the discrete-time step k based on the vision data at the same time step, which we denote by ξ̂(k|k) and η̂(k|k), respectively. Then, until the next vision data become available at the time step k + N_v, the state variables are predicted by propagating Eq. (12) with the contact force vector as the input signal:

    \hat{\xi}(k+i+1|k) = \Phi_t \hat{\xi}(k+i|k) + \Gamma_t(k)\, \bar{u}(k+i)    (14a)
    \hat{\eta}(k+i+1|k) = \Phi_r(k) \hat{\eta}(k+i|k) + \Gamma_r(k)\, \bar{u}(k+i)    (14b)

where i = 0, 1, ..., N_v − 1 and k is the discrete-time index of the moment at which the last correction was made by the vision data. ū(k) = u(k) + w(k) is the vector of forces applied at the contact point(s), which can be measured by the force/torque and/or tactile sensors on the fingers using the kinematic relations provided in the previous section. The sensor noise w(k) ∈ R^{3n_p} is assumed to be independent, Gaussian distributed and stationary, with covariance E[w(k)w(k)^T] = W ∈ R^{3n_p×3n_p}. Note that the state-dependent system matrices Γ_t, Φ_r and Γ_r are updated at the vision sample rate T_v in Eq. (14) (i.e. they are updated at the time step k and kept constant for i = 0, 1, ..., N_v − 1). It is also possible, however, to update them at the base sample rate T_s using the predicted state variables.

During the prediction step of Eq. (14), the estimation error accumulates. Denote by Z(k|k) the estimation error covariance at time k right after updating with the k-th image. Then Z(k|k) propagates through the prediction process of Eq. (14) as

    Z_{\bullet}(k+i+1|k) = \Phi_{\bullet}(k+i) Z_{\bullet}(k+i|k) \Phi_{\bullet}(k+i)^T + \Gamma_{\bullet}(k+i) W \Gamma_{\bullet}(k+i)^T    (15)

where the subscript • indicates either t or r in the remainder of the paper. When the new image is obtained at time k + N_v, the Kalman filter gains for the translational and rotational motions can be computed as

    L_{\bullet}(k+N_v) = Z_{\bullet}(k+N_v|k) C_{\bullet}^T \left[ C_{\bullet} Z_{\bullet}(k+N_v|k) C_{\bullet}^T + V_{\bullet} \right]^{-1}    (16)

where V_• is the error covariance matrix of the output noise signal v_• (v_t or v_r), i.e. E[v_•(k) v_•(k)^T] = V_•. The measurement noise v_• is assumed to be stationary, independent and Gaussian distributed. Then the update equations can be written as

    \hat{\xi}(k+N_v|k+N_v) = \hat{\xi}(k+N_v|k) + L_t(k+N_v) \left[ y_t(k+N_v) - C_t \hat{\xi}(k+N_v|k) \right]    (17a)
    \hat{\eta}(k+N_v|k+N_v) = \hat{\eta}(k+N_v|k) + L_r(k+N_v) \left[ y_r(k+N_v) - C_r \hat{\eta}(k+N_v|k) \right]    (17b)

Note that the estimated quaternion variables in η̂(k|k) need an additional normalization step (see “Description of rigid body motion” section). Finally, the covariance of the estimation error Z(k+N_v|k) is corrected for the next recursive step:

    Z_{\bullet}(k+N_v|k+N_v) = Z_{\bullet}(k+N_v|k) - L_{\bullet}(k+N_v) C_{\bullet} Z_{\bullet}(k+N_v|k).    (18)

Until now, we assumed that the output signal is measured without any delay. However, the vision measurement at the time step k is, in fact, obtained from the camera image taken at the time step k − N_v. Therefore, the prediction step Eq. (14) needs to be changed to

    \hat{\xi}(k+i+1|k-N_v) = \Phi_t \hat{\xi}(k+i|k-N_v) + \Gamma_t(k-N_v)\, \bar{u}(k+i)    (19a)
    \hat{\eta}(k+i+1|k-N_v) = \Phi_r \hat{\eta}(k+i|k-N_v) + \Gamma_r(k-N_v)\, \bar{u}(k+i).    (19b)

Likewise, the propagation of the estimation error covariance given by Eq. (15) changes to

    Z_{\bullet}(k+i+1|k-N_v) = \Phi_{\bullet}(k+i) Z_{\bullet}(k+i|k-N_v) \Phi_{\bullet}(k+i)^T + \Gamma_{\bullet}(k-N_v) W \Gamma_{\bullet}(k-N_v)^T    (20)

and the filter gain in Eq. (16) also changes to

    L_{\bullet}(k+N_v) = Z_{\bullet}(k+N_v|k-N_v) C_{\bullet}^T \left[ C_{\bullet} Z_{\bullet}(k+N_v|k-N_v) C_{\bullet}^T + V_{\bullet} \right]^{-1}.    (21)

By the time we arrive at the time step k + N_v and compute Eq. (21), we know what the values of p^s(k) (= y_t(k+N_v)) and q(k) (= y_r(k+N_v)) should have been. This means that we can make the correction for the time step N_v steps prior to the current one and compute ξ̂(k|k) and η̂(k|k) in retrospect. As soon as we compute ξ̂(k|k) and η̂(k|k), we also know what the values of ξ̂(k+i+1|k) and η̂(k+i+1|k) should have been for i = 0, 1, ..., N_v − 1, and can compute them all at once. All these estimates are for the past and are thus useless except for the last pair, ξ̂(k+N_v|k) and η̂(k+N_v|k), which we take as the estimate for the current time step k + N_v. This process repeats for the next batch of prediction steps.

In summary, the correction step shown in the previous subsection should be modified into the following set of procedures. It applies to both translation and rotation in the same way, so we explain it here for the translational motion only (a minimal code sketch of the procedure is given after the list).

1. As soon as we arrive at the time step k + N_v, compute the Kalman filter gain for the time step k, i.e. L_t(k), using Eq. (21). Then, using the delayed vision data y_t(k+N_v) = p^s(k), make a correction for the time step k, i.e. ξ̂(k|k), by Eq. (17a). At the same time, make a correction to the estimation error covariance in a similar way to Eq. (18) but with twice the vision step as the update interval, i.e.

    Z_t(k|k) = Z_t(k|k-2N_v) - L_t(k) C_t Z_t(k|k-2N_v).    (22)

The reason for doubling the update interval will become clear at the end of step 3.

2. Compute ξ̂(k+1|k), ..., ξ̂(k+N_v|k) using the prediction equation (14a), after which we take only ξ̂(k+N_v|k) as the estimate for the current time step k + N_v. Similarly, compute the propagation of the estimation error covariance Z_t(k+1|k), ..., Z_t(k+N_v|k) in the same way as Eq. (15). This completes the retrospective correction for the time step k + N_v.

3. Next, progress through the subsequent time steps k + N_v + 1, ..., k + 2N_v in a similar way to Eq. (19a). At the same time, continue computing the propagation of the estimation error covariance using Eq. (20), after which we have Z_t(k+N_v|k−N_v). As a result, the update process operates on the estimation error accumulated over 2N_v time steps.

4. When we arrive at the next vision time step k + 2N_v, we have y_t(k+2N_v) = p^s(k+N_v) and can compute the Kalman filter gain for the time step k + N_v, i.e. L_t(k+N_v), using Eq. (21). We are now back at step 1 and the process repeats.
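
A minimal Python sketch of the translational branch of this procedure is given below. It is not the authors' implementation: the discretization is crude, the covariance bookkeeping is simplified relative to steps 1-4 (the gain is formed from the covariance accumulated since the last correction), and all numerical values are placeholders.

import numpy as np

def predict(x, Z, u_bar, Phi, Gamma, W):
    """One base-rate prediction step, Eqs. (14a)/(15) (translational branch)."""
    x_next = Phi @ x + Gamma @ u_bar
    Z_next = Phi @ Z @ Phi.T + Gamma @ W @ Gamma.T
    return x_next, Z_next

def retrospective_update(x_hist, u_hist, Z, y_delayed, Phi, Gamma, W, C, V):
    """Steps 1-2 of the procedure (simplified covariance bookkeeping):
    correct the delayed estimate x_hist[0] with y(k+Nv), which measures the state at step k,
    then re-propagate to the current step k+Nv using the stored force inputs."""
    L = Z @ C.T @ np.linalg.inv(C @ Z @ C.T + V)      # gain in the spirit of Eq. (21)
    x = x_hist[0] + L @ (y_delayed - C @ x_hist[0])   # Eq. (17a) applied at the delayed step
    Zc = Z - L @ C @ Z                                # covariance correction as in Eq. (22)
    for u_bar in u_hist:                              # step 2: re-run the fast predictions
        x, Zc = predict(x, Zc, u_bar, Phi, Gamma, W)
    return x, Zc                                      # estimate/covariance at step k+Nv

# Illustrative 6-state translational model with one 3D contact-force input
Ts, Nv, m_obj = 0.0025, 20, 0.5
Phi = np.block([[np.eye(3), Ts * np.eye(3)], [np.zeros((3, 3)), np.eye(3)]])
Gamma = Ts * np.vstack([np.zeros((3, 3)), np.eye(3) / m_obj])   # crude discretization for illustration
C = np.hstack([np.eye(3), np.zeros((3, 3))])
W, V = 1e-2 * np.eye(3), 1e-6 * np.eye(3)

x_hat, Z = np.zeros(6), 1e-3 * np.eye(6)
x_hist, u_hist = [x_hat.copy()], []
for i in range(Nv):                        # fast loop between two vision samples (step 3)
    u_bar = np.zeros(3)                    # measured contact force at step k+i (zero here)
    x_hat, Z = predict(x_hat, Z, u_bar, Phi, Gamma, W)
    x_hist.append(x_hat.copy())
    u_hist.append(u_bar)
y_delayed = np.zeros(3)                    # vision sample arriving now; it shows the state at step k
x_hat, Z = retrospective_update(x_hist, u_hist, Z, y_delayed, Phi, Gamma, W, C, V)

In the actual system, this fast loop runs at T_s = 2.5 ms and the correction branch fires at every vision sample T_v = 50 ms, as described in the experimental section.
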
One important point to be made here is that the integrity of the proposed estimator relies strongly on the accuracy of the contact force vector as well as of the contact location. As discussed in “Determination of contact force” section, the accuracy of the contact force can be improved by redundant sensing. Similarly, the information on the contact point may be refined by the use of high-speed, high-fidelity tactile sensor arrays.

Experimental results

Testbed configuration

We conducted two experimental tests as a proof of concept of the rigid body estimation explained in the previous section. In the first experiment, the setup is configured as shown in Fig. 4, considering the planar 2D motion of a target object. A square object made of aluminum blocks is placed on an air table so that it moves on the horizontal plane without any gravity or friction. A robot hand (BHand), manufactured by Barrett Technologies Inc., is mounted to the side of the table so that the robotic fingers can manipulate the object in its planar motion. A vision camera is mounted above to take images of the object while it is moving.

Figure 5 shows a sample view from the camera, which indicates, with added labels, the index number of each finger and the locations of some of the reference frames defined in “Determination of contact force” section. The body coordinate frame B is located at the center of the square object. The spatial (inertial) frame is attached to the center of the palm of the BHand, and D1 (the base frame for finger 1) is placed at the inner (first) joint of finger 1. The specifications of the vision sensor are shown in Table 1. The acquisition timing of the image is precisely controlled by an external trigger signal from the real-time machine.

Fig. 4 Configuration of the testbed for the case with the object moving on the horizontal plane

The image processing is based on simple pattern matching, which provides the location of the object center with respect to the spatial frame (i.e. p^s) as well as the rotation angle (denoted by φ in this section) of the body frame with respect to the spatial one. During the experiment, the BHand is manipulated by an operator who controls a set of potentiometers, each of which corresponds to the desired position of the first joint of one finger. Each finger takes this desired position as a reference command and runs a PID controller. The square object is then manipulated through contacts with the fingers. As shown in Fig. 5, each finger of the BHand has the configuration of a two-link manipulator; however, the second (outer) joint is kinematically linked to the first one, so only one reference signal is used to control the whole finger. In the remainder of the paper, F1, F2 and F3 denote finger 1, finger 2 and finger 3, respectively.

In order to demonstrate the performance of the proposed estimator under the effect of the gravitational force, we conducted another experiment, shown in Fig. 6, where the gravitational force is denoted by the downward arrow labeled mg. The BHand (and the camera) is rotated by 90° from the previous configuration, so the back plane of the wrist of the BHand is mounted on the horizontal table. Initially, the object is stably supported against gravity by F3 and one of its corners, as shown in Fig. 6. The object is then moved by the action of F3, making a non-centroidal rotation about its bottom corner until it is caught by F1 (and F2).

Estimator for 2D motion

The movement of the object is constrained to the 2D plane in both experimental configurations described above. In such 2D motion the rotation has only one degree of freedom and there is no distinction between the spatial and body frames (i.e. ω^b = ω^s = ω ∈ R), so Eq. (2b) simplifies to the linear form I_0 \dot{\omega} = \sum_i \tau_i. The dimension of the state-space equation in Eq. (10) also reduces from 13 to 6.

The surface of the finger has the geometric shape shown in Fig. 7a. We take F1 as the example here, but the other fingers have exactly the same geometry. The allowed contact area is indicated by the grey line on the surface of the finger and consists of two arcs with different radii and center points. The first section has a radius of 105 mm with its center located at (0, −95.5), written with respect to the end-effector frame E1 shown in Fig. 7a. The second section has a radius of 8 mm with its center located at (97 sin(π/18), 0), written about E1. (Note that 97 sin(π/18) ≈ 95.5 tan(π/18) ≈ 16.84.)
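
As a quick numerical check of the finger-surface geometry described above (our own verification, not from the paper), the short Python sketch below evaluates the two arcs at their junction u_t = 10° = π/18 and the quantities quoted in the note.

import numpy as np

u = np.pi / 18                       # 10 degrees, where the two arcs of the contact area meet
p_arc1 = np.array([105 * np.sin(u), 105 * np.cos(u) - 95.5])        # end of the 105 mm arc
p_arc2 = np.array([97 * np.sin(u) + 8 * np.sin(u), 8 * np.cos(u)])  # start of the 8 mm arc
print(p_arc1, p_arc2)                       # approx. [18.23, 7.91] and [18.23, 7.88] mm in E1
print(97 * np.sin(u), 95.5 * np.tan(u))     # both approx. 16.84 mm, as quoted in the note
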
Therefore, the coordinate chart c_t : U_t ⊂ R → R^2 for the grey-colored section of the finger can be written as

    c_t(u_t) = \begin{cases} (105 \sin u_t,\ 105 \cos u_t - 95.5), & 0 \le u_t < \pi/18 \\ (8 \sin u_t + 97 \sin(\pi/18),\ 8 \cos u_t), & \pi/18 \le u_t < \pi/2 \end{cases}    (23)

where the (x, y) coordinate values are written in mm with respect to the end-effector frame E1.

Fig. 6 A view from the vision camera for the case with the object moving under gravity
Fig. 7 Contact geometry and coordinate frames for contact force. a Geometry of contact area, b contact force configuration

Note that c_t parameterizes the regular surface geometry including the point u_t = 10° = π/18. Defining c_t as in Eq. (23) also implies that T1, the contact frame for the finger, can be configured as shown in Fig. 7a. Since the surface of the object aligns with the x and y axes of the body coordinate frame, the coordinate chart c_c for the object can simply be taken as a mapping from one of these axes to the contact point represented in the body coordinate frame. Similarly, the contact frame C1 for the object is configured such that its y-axis coincides with the x-axis of B for the situation shown in Fig. 7b. Figure 7b also illustrates how the contact force f_1 is represented with respect to the contact frame T1 of the finger. More specifically, f_{1,1}^{t_1} denotes the component of f_1 along the x-axis of T1 (equivalently, the direction of ∂c_t/∂u_t) and f_{1,2}^{t_1} is its normal component.

Using the coordinate frames shown in Fig. 7, we can compute the spatial manipulator Jacobian J^s_{d_1 e_1} as

    J^s_{d_1 e_1}(\theta_1) = \left[ \ [\,0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 1\,]^T \ \ \ [\,\ell_1 s_1\ \ -\ell_1 c_1\ \ 0\ \ 0\ \ 0\ \ 1\,]^T \ \right]    (24)

where ℓ_1 denotes the length of the inner link, c_1 := cos θ_{1,1} and s_1 := sin θ_{1,1}. After some tedious but straightforward computation of the associated adjoint matrices, we can compute the hand Jacobian J_{h_1} for F1 (Eq. (25)), where ℓ_2 is the length from the outer-link joint axis to the center of E1, and c_{t,1} and c_{t,2} denote the first and second elements of c_t in Eq. (23), respectively. Similarly to the notation used in Eq. (24), "c" and "s" denote the cosine and sine functions of the angles indicated by their subscripts, e.g. s_{u_t} := sin u_t.

In order to use Eqs. (25) and (9) for contact force determination, we need at least two joint torque sensors, one at the inner link and one at the outer link. However, each finger of the BHand is equipped with a strain gauge that measures the joint torque only at the outer link, i.e. ϒ_{1,2} (see Fig. 7b). To get around this issue, we conducted the experiment by applying the contact force in an impactive way rather than through a continuous rolling contact. If the force is applied as an impact and the object motion is mostly translational, the force direction can be approximated by the relative velocity between the object center and the finger contact point (by the principle of impulse and momentum [25]). In fact, the contact force is likely to be an impact in our experimental setup, since the object block moves freely without friction on the horizontal plane and the finger is controlled by a position-mode PID servo with high gain for fast response.

Following our notational convention, let v_{t_1}^{t_1} denote the velocity of the (imminent) contact point p_{c_1} on the finger 1 surface right before the impact, written with respect to the contact frame T1, and let v_1^{t_1} denote the velocity of the mass center of the object written with respect to the T1 frame. Also denote Δv_1^{t_1} = v_{t_1}^{t_1} − v_1^{t_1}. Then the contact force for finger 1 can be approximated as f_1^{t_1} ≈ ρ Δv_1^{t_1} = ρ (v_{t_1}^{t_1} − v_1^{t_1}) ∈ R^2 with some unknown scale factor ρ ∈ R. From Eq. (9), we have

    \begin{bmatrix} ϒ_{1,1} \\ ϒ_{1,2} \end{bmatrix} = \rho J_{h_1} \Delta v_1^{t_1} \ \Longrightarrow \ \rho = \frac{ϒ_{1,2}}{e_2^T J_{h_1} \Delta v_1^{t_1}}    (26)

where e_2 = [0 1]^T. Consequently, we can obtain f_1^{t_1} as

    f_1^{t_1} = \frac{ϒ_{1,2}}{e_2^T J_{h_1} \Delta v_1^{t_1}} \cdot \Delta v_1^{t_1}.    (27)
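
The single-torque-sensor approximation of Eqs. (26) and (27) can be sketched in a few lines of Python (illustrative values only; the hand Jacobian, torque and velocities are not from the paper):

import numpy as np

def impact_contact_force(J_h1, torque_outer, dv_t1):
    """Eqs. (26)-(27): recover f_1^{t1} from the outer-link torque Upsilon_{1,2},
    assuming the force direction is given by the pre-impact relative velocity dv_t1."""
    e2 = np.array([0.0, 1.0])
    rho = torque_outer / (e2 @ (J_h1 @ dv_t1))   # scale factor of Eq. (26)
    return rho * dv_t1                            # Eq. (27)

J_h1 = np.array([[0.06, 0.01],
                 [0.02, 0.04]])          # placeholder 2x2 hand Jacobian for the planar finger [m]
dv_t1 = np.array([-0.3, -0.8])           # pre-impact relative velocity in the T1 frame [m/s]
Upsilon_12 = 0.15                        # measured outer-link joint torque [N*m]
f_t1 = impact_contact_force(J_h1, Upsilon_12, dv_t1)
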
Experimental results and evaluation

The data acquisition and the real-time control were implemented on a desktop PC running a real-time operating system provided by National Instruments Inc. The real-time loop consists of two dual-rate parallel processes: one for the fast loop (T_s) and the other for the vision loop (T_v). The sensor signals from the BHand can only be communicated through a high-speed CAN protocol, which led us to choose T_s = 2.5 ms as the fastest possible sample rate for acquiring the joint encoder positions and the strain gauge values. The reference command for the joint PID controller is also updated at T_s. Through a timing test of the vision camera, the total latency of the vision data was identified to be around 25 ms.

To evaluate the performance of the estimated state variables, we need a monitoring signal (i.e. the true position and orientation data of the target object) for comparison. For this, the original vision data were acquired every 25 ms and processed offline to remove the latency and to be interpolated into 2.5 ms data, which were then stored as the monitoring signal. The original vision data sampled at 25 ms were further down-sampled to create observation data with T_v = 50 ms and latency (i.e. y_t(k) and y_r(k) in Eq. (13)). In the experiments, we used the original vision data (i.e. the delayed and down-sampled position measurement of the object) to identify the contact point, which worked reasonably well despite some errors due to the visual latency. This may be attributed to the fact that the position does not change abruptly, in contrast to the force, which can be applied in a discontinuous (or impactive) way. Table 2 lists the parameters used for the experimental tests.

The results of the first experiment, i.e. the 2D motion free from gravity, are presented as time plots in Figs. 8, 9 and 10. During the time period shown in these plots, the object is first struck by finger 1 and then subsequently by finger 3. Figure 8 shows the joint torques measured by the strain gauges at the outer link joints during this period and reveals that the duration of force application is less than 100 ms for both fingers, which conforms to our earlier discussion on the use of impactive forces for the experimental tests. During the time period shown in Fig. 8, the fingers are held stationary between the two strikes, i.e. v_{t_1}^{t_1} = v_{t_3}^{t_3} = 0 and thus Δv_1^{t_1} = −v_1^{t_1} and Δv_3^{t_3} = −v_3^{t_3}; in this case, the object is simply bounced between the two fingers. The directions of Δv_1^{t_1} and Δv_3^{t_3} right before the impact are drawn in Fig. 8 with the corresponding directional angles. The contact force for each finger is then determined using Eq. (27).

Figure 9 shows the experimental results for the position variables: the position of the mass center p^s = [x y]^T and the orientation angle φ. The thick solid line is the monitoring signal obtained by offline processing of the original vision data sampled at the 25 ms rate and is regarded as the true position of the object. The thin solid line is the real-time vision data with T_v = 50 ms latency, which we used as the real-time observation signal for the estimator.
The thin dotted line is the motion data obtained from the real-time rigid body estimator. It shows that the latency and the inter-sample values are compensated, so that the estimated motion variables are close to the monitoring signal. Figure 10 compares the velocities from the rigid body estimator with those of the other signals. Again, the thick solid line is the reference (true) velocity, obtained by numerical differentiation of the monitoring signal. The thin solid line is the velocity computed using the delayed vision data. Recall that two consecutive sample points are needed to compute the velocity by numerical differentiation; the velocity obtained from the observed vision data (i.e. the one with T_v = 50 ms sampling) therefore carries an additional 50 ms of latency, resulting in 100 ms of total latency, as shown in Fig. 10. In contrast, as shown by the thick dotted line, the rigid body estimator effectively compensates for this latency and provides a more accurate real-time velocity signal.

The results of the second experiment, i.e. the case with the object moving under the effect of gravity, are shown in Figs. 11 and 12, presented in the same format as the planar-motion results in Figs. 9 and 10. The catching action by F1 and F2 occurred a little before 3.2 s and generated an impulse-like force on the object, which can be seen from the sharp change in the velocity data of Fig. 12. We can clearly see from Fig. 11 that the estimated motion variables have compensated very well for the latency and the missing inter-sample values of the vision data, except for small errors during the catching period. The velocity profiles shown in Fig. 12 also confirm that the proposed rigid body estimator effectively compensates for the vision latency and can be used to provide more accurate real-time velocity data.

Conclusions and discussions

Motivated by dynamic dexterous manipulation tasks in which fast and accurate estimation of the target object motion is often required, this paper presented a new rigid body estimation scheme combining visual data with other sensory inputs for contact force measurement. The main point is the complementary use of the fast-sampled force measurement to compensate for the slow-sampled visual data with latency. Mathematically, the rigid body estimator takes the form of a multi-rate extended Kalman filter with a retrospective correction step for delay compensation. As demonstrated by the experimental results, the estimator is expected to perform very well if the contact force can be correctly determined. This is especially so for large values of vision latency, since the estimator relies on numerical integration (prediction) between two consecutive vision samples. The use of high-fidelity tactile and joint torque sensors is expected to enhance both the performance and the robustness of the proposed rigid body estimator.

In this paper, we assumed prior knowledge of the inertial properties of the object, such as the mass and the inertia matrix. Such inertial properties may also be identified through preliminary manipulation trials and the application of parameter identification techniques.
In fact, when we (humans) manipulate an object with our hands, we have at first only a limited sense of its inertial properties, but through a variety of manipulation tasks we gradually learn them more accurately and eventually become more skillful. Also, in order for the proposed estimator to be useful in a wide range of applications, it should be tested with more complex dynamic manipulation tasks such as throwing and catching an object in 3D space. We plan to pursue these research ideas in future work.

Authors' contributions

HB worked on the formulation of the problem under the supervision of the second and third authors. He also designed the experimental setup and carried out the experimental tests. SJ initiated the research idea, supervised the first author in the analysis and experiments, and helped draft the initial version of the manuscript. JPH contributed research ideas and helped draft the paper by providing feedback. All authors read and approved the final manuscript.

Acknowledgements

The authors acknowledge the financial support from NSERC (Natural Sciences and Engineering Research Council of Canada) under the Discovery Grant Program (RGPIN-2015-05273).

Competing interests

The authors declare that they have no competing interests.

References

1. Robotics-VO (2013) A roadmap for U.S. robotics: from internet to robotics, 2013 edition. http://www.robotics-vo.us/sites/default/files/2013%20Robotics%20Roadmap-rs.pdf. Accessed 10 July 2016
2. Falco J, Marvel J, Elena M (2013) Dexterous manipulation for manufacturing applications workshop. http://www.nist.gov/el/isd/upload/NISTIR-7940.pdf. Accessed 10 July 2016
3. Okamura AM, Smaby N, Cutkosky MR (2000) An overview of dexterous manipulation. IEEE Int Conf Robot Autom 1:255-262
4. Grebenstein M (2014) Approaching human performance. Springer, Cham
5. Martin J, Grossard M (2014) Design of a fully modular and backdrivable dexterous hand. Int J Robot Res 33(5):783-798
6. Quigley M, Salisbury C, Ng AY, Salisbury JK (2014) Mechatronic design of an integrated robotic hand. Int J Robot Res 33(5):706-720
7. Deshpande AD, Xu Z, Weghe MV, Brown BH, Ko J, Chang LY, Wilkinson DD, Bidic SM, Matsuoka Y (2013) Mechanisms of the anatomically correct testbed hand. IEEE/ASME Trans Mechatron 18(1):238-250
8. Kandel ER, Schwartz JH, Jessell TM et al (2000) Principles of neural science, vol 4. McGraw-Hill, New York
9. Franklin DW, Wolpert DM (2011) Computational mechanisms of sensorimotor control. Neuron 72(3):425-442
10. LaValle SM (2006) Planning algorithms. Cambridge University Press, New York
11. Murray RM, Sastry SS (1994) A mathematical introduction to robotic manipulation. CRC Press, Boca Raton
12. Rizzi AA, Koditschek DE (1996) An active visual estimator for dexterous manipulation. IEEE Trans Robot Autom 12(5):697-713
13. Wang C, Lin CY, Tomizuka M (2014) Statistical learning algorithms to compensate slow visual feedback for industrial robots. ASME J Dyn Syst Meas Control 137(3):031011
14. Jeon S, Tomizuka M, Katou T (2009) Kinematic Kalman filter (KKF) for robot end-effector sensing. ASME J Dyn Syst Meas Control 131(2):021010
15. Hebert P, Hudson N, Ma J, Burdick J (2011) Fusion of stereo vision, force-torque, and joint sensors for estimation of in-hand object location. IEEE Int Conf Robot Autom 1:5935-5941
16. Allen PK, Miller AT, Oh PY, Leibowitz BS (1997) Using tactile and visual sensing with a robotic hand. IEEE Int Conf Robot Autom 1:676-681
17. Kamiyama K, Kajimoto H, Kawakami N, Tachi S (2004) Evaluation of a vision-based tactile sensor. IEEE Int Conf Robot Autom 2:1542-1547
18. Prats M, Sanz PJ, Del Pobil AP (2009) Vision-tactile-force integration and robot physical interaction. IEEE Int Conf Robot Autom 1:3975-3980
19. Kragic D, Crinier S, Brunn D, Christensen HI (2003) Vision and tactile sensing for real world tasks. IEEE Int Conf Robot Autom 1:1545-1550
20. Namiki A, Komuro T, Ishikawa M (2002) High-speed sensory-motor fusion based on dynamics matching. Proc IEEE 90(7):1545-1550
21. Mason M (2001) Mechanics of robotic manipulation. MIT Press, Cambridge
22. Titterton D, Weston J (2004) Strapdown inertial navigation technology, vol 17, 2nd edn. Radar, Sonar, Navigation and Avionics, IET, Stevenage
23. Bar-Itzhack IY, Harman RR, Choukroun D (2002) State-dependent pseudolinear filters for spacecraft attitude and rate estimation. In: AIAA Guidance, Navigation, and Control Conference and Exhibit
24. Leung W, Damaren C (2004) A comparison of the pseudo-linear and extended Kalman filters for spacecraft attitude estimation. In: AIAA Guidance, Navigation, and Control Conference and Exhibit
25. Brogliato B (1999) Nonsmooth mechanics: models, dynamics and control. Springer, London



Hyunki Bae, Soo Jeon, Jan Huissoon. Vision and force/torque integration for realtime estimation of fast-moving object under intermittent contacts, ROBOMECH Journal, 2016, 15, DOI: 10.1186/s40648-016-0054-2