End-To-End Deadlines over Dynamic Topologies

LIPIcs – Leibniz International Proceedings in Informatics, Jul 2019

Despite the creativity of the scientific community and the funding agencies, the underlying model of computation behind IoT, WSN, cloud, edge, fog, and mist is fundamentally the same: computational nodes that are dynamically interconnected to form a system in which both processing capacity and connectivity may vary over time. On top of such a system, we consider applications that need packets to flow along a path and adhere to end-to-end deadlines. This application model is motivated by control and automation systems, as well as telecom systems. The challenge is to guarantee end-to-end deadlines when allowing nodes and applications to join or leave. The mainstream, and to some extent natural, approach is to relax the stringency of the constraint (e.g., use probabilistic guarantees or soft deadlines). In this paper we take a different approach: we keep the end-to-end deadlines as hard constraints and instead partially limit the freedom with which nodes and applications may leave and join. We present a theoretical framework for modeling such systems, along with proofs that deadlines are always honored.



Victor Millnert, Lund University, Sweden
Enrico Bini, University of Turin, Italy
Johan Eker, Lund University, Sweden, and Ericsson Research, Sweden

2012 ACM Subject Classification: Computer systems organization → Cloud computing; Networks → Cloud computing

Keywords and phrases: cloud; real-time; end-to-end latency guarantee; end-to-end response time guarantee; dynamic network

1 Introduction

Cloud computing plays a pivotal role in the ongoing digitalization of both industry and society at large. This transformation borrows many technologies and concepts from traditional cloud applications (mail services, ride sharing, media streaming, e-commerce, etc.). The focus for these applications is commonly availability, scalability, and price/performance. While response time is of great concern for such applications, determinism is usually not. For the next generation of cloud applications, such as industrial automation, collaborative traffic, and telecom systems, predictable timing is crucial. In order to secure a successful transformation and allow such timing-critical systems, the underlying cloud infrastructure must be able to guarantee predictable end-to-end response times.

One example of such an application is a dynamically reconfigurable production cell in a manufacturing plant, where the elements in the production cell are connected to the cloud and controlled centrally. The cloud provides large-scale compute and storage capacity. Cloud back-end systems are typically implemented as service meshes consisting of networks of interconnected microservices. The production cell may be dynamically configured to adapt to changes, such as hardware failures, or to respond to external events. Elements may thus dynamically join or leave a cell. Figure 1 illustrates such a production cell with industrial robots connected to the cloud, which provides services for automation, analytics, artificial intelligence, and machine learning algorithms.

[Figure 1: A production cell with industrial robots connected to a cloud service mesh; each robot's packets traverse a path of nodes subject to an end-to-end deadline.]

The topology of the service network may therefore change over time as robots join and leave the network. As mentioned earlier, the challenge is to still guarantee that the end-to-end deadlines required by the robots are always met, despite dynamic changes of the network topology.
This paper addresses the challenge of allowing network churn while still ensuring predictable end-to-end response times within the cloud. A framework that allows for transitions of the network topology is presented. Moreover, it is formally proved that the suggested framework never violates any timing constraints.

2 The system model

The focus of this paper is to develop a framework where applications are implemented as flows of packets through a network of nodes. Each node offers a service to the incoming packets. The goal is to manage the on-line arrival/departure of flows and the on-line arrival/departure of nodes of the network in such a way that end-to-end deadlines of packets are honored, while allowing for topology changes. Section 2.1 presents definitions and assumptions with respect to the services offered by nodes. Applications are then defined as sets of interconnected services and referred to as flows. Definitions and assumptions on flows are given in Section 2.2. Finally, in Section 2.3, we present some assumptions on how these flows and nodes interact with the resource manager.

2.1 Model of the nodes

A node represents an entity that offers a service to the incoming packets. The time taken by a node to provide the service to an incoming packet is given in Definition 1 below. The nodes in the system are denoted by $V(t) \subset \mathbb{N}$, which is the set of indices of the nodes present at time $t$. The terms "node" and "vertex" are used interchangeably: we may use the term "node" when we refer to its capacity to process packets, while the term "vertex" is more often used when referring to the topological structure of the network.

Definition 1 (Response time). We define the response time $R_i(t)$ as the time taken by a packet entering node $i$ at time $t$ to be processed.

From the definition above, we remark the following facts:
- The response time $R_i(t)$ accounts for all sources of delay possibly occurring within node $i$ (queuing, interference, processing, etc.).
- All incoming packets are treated in the same way regardless of the flow they belong to. Differentiating packets depending on the application they belong to is feasible; this would require attaching the flow index $j$ to the response time, which would then become $R_{i,j}(t)$. However, this choice poses a notational challenge only, with no conceptual added value. For this reason we believe that letting the packet response time depend only on the node (and time) does not significantly impact the applicability of the results. We leave the case of per-flow response time to future work.
- Moreover, in contrast to a large body of the research on real-time systems, this paper does not address how the response time may be computed based on the amount of incoming workload (due to the arrival of packets), the amount of processing capacity of a node, etc. The interested reader can refer to a vast related literature addressing this aspect [12, 18, 7, 16, 17, 19, 25, 3]. Rather, the focus is on the interactions between nodes aimed at guaranteeing end-to-end deadlines in the context of a dynamic network.

Naturally, the service provided to packets by a node might include some minimum requirements, which determine a lower bound on the response time of a node: the node is unable to process packets in less time than this value. Hence, to model the minimum time needed by a node to process a packet, we introduce the following definition.
Definition 2 (Response time lower bound). We define the response time lower bound $\underline{R}_i$ as the minimum time a packet may take to be processed by node $i$.

In practice, $\underline{R}_i$ may represent the pure processing time of a packet, without any interference or delay of any kind. Finally, we assume that the node is capable of controlling its response time, as stated below in Assumption 1. This assumption is backed by the vast body of research on different ways of controlling the response time of cloud services. For instance, [18] uses the concept of "brownout control" to ensure that the response time of a server stays within a desired limit. In [7], control theory is used to modify the processing capacity of web servers to control response times. In [17], a combination of scaling the resources of the nodes with admission control is used to ensure that the deadlines of the nodes in the network are met. More interesting work addressing this is found in [19, 25, 3].

Assumption 1 (Response-time control). We assume that a node $i$ is capable of controlling its response time $R_i(t)$ such that it is always below a deadline $D_i(t) \ge \underline{R}_i$, i.e., such that

$$\forall t \ge 0,\ \forall i \in V(t),\quad R_i(t) \le D_i(t). \tag{1}$$

2.2 Model of the flows

An application in the system is modeled as a flow, indexed by some $j \in F(t)$ and characterized by a path and an end-to-end deadline, properly defined next.

Definition 3 (Path). The path of a flow $j \in F(t)$ is defined by the sequence $p_j : \{1, \dots, \ell_j\} \to V(t)$, with $\ell_j \ge 1$ being the length of the path, such that

$$\forall i = 1, \dots, \ell_j - 1,\quad \big(p_j(i), p_j(i+1)\big) \in E(t), \tag{2}$$

where $E(t) \subseteq V(t) \times V(t)$ is the set of current edges between the nodes of the network.

By Definition 3, it follows that $p_j(i)$ is the $i$-th node on the path of flow $j$. It should be noted that Eq. (2) enforces the existence of an edge of the graph between two consecutive nodes in a path. It should also be noted that a path may traverse a node more than once, which allows us to capture typical client-server sessions. With a slight abuse of notation, we may denote the image of the map $p_j$, which is the set of nodes touched by path $j$, by $p_j$ only, rather than $p_j(\{1, \dots, \ell_j\})$. We remark that the type of application addressed in this paper is borrowed from cloud microservices, where each service provides a unique value to the travelling packet. Hence, the route of packets belonging to an application is known and need not be decided at run time.

Definition 4 (End-to-end deadline). We define the end-to-end deadline $D_j$ of the flow $j \in F(t)$ as the maximum time a packet may take to be processed by all nodes along the path $p_j$.

Finally, a flow $j$ is also characterized by an end-to-end response time $R_j(t)$, which is the time taken by a packet entering the first node $p_j(1)$ of the flow at time $t$ to be processed by all nodes of the path $p_j$. However, before properly defining $R_j(t)$, we need to introduce the mid-path response time $R_{j,i}(t)$, that is, the time needed by a packet that entered flow $j$ at time $t$ to pass through the first $i$ nodes of the path $p_j$. Formally, this quantity is defined recursively by

$$R_{j,i}(t) = \begin{cases} R_{p_j(1)}(t) & \text{if } i = 1,\\ R_{j,i-1}(t) + R_{p_j(i)}\big(t + R_{j,i-1}(t)\big) & \text{otherwise.} \end{cases} \tag{3}$$

The intuition of Eq. (3) is quite straightforward: the time it takes a packet to traverse the first $i$ nodes is equal to the time required to traverse the first $i - 1$ nodes, plus the response time of the $i$-th node evaluated at the instant the packet reaches it. The end-to-end response time of a flow $j$ is therefore given by $R_{j,\ell_j}(t)$, which we compactly denote by $R_j(t)$.
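To make the recursion of Eq. (3) concrete, the following sketch evaluates the mid-path response time of a flow, assuming each node's response time is available as a function of the packet's entry instant. This is a minimal illustration; the node response-time functions and the path used here are hypothetical, not taken from the paper.

```python
from typing import Callable, Dict, List

def mid_path_response_time(path: List[int],
                           R: Dict[int, Callable[[float], float]],
                           t: float) -> float:
    """Evaluate the recursion of Eq. (3): the time a packet entering
    the flow at time t needs to pass through all nodes of `path`."""
    elapsed = 0.0
    for i, node in enumerate(path):
        if i == 0:
            elapsed = R[node](t)                      # R_{j,1}(t) = R_{p_j(1)}(t)
        else:
            elapsed = elapsed + R[node](t + elapsed)  # R_{j,i} = R_{j,i-1} + R_{p_j(i)}(t + R_{j,i-1})
    return elapsed

# Hypothetical two-node flow: node 1 replies in 2 ms, node 2's response
# time grows slowly with the entry instant.
R = {1: lambda t: 2.0, 2: lambda t: 1.0 + 0.1 * t}
print(mid_path_response_time([1, 2], R, t=0.0))  # R_j(0) = 2 + (1 + 0.1*2) = 3.2
```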
2.3 Model of the dynamic network

Flows and nodes may join and leave the network at run time. Hence, we define the network as follows.

Definition 5 (Network). We define the network $G(t)$ of the system at time $t$ as the set of nodes, directed edges, and flows present at that time: $G(t) = \{V(t), E(t), F(t)\}$.

Since we aim at guaranteeing end-to-end deadlines, the requests to join or leave must be properly handled by a resource manager, which manages the network (as illustrated in Figure 2). If not properly handled, the risk is that newly admitted flows cause overload, or that the uncontrolled departure of nodes disconnects the network.

[Figure 2: The resource manager mediates all changes: flows request to join ("ok to join") or notify to leave; nodes notify to join, or request to leave and wait for "ok to leave".]

The interactions between the resource manager and the flows are as follows:
- A flow $j \notin F(t)$ may request to join the network. Such an instant is denoted by $f_j^{rq+}$. When a new flow issues such a request, it also communicates to the resource manager (1) its path $p_j$ and (2) its end-to-end deadline $D_j$.
- After the request by a flow to join, the resource manager either accepts the flow $j$ into the network at an instant $f_j^{ok+} \ge f_j^{rq+}$, if feasible, or rejects the flow $j$ immediately, if not feasible. Details on the admission of new flows, based on their characteristics and the current state of the network, are given in Section 4.
- A flow $j \in F(t)$ may notify and leave the network at any time, which we denote by $f_j^-$. In fact, it is only advantageous to let a flow (and its constraint) leave.
- The resource manager may notify a flow $j \in F(t)$ that it has to leave the network. This might, for instance, happen if a node along the path $p_j$ requests to leave the network; in such a case, the resource manager is no longer able to provide the services requested by flow $j$.

The interactions between the resource manager and the nodes (which are the vertices of the graph) are as follows:
- A node $i \notin V(t)$ may notify and join the network at any time. We denote such an instant by $v_i^+$. Since a node brings a new service to the network, there is no admission control on its request to join.
- A node $i \in V(t)$ may request to leave the network. The instant of such a request is denoted by $v_i^{rq-}$. The resource manager lets the node leave only at time $v_i^{ok-} \ge v_i^{rq-}$. The time between $v_i^{rq-}$ and $v_i^{ok-}$ is needed by the resource manager to allow the flows going through node $i$ to properly exit.

Problem formulation. The problem addressed in this paper can now formally be summarized as follows: control when/how nodes are allowed to leave the network, control when/how flows are allowed to join the network, and control the node deadlines $D(t) = [D_i(t)]_{\forall i \in V(t)}$, such that the end-to-end deadlines of all the flows in the network are met:

$$\forall t \ge 0,\ \forall j \in F(t),\quad R_j(t) \le D_j. \tag{4}$$
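As a concrete picture of Definitions 3 and 5, the sketch below models a network snapshot and checks that a flow's path only uses existing edges (Eq. (2)). The class and field names are our own illustration, not part of the paper's formalism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Flow:
    path: List[int]        # p_j(1), ..., p_j(l_j); may visit a node twice
    deadline: float        # end-to-end deadline D_j

@dataclass
class Network:
    nodes: Set[int]                                        # V(t)
    edges: Set[Tuple[int, int]]                            # E(t), directed
    flows: Dict[int, Flow] = field(default_factory=dict)   # F(t)

    def path_is_valid(self, flow: Flow) -> bool:
        """Eq. (2): consecutive path nodes must be joined by an edge."""
        if not all(n in self.nodes for n in flow.path):
            return False
        return all((a, b) in self.edges
                   for a, b in zip(flow.path, flow.path[1:]))

# A snapshot G(t) with a client-server style path that revisits node 1.
G = Network(nodes={1, 2}, edges={(1, 2), (2, 1)})
print(G.path_is_valid(Flow(path=[1, 2, 1], deadline=6.0)))  # True
```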
3 Static networks

In this section, we introduce a protocol which allows for dynamic deadlines in static networks. By a static network we mean one where no nodes or flows join or leave the network, that is, $\forall t \ge 0,\ G(t) = G$. The traffic rates and response times of nodes may still change dynamically. However, the topology is static: the nodes, the flows, and their end-to-end deadlines do not change over time. Hence, we can drop the time dependency of the set $E(t) = E$ of edges, the set $V(t) = V$ of nodes, and the set $F(t) = F$ of flows. Node deadlines, however, may change to accommodate variations in the workload.

To give some intuition of the challenges of guaranteeing end-to-end deadlines in a system with dynamic node deadlines (even for a static network), we present a simple example in Section 3.1. In Section 3.2 we propose a solution which allows for dynamic node deadlines and still provides guarantees on the end-to-end deadlines; this is proved in Theorem 6. Finally, in Section 3.3 we adapt the opening example of Section 3.1 so that it uses the method suggested in Section 3.2, and show that it is then able to guarantee that all end-to-end deadlines are met at all times.

3.1 Example – issues with dynamic deadlines

If node deadlines were constant, then a static assignment of node deadlines, equal to any vector $D = \{D_i\}_{\forall i \in V}$ satisfying

$$D \in \mathcal{D}(G) = \Big\{ D_i \in \mathbb{R}^+ : \forall i \in V,\ D_i \ge \underline{R}_i,\ \forall j \in F,\ \sum_{i=1}^{\ell_j} D_{p_j(i)} \le D_j \Big\}, \tag{5}$$

would guarantee that no end-to-end deadline is missed. The intuition behind Eq. (5) is simple: the sum of the node deadlines along the path $p_j$ of a flow $j$ cannot exceed the end-to-end deadline $D_j$. We remark that the set $\mathcal{D}(G)$ is convex, since it is the polytope built from the intersection of linear half-spaces. This also implies that every line segment between any two points in $\mathcal{D}(G)$ belongs to $\mathcal{D}(G)$.

Let us now consider the issues of performing a transition from some deadline assignment $D(t_1) \in \mathcal{D}(G)$ to another assignment $D(t_2) \in \mathcal{D}(G)$, hence with both the starting and ending node deadlines belonging to $\mathcal{D}(G)$. Since $\mathcal{D}(G)$ is convex, the linear transition of node deadlines

$$D(t) = D(t_1)\,\frac{t_2 - t}{t_2 - t_1} + D(t_2)\,\frac{t - t_1}{t_2 - t_1}$$

belongs to $\mathcal{D}(G)$ for all $t \in [t_1, t_2]$.

[Figure 3: A static network with five nodes; flow 1 (blue) traverses nodes 1 and 2 and has end-to-end deadline D1 = 6 ms.]

However, even if the linear combination $D(t)$ always belongs to $\mathcal{D}(G)$ for all $t \in [t_1, t_2]$, an end-to-end deadline may be missed anyway. Suppose we have the network $G$ illustrated in Figure 3, and focus on flow 1 (the blue flow), with path $p_1 = \{1, 2\}$. The end-to-end deadline of this flow is $D_1 = 6$ milliseconds (ms). Suppose now that, in response to an increase of the incoming packet rate, node 2 must change its deadline from $D_2(t_1) = 1$ to $D_2(t_2) = 5$. The node deadlines of the system are therefore changing from $D(t_1) = [5, 1, \dots]$ to $D(t_2) = [1, 5, \dots]$. Note that, for the purpose of this example, the particular choice of deadlines for nodes 3, 4, and 5 does not matter.

Figure 4 shows the deadlines of node 1 and node 2 over time. At time $\tau_1 = 1$, a packet enters node 1, which has $D_1(\tau_1) = 5$. In the worst case, node 1 will finish processing that packet at time $\tau_1 + D_1(\tau_1) = 6$ ms. Suppose now that at time $t_1 = 2$ ms, while this packet is still at node 1, the resource manager begins changing the node deadlines from $D(t_1) = [5, 1, \dots]$ towards $D(t_2) = [1, 5, \dots]$. When the packet exits node 1 at time $\tau_1 + D_1(\tau_1) = t_2 = 6$, it enters node 2. Due to the change of node deadlines, the value of $D_2(t)$ at $t = 6$ is $D_2(6) = 5$. This means that, in the worst case, the response time of the second node is also 5 ms, since $R_2(t_2) \le D_2(t_2) = 5$. The packet that entered flow 1 at time $\tau_1 = 1$ may therefore, in the worst case, take up to 10 ms to traverse its path $p_1$. Hence, the end-to-end deadline $D_1 = 6$ of the packet is violated, despite the fact that the sum of the node deadlines $D_1(t)$ and $D_2(t)$ along the path was never greater than $D_1$ (that is, $\forall t,\ D_1(t) + D_2(t) \le D_1$).

This simple example shows that when the network is allowed to change the node deadlines dynamically, the constraint $D(t) \in \mathcal{D}(G)$ is not a sufficient condition to guarantee that end-to-end deadlines are met.
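The arithmetic of this counterexample can be replayed directly. The sketch below linearly interpolates the node deadlines and tracks the worst-case traversal of the packet entering at $\tau_1 = 1$; it is a minimal re-computation of the example's numbers, not a general analysis.

```python
def D(t, t1, t2, d_start, d_end):
    """Linear deadline transition, clamped outside [t1, t2]."""
    if t <= t1:
        return d_start
    if t >= t2:
        return d_end
    frac = (t - t1) / (t2 - t1)
    return d_start * (1 - frac) + d_end * frac

t1, t2 = 2.0, 6.0                      # transition window of the example
D1 = lambda t: D(t, t1, t2, 5.0, 1.0)  # node 1: 5 ms -> 1 ms
D2 = lambda t: D(t, t1, t2, 1.0, 5.0)  # node 2: 1 ms -> 5 ms

tau1 = 1.0
exit1 = tau1 + D1(tau1)              # worst case: leaves node 1 at t = 6
exit2 = exit1 + D2(exit1)            # worst case: leaves node 2 at t = 11
print(exit1, exit2, exit2 - tau1)    # 6.0 11.0 10.0  -> misses D_1 = 6
# Yet D1(t) + D2(t) == 6 for every t, so D(t) stays inside D(G).
```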
In fact, the violation of the end-to-end deadline illustrated by this example is related to the variation of the node deadlines. The node deadlines $D(t)$ need to satisfy a stricter constraint, which is discussed next.

[Figure 4: The node deadlines D1(t) and D2(t) (in ms) over time during the linear transition from D(t1) = [5, 1] to D(t2) = [1, 5].]

3.2 Guaranteeing end-to-end deadlines

In this section, we provide the conditions that allow the network to dynamically change the node deadlines without incurring any end-to-end violation such as the one exemplified in Section 3.1. By letting node deadlines change over time, predicting the end-to-end response time of packets becomes difficult. In Protocol 1 we present a solution to this problem. The intuition is that by limiting the rate of change of the node deadlines, it is possible to bound the end-to-end response time of a flow $j \in F$ at time $t$; this in turn allows us to compute the set of allowed node deadlines. This is formally proved in Theorem 6.

Protocol 1 (Management of dynamic node deadlines in static networks).
- The node deadlines may never change at a rate larger than some fixed $\alpha \in [0, 1]$:

$$\forall t \ge 0,\ \forall i \in V,\quad |\dot{D}_i(t)| \le \alpha. \tag{6}$$

- The node deadlines must always remain within the set of feasible node deadlines:

$$D(t) \in \mathcal{D}(G) = \Big\{ D_i \in \mathbb{R}^+ : \forall i \in V,\ D_i \ge \underline{R}_i,\ \forall j \in F,\ \sum_{i=1}^{\ell_j} (1+\alpha)^{\ell_j - i} D_{p_j(i)} \le D_j \Big\}. \tag{7}$$

Theorem 6 (Dynamic deadlines in static networks). No end-to-end deadline of any flow is violated, that is,

$$\forall t \ge 0,\ \forall j \in F,\quad R_j(t) \le D_j, \tag{8}$$

as long as the node deadlines $D(t)$ never change at a rate faster than the bound $\alpha \in [0, 1]$ of Eq. (6), and as long as the node deadlines remain within the space of feasible node deadlines $\mathcal{D}(G)$ of Eq. (7).

Proof. We begin by recalling Assumption 1: the nodes in the network have a response-time controller which ensures that Eq. (1) always holds, that is, $\forall t \ge 0,\ R_i(t) \le D_i(t)$. It follows that for the first node $p_j(1)$ of any path $j \in F$ it is always true that

$$\forall t \ge 0,\ \forall j \in F,\quad R_{j,1}(t) = R_{p_j(1)}(t) \le D_{p_j(1)}(t). \tag{9}$$

It then follows that for all subsequent nodes with index $i > 1$, from Eq. (3), we have

$$\forall t \ge 0,\ \forall j \in F,\quad R_{j,i}(t) = R_{j,i-1}(t) + R_{p_j(i)}\big(t + R_{j,i-1}(t)\big) \le R_{j,i-1}(t) + D_{p_j(i)}\big(t + R_{j,i-1}(t)\big) \le R_{j,i-1}(t) + D_{p_j(i)}(t) + \alpha\,R_{j,i-1}(t) = (1+\alpha)\,R_{j,i-1}(t) + D_{p_j(i)}(t), \tag{10}$$

where each step follows: (1) from the definition of the mid-path response time, Eq. (3); (2) from Eq. (1); (3) from Eq. (6), which implies Lipschitz continuity of $D_i(t)$ with Lipschitz constant $\alpha$; and (4) from basic algebra.

In the next step, we show that

$$\forall t \ge 0,\ \forall j \in F,\ \forall x = 1, \dots, \ell_j,\quad R_{j,x}(t) \le \sum_{i=1}^{x} (1+\alpha)^{x-i} D_{p_j(i)}(t) \tag{11}$$

holds, by induction on the index $x$ of nodes along the path $j$. When $x = 1$, Eq. (11) follows directly from Eq. (9). For any $x > 1$, we have

$$R_{j,x}(t) \le (1+\alpha)\,R_{j,x-1}(t) + D_{p_j(x)}(t) \le (1+\alpha) \sum_{i=1}^{x-1} (1+\alpha)^{x-1-i} D_{p_j(i)}(t) + D_{p_j(x)}(t) = \sum_{i=1}^{x-1} (1+\alpha)^{x-i} D_{p_j(i)}(t) + D_{p_j(x)}(t) = \sum_{i=1}^{x} (1+\alpha)^{x-i} D_{p_j(i)}(t),$$

where the steps follow: (1) from the inequality of Eq. (10); (2) from the inductive hypothesis of Eq. (11) for $x - 1$; and (3) from basic algebra. Hence, Eq. (11) is proved for all $x = 1, \dots, \ell_j$.

Finally, we can conclude the proof by observing that

$$R_j(t) = R_{j,\ell_j}(t) \le \sum_{i=1}^{\ell_j} (1+\alpha)^{\ell_j - i} D_{p_j(i)}(t) \le D_j,$$

where the last inequality holds whenever $D(t) \in \mathcal{D}(G)$. This implies that, as long as the node deadlines are chosen such that $D(t) \in \mathcal{D}(G)$, no end-to-end deadline of any flow $j \in F$ is violated, and the theorem is proved. ∎
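A minimal sketch of the two checks in Protocol 1 is given below, under the assumption that deadlines are updated in discrete steps of length dt (the paper states the conditions in continuous time; the data layout is our own).

```python
from typing import Dict, List, Tuple

def in_feasible_set(D: Dict[int, float], R_min: Dict[int, float],
                    flows: List[Tuple[List[int], float]], alpha: float) -> bool:
    """Membership test for D(G) of Eq. (7): lower bounds per node, plus the
    (1+alpha)-weighted deadline sum along every flow's path."""
    if any(D[i] < R_min[i] for i in D):
        return False
    for path, e2e_deadline in flows:
        l = len(path)
        weighted = sum((1 + alpha) ** (l - 1 - i) * D[path[i]] for i in range(l))
        if weighted > e2e_deadline:
            return False
    return True

def rate_ok(D_old: Dict[int, float], D_new: Dict[int, float],
            dt: float, alpha: float) -> bool:
    """Discrete-time version of Eq. (6): |D_i(t+dt) - D_i(t)| <= alpha * dt."""
    return all(abs(D_new[i] - D_old[i]) <= alpha * dt for i in D_old)

# Flow 1 of the running example: path [1, 2], end-to-end deadline 6 ms.
flows = [([1, 2], 6.0)]
R_min = {1: 0.1, 2: 0.1}
print(in_feasible_set({1: 2.5, 2: 1.0}, R_min, flows, alpha=1.0))  # True: 2*2.5+1 <= 6
print(in_feasible_set({1: 5.0, 2: 1.0}, R_min, flows, alpha=1.0))  # False: 2*5+1 > 6
```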
3.3 Example – applying Protocol 1

This section illustrates that if the node deadlines are changed in accordance with Protocol 1, then the end-to-end deadline misses shown in Section 3.1 cannot happen. To do so, we modify the example of Section 3.1. We consider again flow 1 of Figure 3, which has path $p_1 = \{1, 2\}$ and end-to-end deadline $D_1 = 6$. As before, we assume that the resource manager must change $D_2(t)$ from $D_2(t_1) = 1$ to $D_2(t_2) = 5$, to allow node 2 to handle a sudden increase in incoming traffic.

Recalling the example in Section 3.1, we can now verify that if the node deadlines start in the state $D_1(t_1) = 5$ and $D_2(t_1) = 1$, the resource manager is not able to change them at all. In fact, by writing explicitly the constraint of Eq. (7) in Protocol 1 for the path $j = 1$, we have

$$(1+\alpha)\,D_1(t_1) + D_2(t_1) \le D_1 \iff (1+\alpha)\,5 + 1 \le 6 \iff \alpha \le 0,$$

which means that the node deadlines are not allowed to change at all. If some change is needed (for example, to handle a burst of incoming traffic), then the node deadlines must be chosen more conservatively. In fact, given the change of $D_2(t)$, the choice of $D_1(t)$ has to satisfy the following constraint:

$$\forall t \ge 0,\quad (1+\alpha)\,D_1(t) + D_2(t) \le D_1 = 6. \tag{12}$$

Next, we compare two cases with different feasible rates of change $\alpha$ for the node deadlines. The results are illustrated in Figure 5.

Maximum rate of change α = 1. To allow the network to transition quickly from $D_2(t_1) = 1$ to $D_2(t_2) = 5$, we choose $\alpha = 1$. This means that the choices of $D_1$ are constrained by the following condition:

$$\forall t \ge 0,\quad (1+1)\,D_1(t) + D_2(t) \le D_1 = 6.$$

This condition gives the resource manager the space of possible choices for $D_1(t)$ illustrated by the shaded blue region in Figure 5a. The largest possible choice of $D_1(t)$ therefore involves changing $D_1$ from $D_1(t_1) = 2.5$ to $D_1(t_2) = 0.5$, with $t_1 = 1$ and $t_2 = 5$.

Maximum rate of change α = 0.5. The stringency of the constraint on the node deadlines when $\alpha = 1$ may be relaxed by requiring a slower transition between the node deadlines, for example $\alpha = 0.5$. By doing so, we get the following conditions on $D_1(t)$:

$$D_1(t_1) \le \tfrac{10}{3} \approx 3.333, \qquad D_1(t_2) \le \tfrac{2}{3} \approx 0.666,$$

in order to ensure that Eq. (12) holds, again assuming that $D_2(t)$ is changed from $D_2(t_1) = 1$ to $D_2(t_2) = 5$. Not surprisingly, by requiring a smoother transition, the node deadlines may be larger. Similarly to the previous case, this is illustrated in Figure 5b, where we show that when $D_1(t)$ is chosen as large as possible, a packet traversing flow 1 is always guaranteed to meet its end-to-end deadline of 6 ms.

Trade-offs with α. This example illustrates some fundamental trade-offs that come with adopting Protocol 1. While the protocol allows the system to change the node deadlines dynamically at a rate $\alpha$, it restricts the possible choices of node deadlines: the quicker one wishes to change the node deadlines, the more restricted the choice of feasible node deadlines becomes, and vice versa. The design parameter $\alpha$ should be chosen appropriately for a given application.
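The bounds quoted for the two rates above can be checked mechanically. This snippet evaluates constraint (12) at the transition endpoints; it is pure arithmetic on the example's numbers.

```python
def d1_bound(alpha: float, d2: float, e2e: float = 6.0) -> float:
    """Largest D1 allowed by Eq. (12): (1+alpha)*D1 + D2 <= e2e."""
    return (e2e - d2) / (1 + alpha)

for alpha in (1.0, 0.5):
    lo = d1_bound(alpha, d2=1.0)   # before the transition, D2 = 1
    hi = d1_bound(alpha, d2=5.0)   # after the transition,  D2 = 5
    print(f"alpha={alpha}: D1(t1) <= {lo:.3f}, D1(t2) <= {hi:.3f}")
# alpha=1.0: D1(t1) <= 2.500, D1(t2) <= 0.500
# alpha=0.5: D1(t1) <= 3.333, D1(t2) <= 0.667
```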
4 Dynamic networks

In this section, we generalize the method presented in Section 3 to the case of a dynamic network $G(t) = \{V(t), E(t), F(t)\}$, where nodes and flows may join and leave the network at run time. We begin by showing, in Corollary 7, that the result of Theorem 6 transfers directly to a dynamic network: the end-to-end deadline of any flow in the network is met as long as the hypotheses of Theorem 6 hold for every state of the network $G(t)$, at all times $t$, i.e., as long as the conditions stated in Corollary 7 are fulfilled. Comparing Theorem 6 and Corollary 7, one can see that the only difference is that we now allow for a dynamic network $G(t)$ instead of a static network $G$.

Corollary 7 (Dynamic deadlines in dynamic networks). The end-to-end deadlines of all flows are always met, at all times, that is,

$$\forall t \ge 0,\ \forall j \in F(t),\quad R_j(t) \le D_j, \tag{13}$$

as long as the node deadlines $D(t)$ never change at a rate faster than some $\alpha \in [0, 1]$,

$$\forall t \ge 0,\ \forall i \in V(t),\quad |\dot{D}_i(t)| \le \alpha, \tag{14}$$

and as long as the node deadlines belong to the space of feasible node deadlines,

$$D(t) \in \mathcal{D}\big(G(t)\big), \tag{15}$$

with $\mathcal{D}(G(t))$ given by

$$\mathcal{D}\big(G(t)\big) = \Big\{ D_i \in \mathbb{R}^+ : \forall i \in V(t),\ D_i \ge \underline{R}_i,\ \forall j \in F(t),\ \sum_{i=1}^{\ell_j} (1+\alpha)^{\ell_j - i} D_{p_j(i)} \le D_j \Big\}. \tag{16}$$

Proof. We begin the proof by observing that, given the rate bound of Eq. (14), it follows directly from Theorem 6 that for a fixed time instant $t_0$ it holds that

$$\forall j \in F(t_0),\quad R_j(t_0) \le D_j, \tag{17}$$

as long as

$$D(t_0) \in \mathcal{D}\big(G(t_0)\big), \tag{18}$$

with $\mathcal{D}(G(t_0))$ given by Eq. (16). In fact, this is precisely what was stated and proved in Theorem 6, but with an instantaneous "snapshot" $G(t_0)$ of a dynamic topology instead of a fixed network topology $G$. The hypothesis of Eq. (15) then ensures that Eq. (18) holds for all $t \ge 0$. Therefore, Eq. (17) holds for all $t \ge 0$, and in turn Eq. (13) always holds, as required. ∎

As demonstrated by the short proof, Corollary 7 does not pose any deeper conceptual challenge compared to Theorem 6. However, the two hypotheses (14) and (15) may be hard to satisfy simultaneously if no special care is taken. This is illustrated in the next example.

Example – issues in accepting a new flow. Blindly admitting new flows as soon as they request to join may violate one of the two hypotheses of the corollary (Equations (14) and (15)), making Corollary 7 incapable of guaranteeing end-to-end deadlines. At the time when a new flow $j'$ is admitted to the network, the set of flows $F(t)$ includes the new flow $j'$, which was not previously in the set. As a consequence of the acceptance of the new flow $j'$ into $F(t)$, the set of feasible node deadlines $\mathcal{D}(G(t))$ may suddenly shrink due to the newly added constraint. As illustrated in the next example, this might in turn leave the node deadlines in an infeasible state, such that the end-to-end deadline of the newly accepted flow is violated. Notice that the node deadlines cannot instantaneously adapt to the new constraint; otherwise the "bounded rate of change" hypothesis of Eq. (14) is violated.

In Figure 6, we illustrate a system where a new flow $j'$ requests to join the network at time $f_{j'}^{rq+} = 3$. As soon as this flow is accepted into the network, at time $f_{j'}^{ok+} = 4.5$, the set of feasible node deadlines (illustrated by the shaded green area) makes a discrete jump. This means that the node deadlines $D(t)$ (the thick black line in Figure 6) no longer remain within $\mathcal{D}(G(t))$. In other words, $D(f_{j'}^{ok+}) \notin \mathcal{D}\big(G(f_{j'}^{ok+})\big)$, which is a clear violation of the conditions of Corollary 7.

[Figure 6: The feasible set of node deadlines shrinks discretely at the admission instant $f_{j'}^{ok+}$, leaving the node deadlines $D(t)$ outside the feasible set: a violation.]

The illustrated issue suggests that special care must be taken when the network $G(t)$ is modified, since a discrete variation of the network may not be compatible with the need for smoothly changing node deadlines. Therefore, in Section 4.1 we present two protocols which address this issue: Protocol 2 explains how the requests of flows are managed, while Protocol 3 describes the management of the nodes of the network.
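To see the shrink concretely: adding a flow only adds constraints, so the feasible set can only get smaller, and a currently feasible $D(t)$ may land outside it. The snippet below checks the weighted constraint of Eq. (16) before and after a hypothetical admission; the numbers are our own illustration.

```python
alpha = 0.5
D_now = {1: 2.0, 2: 2.0, 3: 4.0}   # current node deadlines D(t)

def weighted_sum(path, D, alpha):
    """Left-hand side of the per-flow constraint in Eq. (16)."""
    l = len(path)
    return sum((1 + alpha) ** (l - 1 - i) * D[path[i]] for i in range(l))

# Existing flow over nodes 1 -> 2 with D_j = 6: D(t) is in D(G(t)).
print(weighted_sum([1, 2], D_now, alpha) <= 6.0)   # True  (1.5*2 + 2 = 5)
# Constraint added by a new flow j' over nodes 2 -> 3 with D_j' = 5:
print(weighted_sum([2, 3], D_now, alpha) <= 5.0)   # False (1.5*2 + 4 = 7)
```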
4.1 Protocol allowing dynamic networks

In this section, we present a protocol for managing how flows may join and leave the network. We show that by following this protocol, the hypotheses of Corollary 7 always hold, making the corollary applicable. We then present a second protocol for managing nodes joining and leaving the network.

Management of flows. The intuition behind Protocol 2, which manages the flows, comes from the fact that as soon as a new flow $j'$ is accepted into the network, at time $f_{j'}^{ok+}$, the network changes from $G = \{V(t), E(t), F(t)\}$ to $G^+ = \{V(t), E(t), F(t) \cup \{j'\}\}$. The constraint corresponding to the new flow $j'$ is thus added to $\mathcal{D}(G)$, leading to the new set of feasible node deadlines $\mathcal{D}(G^+) \subseteq \mathcal{D}(G)$. Therefore, in order to ensure that the node deadlines remain within the set of feasible node deadlines once $j'$ is accepted, i.e., that $D(f_{j'}^{ok+}) \in \mathcal{D}\big(G(f_{j'}^{ok+})\big)$, Protocol 2 only accepts $j'$ when $D(t) \in \mathcal{D}(G^+)$. If this already holds when $j'$ requests to join, i.e., if $D(f_{j'}^{rq+}) \in \mathcal{D}(G^+)$, then the new flow is accepted immediately, and $f_{j'}^{ok+} = f_{j'}^{rq+}$. If, on the other hand, $D(f_{j'}^{rq+}) \notin \mathcal{D}(G^+)$, it is not possible to accept $j'$ immediately. In that case, the resource manager changes the node deadlines towards a goal point $D^\star \in \mathcal{D}(G^+)$, and accepts the new flow $j'$ once $D(t) = D^\star$, which occurs at the time $f_{j'}^{ok+}$ given by Eq. (19).

Protocol 2 (Management of flows).
- At time $f_{j'}^{rq+}$, the new flow $j'$ requests to join.
- If $D(f_{j'}^{rq+}) \in \mathcal{D}(G^+)$, then the flow $j'$ is admitted immediately, that is, $f_{j'}^{ok+} = f_{j'}^{rq+}$.
- If $D(f_{j'}^{rq+}) \notin \mathcal{D}(G^+)$, then the admission of flow $j'$ is delayed until the node deadlines have completed a linear transition to a goal point $D^\star \in \mathcal{D}(G^+)$, that is,

$$f_{j'}^{ok+} = f_{j'}^{rq+} + \frac{\max_{i \in V(t)} \big| D_i^\star - D_i(f_{j'}^{rq+}) \big|}{\alpha}, \tag{19}$$

with $\alpha$ being the maximum feasible rate of change of node deadlines.
- If $\mathcal{D}(G^+) = \emptyset$, then the request of flow $j'$ to join the network is rejected.
- A flow $j \in F(t)$ may notify the resource manager and leave the network at any time. The time when it leaves is denoted by $f_j^-$.

Let us comment on the choice of the goal point $D^\star$. In general, any choice of $D^\star \in \mathcal{D}(G^+)$ is valid. However, if the target is to minimize the transition time from the instant $f_{j'}^{rq+}$ at which flow $j'$ requests to join to the instant $f_{j'}^{ok+}$ at which it is admitted to the network (given by Eq. (19)), then it should be chosen as

$$D^\star = \arg\min_{D \in \mathcal{D}(G^+)} \max_i \big| D_i - D_i(f_{j'}^{rq+}) \big|.$$

We would like to point out three observations on Protocol 2. The first one is that during the transition to accept the new flow $j'$, the node deadlines are changed according to the following linear function:

$$\forall i \in V(f_{j'}^{rq+}),\ \forall t \in [f_{j'}^{rq+}, f_{j'}^{ok+}],\quad D_i(t) = D_i(f_{j'}^{rq+})\,\frac{f_{j'}^{ok+} - t}{f_{j'}^{ok+} - f_{j'}^{rq+}} + D_i^\star\,\frac{t - f_{j'}^{rq+}}{f_{j'}^{ok+} - f_{j'}^{rq+}}. \tag{20}$$

This means that the node deadlines move along the line segment from $D(f_{j'}^{rq+})$ to $D^\star$. Since $\mathcal{D}(G)$ is a convex set, it follows that $D(t) \in \mathcal{D}(G)$ throughout the transition. Hence, the hypothesis of Eq. (15) in Corollary 7 holds during the transition.

The second observation is that by changing the node deadlines according to Eq. (20), we obtain $D_i(f_{j'}^{ok+}) = D_i^\star$. This means that at the end of the transition, at time $t = f_{j'}^{ok+}$, we have $D(t) \in \mathcal{D}(G^+)$. This implies that once the new flow $j'$ is accepted, condition (15) of Corollary 7 continues to hold.

The final observation is that by exploiting the value of $f_{j'}^{ok+}$ from Eq. (19), we have

$$|\dot{D}_i(t)| = \frac{\big| D_i(f_{j'}^{rq+}) - D_i^\star \big|}{f_{j'}^{ok+} - f_{j'}^{rq+}} = \alpha\,\frac{\big| D_i(f_{j'}^{rq+}) - D_i^\star \big|}{\max_{i \in V(t)} \big| D_i^\star - D_i(f_{j'}^{rq+}) \big|} \le \alpha, \tag{21}$$

so the maximum rate of change of any node deadline during the transition is $\alpha$, and Protocol 2 also fulfills condition (14) of Corollary 7. By combining the three observations, we can conclude that by following Protocol 2, the hypotheses of Corollary 7 are always met when accepting new flows, and hence no end-to-end deadline is violated.

Finally, we remark that if there is no possible choice of a goal point, i.e., if $\mathcal{D}(G^+) = \emptyset$, then the request of $j'$ to join is rejected, because admitting the new flow $j'$ could cause the violation of the end-to-end deadline of some flow already admitted to the network.
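Below is a compact sketch of the admission logic of Protocol 2. Choosing $D^\star$ optimally requires solving the min-max program above, so here we simply accept a caller-supplied feasible goal point; the function and variable names are our own, not the authors' implementation.

```python
from typing import Dict, Optional

def admit_flow(t_rq: float,
               D_now: Dict[int, float],
               D_goal: Optional[Dict[int, float]],
               alpha: float) -> Optional[float]:
    """Return the admission instant f^{ok+} of Protocol 2, or None if the
    request must be rejected (no feasible goal point, i.e. D(G+) empty).

    `D_goal` is a point in D(G+) chosen by the resource manager; if the
    current deadlines already lie in D(G+), pass D_goal = D_now."""
    if D_goal is None:          # D(G+) is empty: reject
        return None
    # Eq. (19): duration of a linear, alpha-rate-limited transition to D_goal.
    worst_shift = max(abs(D_goal[i] - D_now[i]) for i in D_now)
    return t_rq + worst_shift / alpha

# Running example: deadlines must move from (2.5, 1) to (0.5, 5) at alpha = 1.
t_ok = admit_flow(t_rq=3.0, D_now={1: 2.5, 2: 1.0},
                  D_goal={1: 0.5, 2: 5.0}, alpha=1.0)
print(t_ok)  # 3.0 + max(2, 4) / 1 = 7.0
```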
Management of nodes. The basic idea behind Protocol 3 is that when a node $i \in V(t)$ leaves the network, it affects the flows with a path going through $i$. Therefore, at time $v_i^{rq-}$, that is, when node $i$ requests to leave the network, the resource manager notifies every affected flow $j \in \{j \in F(t) : i \in p_j\}$ that it will be pushed out of the network after a time $T_i^{leave}$. The rationale is simply that the resource manager will no longer be able to provide any guarantees on the end-to-end deadlines of the affected flows once node $i$ has left the network. Should the affected flows still wish to use some of the services in the network, they may request to re-join the network as new flows. In order to give the affected flows adequate time to do this, and to also ensure that there is enough time for the packets already in flight in the affected flows, we require $T_i^{leave}$ to be greater than the largest end-to-end deadline of the affected flows. Moreover, notice that there is no condition on nodes willing to join the network.

Protocol 3 (Management of nodes).
- A node $i \notin V(t)$ may notify the resource manager and join the network at any time $v_i^+$.
- When a node $i \in V(t)$ requests to leave the network (we denote such an instant by $v_i^{rq-}$), it is allowed to do so at

$$v_i^{ok-} \ge v_i^{rq-} + T_i^{leave}, \quad \text{with} \quad T_i^{leave} \ge \max\{ D_j : j \in F(t),\ i \in p_j \}.$$

- The resource manager notifies all affected flows $j \in \{j \in F(t) : i \in p_j\}$ that they will be evicted at time $t = v_i^{ok-}$ if they have not left the network by then.

Comments on handling multiple flows. It should be noted that while Protocol 2 only treats the case where a single flow $j'$ requests to join, it is possible to allow multiple flows to make requests. This scenario can be managed by introducing a request queue and applying Protocol 2 to the request at the head of the queue; once that request is served, the resource manager repeats the procedure for the new head-of-queue request until there are no pending requests. This methodology would only incur heavier notation, which we prefer to avoid in order to lighten the presentation. Moreover, as we show in the experiments of Section 5.2, a new flow can be accepted in a matter of milliseconds, or even microseconds. Therefore, given the application of cloud robotics, it is fair to assume that multiple flows will rarely request to join simultaneously; and should they do so, introducing a request queue will not incur any significant delay in accepting new flows.
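A sketch of the departure bookkeeping in Protocol 3, again with our own data layout: the grace period is the largest end-to-end deadline among flows whose paths traverse the leaving node.

```python
from typing import Dict, List, Tuple

def node_departure(t_rq: float, node: int,
                   flows: Dict[int, Tuple[List[int], float]]) -> Tuple[float, List[int]]:
    """Protocol 3: compute the earliest allowed departure time v^{ok-}
    and the list of flows that must be notified."""
    affected = [j for j, (path, _) in flows.items() if node in path]
    # T_leave must dominate every affected flow's end-to-end deadline.
    t_leave = max((flows[j][1] for j in affected), default=0.0)
    return t_rq + t_leave, affected

# Node 6 requests to leave at t = 9; flow 3 (path through node 6, D_3 = 5)
# is affected, so departure is granted at t = 9 + 5 = 14.
flows = {1: ([1, 2], 6.0), 3: ([4, 6, 7], 5.0)}
print(node_departure(9.0, 6, flows))  # (14.0, [3])
```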
[Figure 7: Two transitions of the network. In the first transition, from time t1 to t2, flow 4 (green) joins the network, which already has flows 1, 2, and 3. In the second transition, from time t2 to t3, node 6 leaves the network; this affects flow 3 (yellow), which leaves and re-joins the network as a new flow 5 (gray).]

4.2 Example – dynamic network topology

In this section, by means of an example, we illustrate how the protocols between the flows, the nodes, and the resource manager work. The scenario is illustrated in Figure 7 and consists of two transitions. In the first transition, from $t_1$ to $t_2$, flow 4 (green) requests to join the network. The second transition, from $t_2$ to $t_3$, illustrates how node 6 requests to leave the network. By doing so, it affects flow 3, whose path passes through node $i = 6$. The affected flow must therefore leave the network, and it re-joins as a new flow $j = 5$, illustrated by the gray arrows in Figure 7. Next, we describe the first transition (the new flow joining), followed by the second transition (node 6 leaving). The schedule of both transitions is depicted in Figure 8.

[Figure 8: The schedule of when flows are accepted into the network and when nodes leave. It shows a request from flow j' = 4 to join the network at time f4^{rq+} = 3, as well as a request from node i = 6 to leave the network at time v6^{rq-} = 9. It also illustrates how the resource manager changes the node deadlines towards D* (marked by a star) before accepting the new flow j' = 4.]

Request of a flow to join. When the new flow 4 requests to join the network, at time $f_4^{rq+} = 3$, the node deadlines of the system are in a state such that the flow cannot be accepted immediately, i.e., $D(f_4^{rq+}) \notin \mathcal{D}(G^+)$. According to Protocol 2, the resource manager must therefore change the node deadlines to a goal point $D^\star \in \mathcal{D}(G^+)$. The node deadlines are changed linearly from $D(f_4^{rq+})$ to $D^\star$. At time $f_4^{ok+}$ we have $D(t) = D^\star$, and flow 4 is admitted into the network. This is illustrated in Figure 8, with $\mathcal{D}(G)$ shown as the shaded green region, $\mathcal{D}(G^+)$ as the shaded blue region, and the goal point $D^\star$ as the star symbol.

Request of a node to leave. At time $v_6^{rq-} = 9$, node 6 requests to leave the network. This is illustrated in Figure 8 by the downward arrow. Since node 6 belongs to the path $p_3$ of flow 3 (see Figure 7), the departure of node 6 affects flow 3 (yellow), which has end-to-end deadline $D_3 = 5$. Following Protocol 3, the resource manager therefore notifies flow 3 that it will no longer provide any end-to-end deadline guarantees after time $v_6^{ok-} = v_6^{rq-} + T_6^{leave} = 9 + 5 = 14$. The node is then allowed to leave the network at this time $v_6^{ok-}$, as illustrated in Figure 8. In the figure, it can also be noticed that the space of feasible node deadlines grows when node 6 leaves, since the constraint of the end-to-end deadline $D_3$ of flow 3 is removed. In this example, we assume that flow 3 still wants to remain in the network. Therefore, it requests to re-join the network as a new flow 5 at time $f_5^{rq+}$. At this time, the node deadlines already allow the requesting flow 5 to join (that is, $D(f_5^{rq+}) \in \mathcal{D}(G^+)$), and flow 5 is admitted immediately ($f_5^{ok+} = f_5^{rq+}$), as shown in Figure 8.

5 Evaluation: trade-offs with alpha

In this section we evaluate some of the effects that introducing Protocols 1, 2, and 3 might have on a system. We are particularly concerned with the trade-offs of choosing different values of the design parameter $\alpha$. The intuition is that by choosing a value for $\alpha$, we choose how quickly the resource manager is able to change the node deadlines in the network. A higher value of $\alpha$ allows for quicker changes, which in turn allows the resource manager to accept new flows faster. However, as illustrated in Section 3, a higher value of $\alpha$ requires lower node deadlines. This means that the response times of the nodes in the network have to be lower.
In order to satisfy this, some response-time controllers might have to sacrifice the quality of service (QoS) provided by the nodes. For instance, in [17] the sacrifice is to sometimes discard packets, and in [18] the sacrifice is to decrease the amount of content provided by the nodes.

5.1 System used for evaluation

The system used to evaluate the trade-offs of $\alpha$ is presented in a previous work [17] by the authors. In short, it allows nodes in a network, such as the one illustrated in Figure 9, to ensure that the response time is less than a specific node deadline. It does this by combining admission control and service control in every node. The goal of the service controller is to ensure that there is adequate processing capacity in the nodes; it does so by dynamically scaling the processing capacity according to a control law. However, since the system targets a cloud environment, the incoming traffic may be very dynamic. Moreover, the amount of processing capacity provided to the nodes may vary over time (i.e., servers might crash, etc.). The goal of the admission controller is to always ensure that the response time of the node is less than the node deadline. If there is sufficient processing capacity to serve the incoming traffic, it does not have to discard any packets. However, should a node find itself in a situation where there is not sufficient processing capacity to handle the incoming traffic, then the admission controller has to discard packets in order to guarantee that the response time of the node remains less than the node deadline.

[Figure 9: The network of 5 nodes and 3 flows used for the evaluation.]

5.2 Trade-off: alpha and time to accept a new flow

To evaluate the impact of $\alpha$ on the time required to accept a new flow, we use the system presented in Section 5.1 and evaluate how long it takes the system to reach different goal points $D^\star \in \mathcal{D}(G^+)$. Using a network with 5 nodes and 3 flows, we used the following set-up:
1. Choose the order of magnitude for the end-to-end deadlines $D \in \{0.1, 1, 10\}$ (milliseconds), as well as a value for $\alpha \in [10^{-3}, 10^0]$.
2. Assign a randomly generated end-to-end deadline to each flow, $D_j \sim U(0.7 \cdot D, 1.3 \cdot D)$, where $U$ denotes the uniform distribution.
3. Choose the goal point $D^\star$ as the solution to the optimal node-deadline problem presented in [17], adapted with the constraints of Eq. (16):

$$\begin{array}{ll} \text{minimize} & \sum_{i \in V(t)} 1/D_i \\ \text{subject to} & D_i \ge 0 \quad \forall i \in V(t), \\ & \sum_{i=1}^{\ell_j} (1+\alpha)^{\ell_j - i} D_{p_j(i)} \le D_j \quad \forall j \in F(t). \end{array} \tag{22}$$

4. Simulate how long it takes the network to reach the desired goal point $D^\star$.
5. Repeat steps 2 through 4 for 100 simulations.
6. From the 100 simulations, compute the average time to reach a goal point $D^\star$.
7. Repeat steps 1 through 6 for different choices of $D$ and $\alpha$.

The result of the evaluation is shown in Figure 10. As expected, it shows a clear relationship between $\alpha$ and the time needed to reach $D^\star$. It is interesting to note that even with end-to-end deadlines in the order of 10 ms and a very small $\alpha$, the resource manager is still able to reach $D^\star$ in less than a second. By allowing a higher $\alpha$, it is possible to reach $D^\star$ in less than a microsecond.

[Figure 10: Average time to reach the goal point D* as a function of α, for end-to-end deadlines of order 0.1, 1, and 10 ms.]
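The goal-point selection in step 3 above is an ordinary smooth program. Below is a sketch using SciPy's SLSQP solver; this is our own implementation of Eq. (22), not the authors' code, and node indices are 0-based.

```python
import numpy as np
from scipy.optimize import minimize

def goal_point(n_nodes, flows, alpha):
    """Solve the program of Eq. (22): minimize sum(1/D_i) subject to the
    (1+alpha)-weighted deadline constraint of every flow.

    `flows` is a list of (path, end_to_end_deadline) with 0-indexed nodes."""
    def objective(D):
        return np.sum(1.0 / D)

    constraints = []
    for path, e2e in flows:
        l = len(path)
        w = np.zeros(n_nodes)
        for i, node in enumerate(path):
            w[node] += (1 + alpha) ** (l - 1 - i)   # weight of D_{p_j(i)}
        # SLSQP expects g(D) >= 0, so encode  D_j - w . D >= 0.
        constraints.append({"type": "ineq",
                            "fun": lambda D, w=w, e=e2e: e - w @ D})

    x0 = np.full(n_nodes, 0.1)                      # strictly positive start
    res = minimize(objective, x0, method="SLSQP",
                   bounds=[(1e-6, None)] * n_nodes, constraints=constraints)
    return res.x

# Small instance: two nodes, one flow over both, D_j = 6, alpha = 0.5.
print(goal_point(2, [([0, 1], 6.0)], alpha=0.5))
```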
5.3 Trade-off: alpha and quality of service

To evaluate how different choices of $\alpha$ affect the QoS provided by the nodes in the network, we again use the system briefly presented in Section 5.1. As mentioned there, the system uses an admission controller and a service controller to ensure that the response time of the nodes always remains less than their node deadlines. We measure the quality of service as the fraction of packets that are delivered on time,

$$\text{QoS} = 1 - \rho,$$

where $\rho$ is the fraction of packets which are dropped. The evaluation was performed using a network with 5 nodes and 3 flows, together with the following set-up:
1. Choose a processing uncertainty $\delta \in \{0.05, 0.1, 0.2, 0.4\}$ and a value of $\alpha \in [10^{-3}, 10^0]$.
2. Generate traffic based on data from the Swedish University Network (SUNET) and simulate the system for 20 seconds. A typical traffic pattern is illustrated in Figure 11a.
3. Compute the QoS for the simulation.
4. Repeat steps 2 and 3 for another 100 simulations.
5. Compute the average QoS for this choice of $\delta$ and $\alpha$.
6. Repeat steps 1 through 5 for a new choice of $\delta$ and $\alpha$.

A comment on the processing uncertainty: if $\delta = 0.2$, a node in the network might believe it has a processing capacity of 1000 packets per second (pps) while in reality it can only handle 800 pps. Therefore, the higher $\delta$ is, the higher the probability that the node lacks available processing capacity.

The result of the evaluation is presented in Figure 11b, where the y-axis shows the quality of service and the x-axis shows the values of $\alpha$. The different colors highlight the different bounds on the uncertainty of the processing capacity. An interesting observation is that the QoS does not depend so much on the uncertainty $\delta$ as it does on the choice of $\alpha$. As expected, when $\alpha$ increases, the QoS goes down. However, even for large values of $\alpha$, the QoS remains fairly high, i.e., above 0.9980. This means that 99.8% of all the packets make it through the system on time.

[Figure 11: (a) Traffic for one of the simulations, for flows 1, 2, and 3. (b) Evaluation of α and the quality of service for δ ∈ {0.05, 0.1, 0.2, 0.4}. Despite highly varying traffic through the flows, the QoS remains high even for values of α close to 1; for α = 0.1, about 99.8% of all packets made it through the system and met their end-to-end deadlines.]

6 Related work

Despite the fact that the problem addressed in this paper stems from very recent technology advancements (e.g., cloud computing and 5G), it is possible to abstract it in a way where related results can be found over quite a vast spectrum of older contributions. In abstract terms, the problem presented in this paper can be decomposed into the following sub-components: providing end-to-end deadline guarantees for flows in a network, and splitting end-to-end deadlines into local deadlines. However, to the best of our knowledge, no work has considered all of these sub-components together in the context of a dynamic network topology.

A considerable amount of previous work addresses the deadline guarantee of a sequence of jobs that needs to be processed at a given node of a network [24, 20, 21]. Within each node, jobs are scheduled by any single-processor scheduling policy (FP, EDF, or other). The communication between nodes is modelled by propagating the jitter [24, 20] or the offset [21] of the task execution within a node. Gerber et al. [5] proposed an alternative method to translate end-to-end deadlines over a directed graph of nodes into constraints on the activation periods of the tasks running at the intermediate nodes. In the context of compositional analysis, previous works have addressed the problem of isolating and composing a single flow over a network of nodes. Lorente et al.
[14] extended the holistic analysis to the case of nodes running at a fraction of computing capacity (abstracted by a bounded-delay time partition with bandwidth and delay). Jayachandran and Abdelzaher [10] developed several transformations ("delay composition algebra") to reduce the analysis of a distributed system to the single-processor case. Serreli et al. [22] proposed a component interface for chains of tasks activated sporadically, together with an intermediate deadline assignment which minimises the requested computing capacity. Similarly, Ashjaei et al. [1] proposed resource reservation over each node along the path. In the context of computation happening at "small" scale, it is worth mentioning the modular analysis by Hamann, Jersak, Richter, and Ernst [6]. Such a modular analysis, which has found application in the automotive domain, may well be a source of inspiration to analyze the schedulability within each node and the interaction between nodes. It is, however, orthogonal to our method, which focuses on the policies that allow new flows of packets ("event streams" in the terminology of [6]) to be admitted at run time. Also, network calculus [13] and real-time calculus [23] are excellent orthogonal methods to analyze the schedulability within nodes as well as their interactions.

The idea of breaking end-to-end deadlines into local deadlines was also exploited by several authors. Di Natale and Stankovic [2] proposed to split the end-to-end deadline proportionally to the local computation time, or to divide the slack time equally. Marinca et al. [15] proposed two methods to assign local deadlines ("Fair Laxity Distribution" and "Unfair Laxity Distribution") to balance the distribution of the slack among the flows. Later, Jiang [11] used time slices to decouple the schedulability analysis of each node, reducing the complexity of the analysis. More recently, Hong et al. [8] formulated the local deadline assignment problem as a Mixed-Integer Linear Program (MILP) with the goal of maximising the slack time; the number of local deadlines, however, is very high and makes the resulting optimisation problem hard to solve. Jacob et al. [9] proposed to split the end-to-end deadline among local deadlines by using a deadline ratio in (0, 1), a configuration parameter chosen at design time.

Related, but orthogonal to the presented research, is the problem of mapping the flows of packets onto the available processing nodes. In the automotive context, Zhu et al. [26] formulated a MILP problem to find a task mapping that minimises the sum of a set of sensitive latencies. Garibay-Martínez et al. [4] used heuristics to partition tasks and assign priorities to tasks sharing the same resource.

7 Conclusion and future works

In this work, we presented a framework which allows applications and cloud services to dynamically join and leave a system over time. The intuition is that by assigning local deadlines to cloud services, and controlling how quickly those deadlines may change, it is possible to guarantee the end-to-end deadlines of the applications in the presence of flows and nodes dynamically leaving and joining the network. With extensive simulations we show that, with the suggested protocols, it is possible to accept new applications in a matter of milliseconds. Moreover, we show that the constraints of the protocols do not affect the quality of service in a significant way. This preliminary work opens many research directions.
Among them we mention:
- Impact of node policies: How can the end-to-end response time benefit from per-flow packet scheduling policies within the nodes (for example, fixed priorities or EDF)?
- Decentralized protocol: Can the framework be implemented in a decentralized way, by exploiting per-node information rather than the full network knowledge used in this paper?

References

[1] Mohammad Ashjaei, Saad Mubeen, Moris Behnam, Luis Almeida, and Thomas Nolte. End-to-End Resource Reservations in Distributed Embedded Systems. In 22nd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 1-11, August 2016.
[2] Marco Di Natale and John A. Stankovic. Dynamic End-to-end Guarantees in Distributed Real Time Systems. In Proceedings of the 15th IEEE Real-Time Systems Symposium, pages 215-227, December 1994.
[3] Neha Gandhi, Dawn M. Tilbury, Yixin Diao, J. Hellerstein, and Sujay Parekh. MIMO control of an Apache web server: Modeling and controller design. In Proceedings of the 2002 American Control Conference, volume 6, pages 4922-4927. IEEE, 2002.
[4] Task partitioning and priority assignment for distributed hard real-time systems. Journal of Computer and System Sciences, 81(8):1542-1555, 2015.
[5] Richard Gerber, Seongsoo Hong, and Manas Saksena. Guaranteeing Real-Time Requirements with Resource-based Calibration of Periodic Processes. IEEE Transactions on Software Engineering, 21(7):579-592, July 1995.
[6] Arne Hamann, Marek Jersak, Kai Richter, and Rolf Ernst. A framework for modular analysis and exploration of heterogeneous embedded systems. Real-Time Systems, 33(1):101-137, July 2006. doi:10.1007/s11241-006-6884-x.
[7] Dan Henriksson, Ying Lu, and Tarek Abdelzaher. Improved prediction for web server delay control. In Proceedings of the 16th Euromicro Conference on Real-Time Systems (ECRTS 2004), pages 61-68. IEEE, 2004.
[8] Optimization of Task Allocation and Priority Assignment in Hard Real-time Distributed Systems. ACM Transactions on Embedded Computing Systems, 11(4):85:1-85:30, January 2013. doi:10.1145/2362336.2362352.



Victor Millnert, Johan Eker, Enrico Bini. End-To-End Deadlines over Dynamic Topologies. LIPIcs – Leibniz International Proceedings in Informatics, 2019, 10:1-10:22. DOI: 10.4230/LIPIcs.ECRTS.2019.10