Non-Clairvoyant Precedence Constrained Scheduling

LIPICS - Leibniz International Proceedings in Informatics, Jul 2019

We consider the online problem of scheduling jobs on identical machines, where jobs have precedence constraints. We are interested in the demanding setting where the job sizes are not known up-front, but are revealed only upon completion (the non-clairvoyant setting). Such precedence-constrained scheduling problems routinely arise in map-reduce and large-scale optimization. For minimizing the total weighted completion time, we give a constant-competitive algorithm. And for total weighted flow-time, we give an O(1/epsilon^2)-competitive algorithm under (1+epsilon)-speed augmentation and a natural "no-surprises" assumption on release dates of jobs (which we show is necessary in this context). Our algorithm proceeds by assigning virtual rates to all waiting jobs, including the ones which are dependent on other uncompleted jobs. We then use these virtual rates to decide on the actual rates of minimal jobs (i.e., jobs which do not have dependencies and hence are eligible to run). Interestingly, the virtual rates are obtained by allocating time in a fair manner, using an Eisenberg-Gale-type convex program (which we can solve optimally using a primal-dual scheme). The optimality condition of this convex program allows us to show dual-fitting proofs more easily, without having to guess and hand-craft the duals. This idea of using fair virtual rates may have broader applicability in scheduling problems.


Naveen Garg, Computer Science and Engineering Department, Indian Institute of Technology Delhi, India
Anupam Gupta, Computer Science Department, Carnegie Mellon University, USA
Amit Kumar, Computer Science and Engineering Department, Indian Institute of Technology Delhi, India
Sahil Singla, Princeton University and Institute for Advanced Study, USA

Category: Track A: Algorithms, Complexity and Games

2012 ACM Subject Classification: Theory of computation → Online algorithms; Theory of computation → Scheduling algorithms

Keywords and phrases: Online algorithms; Scheduling; Primal-Dual analysis; Nash welfare

Funding: This research was supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790, and the Indo-US Joint Center for Algorithms Under Uncertainty. Sahil Singla was supported in part by the Schmidt Foundation.

1 Introduction

We consider the problem of online scheduling of jobs under precedence constraints. We seek to minimize the average weighted flow time of the jobs on multiple parallel machines, in the online non-clairvoyant setting. Formally, there are m identical machines, each capable of one unit of processing per unit of time. A set of n jobs arrives online. Each job j has a processing requirement p_j and a weight w_j, and is released at some time r_j. If the job finishes at time C_j, its flow (or response) time is defined to be C_j − r_j. The goal is to give a preemptive schedule that minimizes the total (or, equivalently, the average) weighted flow-time Σ_{j∈[n]} w_j·(C_j − r_j). The main constraints of our model are the following: (i) the scheduling is done online, so the scheduler does not know of the jobs before they are released; (ii) the scheduler is non-clairvoyant: when a job arrives, the scheduler knows its weight but not its processing time p_j (it is only when the job finishes its processing that the scheduler knows the job is done, and hence knows p_j); and (iii) there are precedence constraints between jobs given by a partial order ([n], ≺), where j ≺ j′ means that job j′ cannot be started until j is finished. Naturally, the partial order should respect release dates: if j ≺ j′ then r_j ≤ r_{j′}. (We will require a stronger assumption for some of our results.) This model for constrained parallelism is a natural one, both in theory and in practice. In theory, this precedence-constrained (and non-clairvoyant!) scheduling model (with other objective functions) goes back to Graham's work on list scheduling [8].
In practice, most languages and libraries produce parallel code that can be modeled using precedence DAGs [20, 1, 9]. Often these jobs (i.e., units of processing) are distributed among some m workstations or servers, either in server farms or on the cloud, i.e., they use identical parallel machines.

1.1 Our Results and Techniques

Weighted Completion Time. We develop our techniques on the problem of minimizing the average weighted completion time Σ_j w_j·C_j. Our convex-programming approach gives us:

Theorem 1.1. There is a 10-competitive deterministic online algorithm for minimizing the average weighted completion time on parallel machines with both release dates and precedences, in the online non-clairvoyant setting.

For this result, at each time t, the algorithm has to know only the partial order restricted to {j ∈ [n] | r_j ≤ t}, i.e., the jobs released by time t. The algorithmic idea is simple in hindsight: the algorithm looks at the minimal unfinished jobs (i.e., those that do not depend on any other unfinished jobs): call them I_t. If J_t is the set of (already released and) unfinished jobs at time t, then I_t ⊆ J_t. To figure out how to divide our processing among the jobs in I_t, we write a convex program that fairly divides the time among all jobs in the larger set J_t, such that (a) these jobs can "donate" their allocated time to some preceding jobs in I_t, and (b) the jobs in I_t do not get more than 1 unit of processing per time-step. For this fair allocation, we maximize the (weighted) Nash welfare Σ_{j∈J_t} w_j log R_j, where R_j is the virtual rate of processing given to job j ∈ J_t, regardless of whether it can currently be run (i.e., is in I_t). This tries to fairly distribute the virtual rates among the jobs [19], and can be solved using an Eisenberg-Gale-type convex program. (We can solve this convex program in our setting using a simple primal-dual algorithm; see the full version.)
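As a concrete special case (not from the paper), suppose every waiting job is already minimal, i.e., I_t = J_t, so each job's virtual rate is its own processing rate. Then maximizing Σ_j w_j ln R_j subject to R_j ≤ 1 and Σ_j R_j ≤ m has the closed-form water-filling optimum R_j = min(1, w_j/λ), with the water level λ chosen so the rates sum to m. A short sketch of that computation (function and variable names are ours):

```python
def nash_water_filling(weights, m, iters=100):
    """Rates maximizing sum_j w_j * ln(R_j) subject to 0 < R_j <= 1 and
    sum_j R_j <= m, for the special case where all jobs are minimal.
    Assumes all weights are strictly positive."""
    n = len(weights)
    if n <= m:                       # enough machines: every cap binds
        return [1.0] * n
    # Otherwise sum_j R_j = m at the optimum, and R_j = min(1, w_j / lam)
    # for the water level lam solving sum_j min(1, w_j / lam) = m.
    lo, hi = 1e-12, sum(weights)     # sum > m at lo, sum <= m at hi
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        if sum(min(1.0, w / lam) for w in weights) > m:
            lo = lam                 # water level too low: rates too large
        else:
            hi = lam
    lam = (lo + hi) / 2.0
    return [min(1.0, w / lam) for w in weights]
```

For example, with weights (3, 1, 1, 1) and m = 2 machines, the heavy job is capped at rate 1 and the remaining unit of capacity is split among the three light jobs in proportion to their (equal) weights.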
The proof of Theorem 1.1 is via writing a linear-programming relaxation for the weighted completion time problem, and fitting a dual to it. Conveniently, the dual variables for the completion time LP naturally fall out of the dual (KKT) multipliers for the convex program!

Weighted Flow Time. We then turn to the weighted flow-time minimization problem. We first observe that the problem has no competitive algorithm if there are jobs j that depend on jobs released before r_j. Indeed, if OPT ever has an empty queue while the algorithm is processing jobs, the adversary could give a stream of tiny new jobs, and we would be sunk. Hence we make an additional no-surprises assumption about our instance: when a job j is released, all the jobs having a precedence relationship to j are also released at the same time. In other words, the partial order is a collection of disjoint connected DAGs, where all jobs in each connected component have the same release date. A special case of this model has been studied in [20, 1], where each DAG is viewed as a "hyper-job" and there are no precedence constraints between different hyper-jobs. In this model, we show:

Theorem 1.2. There is an O(1/ε²)-competitive deterministic non-clairvoyant online algorithm for the problem of minimizing the average weighted flow time on parallel machines with release dates and precedences, under the no-surprises and (1+ε)-speedup assumptions.

Interestingly, the algorithm for weighted flow-time is almost the same as for weighted completion time. In fact, exactly the same algorithm works for both the completion time and flow time cases, if we allow a speedup of (2+ε) for the latter. To get the (1+ε)-speedup algorithm, we give preference to the recently-arrived jobs, since they have a smaller current time-in-system and each unit of waiting proportionally hurts them more. This is along the lines of strategies like LAPS and WLAPS [7].
1.2 The Intuition

Consider the case of unit-weight jobs on a single machine. Without precedence constraints, the round-robin algorithm, which runs all jobs at the same rate, is O(1)-competitive for the flow-time objective with a 2-speed augmentation. Now consider precedences, and let the partial order be a collection of disjoint chains: only the first remaining job of each chain can be run at any time. We generalize round-robin to this setting by running all minimal jobs simultaneously, but at rates proportional to the lengths of the corresponding chains. We can show this algorithm is also O(1)-competitive with a 2-speed augmentation. While this is easy for chains and trees, let us now consider the case when the partial order is the union of general DAGs, where each DAG may have several minimal jobs. Even though the sum of the rates over all the minimal jobs in any particular DAG should be proportional to the number of jobs in this DAG, running all minimal jobs at equal rates does not work. (Indeed, if many jobs depend on one of these minimal jobs, and many fewer depend on the other minimal jobs in this DAG, we want to prioritize the former.) Instead, we use a convex program to find rates. Our approach assigns a "virtual rate" R_j to each job in the DAG (regardless of whether it is minimal or not). This virtual rate allows us to ensure that even though the job itself may not run, it can help some minimal jobs to run at higher rates. This is done by an assignment problem where these virtual rates get translated into actual rates for the minimal jobs. The virtual rates are then calculated using Nash fairness, which gives us max-min properties that are crucial for our analysis.

Analysis Challenges: In typical applications of the dual-fitting technique, the dual variable for each job encodes the increase in total flow-time caused by the arrival of this job. Using this notion turns out to create problems.
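For the chains special case above, the generalized round-robin rates can be computed directly: with unit weights on a single machine, each chain's head runs at a rate proportional to the number of remaining jobs in that chain. An illustrative sketch (our naming, not code from the paper):

```python
def chain_head_rates(chains):
    """Single machine, unit-weight jobs, partial order = disjoint chains:
    run each chain's first remaining job at a rate proportional to the
    number of remaining jobs in that chain (generalized round-robin)."""
    total = sum(len(c) for c in chains)
    return [len(c) / total for c in chains]

# Chains given as lists of remaining processing times, head first:
# a 3-job chain and a lone job.
chains = [[2.0, 1.0, 1.0], [4.0]]
rates = chain_head_rates(chains)
```

Here the 3-job chain's head gets rate 3/4 and the lone job gets rate 1/4, matching the "rates proportional to chain length" rule; when a head completes or a chain empties, the rates are simply recomputed.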
Indeed, consider a minimal job of low weight which is running at a high rate (because a large number of jobs depend on it). The increase in overall flow-time because of its arrival is very large. However, the dual LP constraints require these dual variables to be bounded by the weights of their jobs, which now becomes difficult to ensure. To avoid this, we define the dual variables directly in terms of the virtual rates of the jobs, given by the convex program. Having multiple machines instead of a single machine creates new problems. The actual rate assigned to any minimal job cannot exceed 1, and hence we have to throttle certain actual rates. Again the versatility of the convex program helps us, since we can add this as a constraint. Arguing about the optimal solution to such a convex program requires dealing with the suitable KKT conditions, from which we can infer many useful properties. We also show in the full version that the optimal solution corresponds to a natural "water-filling" based algorithm. Finally, we obtain matching results for the case of (1+ε)-speed augmentation. Im et al. [12] gave a general-purpose technique to translate a round-robin based algorithm into a LAPS-like algorithm. In our setting, it turns out that the LAPS-like policy needs to be run on the virtual rates of jobs. Analyzing this algorithm does not follow in a black-box manner (as prescribed by [12]), and we need to adapt our dual-fitting analysis suitably.

1.3 Related Work and Organization

Completion Time. Minimizing Σ_j w_j·C_j on parallel machines with precedence constraints has O(1)-approximations in the offline setting: Li [16] improves on [11, 18] to give a (3.387+ε)-approximation. For related machines, the precedence constraints make the problem much harder: there is an O(log m / log log m)-approximation [16] improving on a prior O(log m) result [4], and a hardness of ω(1) under certain complexity assumptions [3].
In the online setting, any offline algorithm for (a dual problem to) Σ_j w_j·C_j gives a clairvoyant online algorithm, losing O(1) factors [11]. Two caveats: it is unclear (a) how to make this algorithm non-clairvoyant, and (b) how to solve the (dual of the) weighted completion time problem with precedences in poly-time.

Flow Time without Precedence. To minimize Σ_j w_j·(C_j − r_j), strong lower bounds are known for the competitive ratio of any online algorithm, even on a single machine [17]. Hence we use speed augmentation [14]. For the general setting of non-clairvoyant weighted flow-time on unrelated machines, Im et al. [13] showed that weighted round-robin with a suitable migration policy yields a (2+ε)-competitive algorithm using (1+ε)-speed augmentation. They gave a general-purpose technique, based on the LAPS scheduling policy, to convert any such round-robin based algorithm into a (1+ε)-competitive algorithm while losing an extra 1/ε factor in the competitive ratio. Their analysis also uses a dual-fitting technique [2, 10]. However, they do not consider precedence constraints.

Flow Time with Precedence. Much less is known for flow-time problems with precedence constraints. For the offline setting on identical machines, [15] give O(1)-approximations with O(1)-speedup, even for general delay functions. In the current paper, we achieve a poly(1/ε)-approximation with (1+ε)-speedup for flow-time. Interestingly, [15] show that beating an n^{1−c}-approximation for any constant c ∈ [0, 1) requires a speedup of at least the optimal approximation factor of makespan minimization in the same machine environment. However, this lower bound requires different jobs with a precedence relationship to have different release dates, which is something our model disallows. (The full version gives another lower bound showing why we disallow such precedences in the online setting.)
In the online setting, [20] introduced the DAG model where each job is a directed acyclic graph (of tasks) released at some time, a job/DAG completes when all the tasks in it are finished, and the goal is to minimize the total unweighted flow-time. They gave a (2+ε)-speed O(α/ε)-competitive algorithm, where α is the largest antichain within any job/DAG. [1] show poly(1/ε)-competitiveness with (1+ε)-speedup, again in the non-clairvoyant setting. The case where jobs are entire DAGs, and not individual nodes within DAGs, is captured in our weighted model by putting zero weights on all original jobs, and adding a unit-weight zero-sized job for each DAG which now depends on all jobs in the DAG. Assigning arbitrary weights to individual nodes within DAGs makes our problem quite non-trivial: we need to take into account the structure of the DAG to assign rates to jobs. Another model to capture parallelism and precedences uses speedup functions [6, 5, 7]; relating our model to this setting remains an open question. Our work is closely related to Im et al. [12], who use a Nash fairness approach for completion-time and flow-time problems with multiple resources. While our approaches are similar, to the best of our understanding their approach does not immediately extend to the setting with precedences. Hence we have to introduce new ideas of using virtual rates (and being fair with respect to them), and throttling the induced actual rates at 1. The analyses of [12] and our work are both based on dual-fitting; however, we need some new ideas for the setting with precedences.

Organization. The weighted completion time case is solved in §2. A (2+ε)-speedup result for weighted flow-time is in §3. In the full version we improve this to a (1+ε)-speedup. There we also show the need for the "no-surprises" assumption on release dates, how to solve the convex program using a "water-filling" based algorithm, and the missing proofs.
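The reduction from "each DAG is one job" to our weighted-node model is mechanical; a sketch (job ids and the function name are ours, not the paper's):

```python
def dag_to_weighted_nodes(sizes, edges):
    """Capture a 'whole DAG = one unweighted job' instance in the
    weighted-node model: give every original node weight 0 and append a
    zero-size, unit-weight sink that depends on every node, so the sink
    completes exactly when the DAG does."""
    n = len(sizes)
    sink = n                           # id of the new sink job
    new_sizes = list(sizes) + [0.0]    # the sink needs no processing
    weights = [0.0] * n + [1.0]        # only the sink carries weight
    new_edges = list(edges) + [(j, sink) for j in range(n)]
    return new_sizes, weights, new_edges
```

The sink's weighted completion (or flow) time then equals the DAG's, so any guarantee for the weighted-node model transfers to the DAG model.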
2 Minimizing Weighted Completion Time

In this section, we describe and analyze the scheduling algorithm for the problem of minimizing weighted completion time on parallel machines. Recall that the precedence constraints are given by a DAG G, and each job j has a release date r_j, processing size p_j and weight w_j.

2.1 The Scheduling Algorithm

We first assume that each of the m machines runs at rate 2 (i.e., it can perform 2 units of processing in a unit of time). We will show later how to remove this assumption (at a constant loss in the competitive ratio). We begin with some notation. We say that a job j is waiting at time t (with respect to a schedule) if r_j ≤ t but j has not been processed to completion by time t. We use J_t to denote the set of waiting jobs at time t. Note that at time t, the algorithm gets to see the subgraph G_t of G which is induced by the jobs in J_t. We say that a job j is unfinished at time t if it is either waiting at time t, or its release date is at least t (and hence the algorithm does not even know about this job). Let U_t denote the set of unfinished jobs at time t. Clearly, J_t ⊆ U_t. At time t, the algorithm can only process those jobs in J_t which do not have a predecessor in G_t; denote these minimal jobs by I_t: they are independent of all other current jobs. For every time t, the scheduling algorithm needs to assign a rate to each job j ∈ I_t. We now describe how it decides on these rates. Consider a time t. The algorithm considers a bipartite graph H_t = (I_t, J_t, E_t) with vertex set consisting of the minimal jobs I_t on the left and the waiting jobs J_t on the right. Since I_t ⊆ J_t, a job in I_t appears as a vertex on both sides of this bipartite graph. When there is no confusion, we slightly overload terminology by referring to a job as a vertex in H_t. The set of edges E_t is as follows: let j_l ∈ I_t, j_r ∈ J_t be vertices on the left and the right side respectively.
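A minimal sketch of how the set I_t could be computed from the waiting set and the precedence pairs (illustrative; our function and variable names, not the paper's):

```python
def minimal_jobs(waiting, edges):
    """I_t: waiting jobs with no waiting predecessor, i.e. the jobs
    eligible to run at time t.  `edges` holds precedence pairs (j, j2),
    meaning j must finish before j2 can start; completed or unreleased
    jobs simply do not appear in `waiting`."""
    waiting = set(waiting)
    blocked = {j2 for (j, j2) in edges if j in waiting and j2 in waiting}
    return waiting - blocked
```

Note that an edge whose predecessor has already completed (and so left the waiting set) no longer blocks its successor, which matches the induced subgraph G_t above.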
Then (j_l, j_r) is an edge in E_t if and only if there is a directed path from j_l to j_r in the DAG G_t. The following convex program now computes the rate for each vertex in I_t. It has a variable z_e^t for each edge e ∈ E_t. For each job j on the left side, i.e., for j ∈ I_t, define L_j^t := Σ_{e∈δ(j)} z_e^t as the sum of the z-values of the edges incident to j. Similarly, define R_j^t := Σ_{e∈δ(j)} z_e^t for a job j ∈ J_t, i.e., on the right side. The objective function is the Nash bargaining objective function on the R_j^t values, which ensures that each waiting job gets some attention. In the full version we give a combinatorial algorithm to efficiently solve this convex program.

    max  Σ_{j∈J_t} w_j ln R_j^t                                            (CP)
    s.t. L_j^t = Σ_{j′∈J_t : (j,j′)∈E_t} z_{jj′}^t       ∀ j ∈ I_t          (1)
         R_j^t = Σ_{j′∈I_t : (j′,j)∈E_t} z_{j′j}^t       ∀ j ∈ J_t          (2)
         L_j^t ≤ 1                                       ∀ j ∈ I_t          (3)
         Σ_{j∈I_t} L_j^t ≤ m                                                (4)
         z_e^t ≥ 0                                       ∀ e ∈ E_t          (5)

Let (z*^t, L*^t, R*^t) be an optimal solution to this convex program. We define the rate of a job j ∈ I_t to be L*_j^t. Although we have defined this as a continuous-time process, it is easy to check that the rates only change when a new job arrives or a job completes processing. Also observe that we have effectively combined the m machines into one in this convex program. But assuming that all events happen at integer times, we can translate the rate assignment into an actual schedule as follows. For a time slot [t, t+1], the total rate is at most m (using (4)), so we create m time slots [t, t+1]_i, one for each machine i, and iteratively assign each job j an interval of length L*_j^t within these time slots. It is possible that a job is assigned intervals in two different time slots, but the fact that L*_j^t ≤ 1 means it will not be assigned the same time in two different time slots. Further, we will never exceed the slots, because of (4). Thus, we can process these jobs in the m time slots on the m parallel machines such that each job j gets processed for L*_j^t amount of time and no job is processed concurrently on multiple machines.
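The iterative interval assignment is essentially McNaughton's wrap-around rule; here is a sketch under the stated guarantees L*_j ≤ 1 and Σ_j L*_j ≤ m (our function and variable names, not the paper's):

```python
def assign_slots(rates, m):
    """Pack rates L_j (each <= 1, summing to <= m) into the m unit-length
    machine slots for [t, t+1], wrap-around style.  Because L_j <= 1, the
    two pieces of a job split across a slot boundary occupy disjoint
    times, so no job runs on two machines at once."""
    assert all(r <= 1.0 + 1e-12 for r in rates) and sum(rates) <= m + 1e-12
    schedule = []                      # (job, machine, start, end), times in [0, 1)
    machine, pos = 0, 0.0
    for job, rate in enumerate(rates):
        remaining = rate
        while remaining > 1e-12:
            take = min(remaining, 1.0 - pos)
            schedule.append((job, machine, pos, pos + take))
            remaining -= take
            pos += take
            if pos >= 1.0 - 1e-12:     # slot full: move to the next machine
                machine, pos = machine + 1, 0.0
    return schedule
```

For instance, with rates (0.8, 0.7, 0.5) and m = 2, job 1 is split into [0.8, 1.0] on machine 0 and [0.0, 0.5] on machine 1; the two pieces do not overlap in time, exactly as the L*_j ≤ 1 argument above promises.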
This completes the description of the algorithm; in it, we assume that we run the machines at twice the speed. Call this algorithm A. The final algorithm B, which is only allowed to run the machines at speed 1, is obtained by running A in the background, and setting B to be a slowed-down version of A. Formally, if A processes a job j on machine i at time t ∈ ℝ_{≥0}, then B processes it at time 2t. This completes the description of the algorithm.

2.2 A Time-Indexed LP Formulation

We use the dual-fitting approach to analyze the above algorithm. We write a time-indexed linear programming relaxation (LP) for the weighted completion time problem, and use the solutions to the convex program (CP) to obtain feasible primal and dual solutions for (LP) which differ by only a constant factor. We divide time into integral time slots (assuming all quantities are integers); the variable t will refer to integer times only. For every job j and time t, we have a variable x_{j,t} which denotes the volume of j processed during [t, t+1]. Note that this is defined only for t ≥ r_j. The LP relaxation is as follows:

    min  Σ_{j,t} w_j · (t · x_{j,t}) / p_j                                 (LP)
    s.t. Σ_{t≥r_j} x_{j,t} / p_j ≥ 1                     ∀ j               (6)
         Σ_j x_{j,t} ≤ m                                 ∀ t               (7)
         Σ_{s≤t} x_{j,s} / p_j ≥ Σ_{s≤t} x_{j′,s} / p_{j′}   ∀ t, j ≺ j′   (8)

The following claim, whose proof is deferred to the full version, shows that it is a valid relaxation.

Claim 2.1. Let opt denote the weighted completion time of an optimal off-line policy (which knows the processing times of all the jobs). Then the optimal value of the LP relaxation is at most opt.

The LP has a large integrality gap. Observe that the LP just imagines the m machines to be a single machine with speed m. Therefore, (LP) has a large integrality gap for two reasons: (i) a job j can be processed concurrently on multiple machines, and (ii) suppose we have a long chain of jobs of equal size in the DAG G. Then the LP allows us to process all these jobs at the same rate in parallel on multiple machines.
We augment the LP lower bound with another quantity, and show that the sum of these two lower bounds suffices. A chain C in G is a sequence of jobs j_1, …, j_k such that j_1 ≺ j_2 ≺ … ≺ j_k. Define the processing time of C, p(C), as the sum of the processing times of the jobs in C. For a job j, define chain_j as the maximum of p(C) over all chains C ending in j. It is easy to see that Σ_j w_j·(r_j + chain_j) is a lower bound (up to a factor 2) on the objective of an optimal schedule. We now write down the dual of the LP relaxation above. We have dual variables α_j for every job j, β_t for every time t, and γ_{s,j≺j′} for every time s and precedence pair j ≺ j′:

    max  Σ_j α_j − m·Σ_t β_t                                               (DLP)
    s.t. α_j + Σ_{s≥t} ( Σ_{j′: j≺j′} γ_{s,j≺j′} − Σ_{j′: j′≺j} γ_{s,j′≺j} ) ≤ w_j·t + p_j·β_t   ∀ j, t ≥ r_j   (9)
         β_t, γ_{s,j≺j′} ≥ 0

We write the dual constraint (9) in a more readable manner. For a job j and time s, let γ_{s,j}^in denote Σ_{j′: j′≺j} γ_{s,j′≺j}, and define γ_{s,j}^out := Σ_{j′: j≺j′} γ_{s,j≺j′} similarly. The dual constraint (9) then reads

    α_j + Σ_{s≥t} ( γ_{s,j}^out − γ_{s,j}^in ) ≤ w_j·t + p_j·β_t   ∀ j, t ≥ r_j.

2.3 Properties of the Convex Program

We now prove certain properties of an optimal solution (z*^t, L*^t, R*^t) to the convex program (CP). The first property, whose proof is deferred to the full version, is easy to see:

Claim 2.2. If Σ_{j∈I_t} L*_j^t < m, then L*_j^t = 1 for all j ∈ I_t.

We now write down the KKT conditions for the convex program. (In fact, we can use (1) and (2) to eliminate L_j^t and R_j^t from the objective and the other constraints.) Letting δ_j^t ≥ 0, η_t ≥ 0 and μ_e^t ≥ 0 be the Lagrange multipliers corresponding to constraints (3), (4) and (5) respectively, we get:

    w_j / R*_j^t = δ_{j′}^t + η_t − μ_e^t       ∀ e = (j′, j), j′ ∈ I_t, j ∈ J_t
    δ_j^t · (L*_j^t − 1) = 0                    ∀ j ∈ I_t
    η_t · ( Σ_{j∈I_t} L*_j^t − m ) = 0
    μ_e^t · z*_e^t = 0                          ∀ e ∈ E_t

Claim 2.3. Consider a job j ∈ J_t on the right side of H_t. Then w_j ≥ R*_j^t · η_t.

Claim 2.4. Consider a job j ∈ J_t on the right side of H_t. Suppose j has a neighbor j′ ∈ I_t such that L*_{j′}^t < 1 and z*_{j′j}^t > 0. Then w_j = R*_j^t · η_t.

A crucial notion is that of an active job:

Definition 2.5 (Active Jobs). A job j ∈
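The chain_j values can be computed by a longest-path sweep over a topological order of the DAG; an illustrative sketch (our naming, not code from the paper):

```python
from collections import deque

def chain_values(sizes, edges):
    """chain_j = max over chains ending at j of their total processing
    time, via a longest-path dynamic program in topological order."""
    n = len(sizes)
    preds = [[] for _ in range(n)]
    succs = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in edges:               # a precedes b
        succs[a].append(b)
        preds[b].append(a)
        indeg[b] += 1
    order, q = [], deque(j for j in range(n) if indeg[j] == 0)
    while q:                         # Kahn's topological sort
        j = q.popleft()
        order.append(j)
        for k in succs[j]:
            indeg[k] -= 1
            if indeg[k] == 0:
                q.append(k)
    chain = [0.0] * n
    for j in order:                  # best chain ending at j extends a predecessor's
        chain[j] = sizes[j] + max((chain[p] for p in preds[j]), default=0.0)
    return chain
```

Summing w_j·(r_j + chain_j) over all jobs then gives the combinatorial lower bound used alongside the LP bound above.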
J_t is active at time t if it has at least one neighbor in I_t (in the graph H_t) running at rate strictly less than 1. Let J_t^act denote the set of active jobs at time t.

We can strengthen the above claim as follows.

Corollary 2.6. Consider an active job j at time t. Then w_j = R*_j^t · η_t.

Claim 2.7. w(J_t^act)/m ≤ η_t ≤ w(J_t)/m.

2.4 Analysis via Dual Fitting

We analyze the algorithm A first. We define feasible dual variables for (DLP) such that the value of the dual objective function (along with the chain_j values that capture the maximum processing time over all chains ending in j) forms a lower bound on the weighted completion time of our algorithm. Intuitively, α_j would be the weighted completion time of j, and β_t would be 1/(2m) times the total weight of unfinished jobs at time t. Thus, Σ_j α_j − m·Σ_t β_t would be at least 1/2 times the total weighted completion time. This idea works as long as all the machines are busy at every point of time, the reason being that the primal LP essentially views the m machines as a single speed-m machine. Therefore, we can generate enough dual lower bound if the rate of processing in each time slot is m. If all machines are not busy, we need to appeal to the lower bound given by the chain_j values. We use the notation from the description of the algorithm. In the graph H_t, we had assigned rates L*_j^t to all the nodes j in I_t. Recall that a vertex j ∈ J_t on the right side of H_t is said to be active at time t if it has a neighbor j′ ∈ I_t for which L*_{j′}^t < 1; otherwise, we say that j is inactive at time t. We say that an edge e = (j_l, j_r) ∈ E_t, where j_l ∈ I_t, j_r ∈ J_t, is active at time t if the vertex j_r is active. Let A_t denote the set of active edges in E_t. For an edge e = (j_l, j_r) ∈ E_t, by definition there is a path from j_l to j_r in G_t; we fix one such path P_e. As before, let C_j denote the completion time of job j. The dual variables are defined as follows: for each job j and time t, we define quantities α_{j,t}.
The dual variable α_j will equal Σ_{t≥0} α_{j,t}. Fix a job j. If t ∉ [r_j, C_j], we set α_{j,t} to 0. Now suppose j ∈ J_t. Consider the job j as a vertex in J_t (i.e., on the right side) of the bipartite graph H_t. We set α_{j,t} to w_j if j is active at time t, and to 0 otherwise. For each time t, we set β_t to (1/2m)·w(U_t) (recall that U_t is the set of unfinished jobs at time t). We now need to define γ_{t,j′≺j}, where j′ ≺ j. If j or j′ does not belong to J_t, we set this variable to 0. So assume that j, j′ ∈ J_t (and so the edge (j′, j) lies in G_t). We define

    γ_{t,j′≺j} := η_t · Σ_{e ∈ A_t : (j′,j) ∈ P_e} z*_e^t.

In other words, we consider all the active edges e in the graph H_t for which the corresponding path P_e contains the edge (j′, j), and add up the fractional assignments z*_e^t over all such edges. This completes the description of the dual variables. We first show that the objective function for (DLP) is close to the weighted completion time incurred by the algorithm. The proof is deferred to the full version.

Claim 2.8. The total weighted completion time of the jobs in A is at most 2·(Σ_j α_j − m·Σ_t β_t) + Σ_j w_j·(chain_j + 2r_j).

We now argue about feasibility of the dual constraint (9). Consider a job j and time t ≥ r_j. Since α_{j,s} ≤ w_j for all times s, we have Σ_{s<t} α_{j,s} ≤ w_j·t. Therefore, it suffices to show:

    Σ_{s≥t} α_{j,s} + Σ_{s≥t} ( γ_{s,j}^out − γ_{s,j}^in ) ≤ p_j · β_t.   (15)

Let t_j* be the first time t at which the job j appears in the set I_t. This is also the first time the algorithm starts processing j, because a job that enters I_t does not leave I_t before completion.

Claim 2.9. For any time s in the range [r_j, t_j*), we have α_{j,s} + γ_{s,j}^out − γ_{s,j}^in = 0.

Proof. Fix such a time s. Note that j ∉ I_s. Thus j appears as a vertex on the right side of the bipartite graph H_s, but does not appear on the left side. Let e be an active edge in H_s such that the corresponding path P_e contains j as an internal vertex. Then z*_e^s gets counted in both γ_{s,j}^out and γ_{s,j}^in.
There cannot be such a path P_e which starts at j, because then j would need to be on the left side of the bipartite graph. There can be paths P_e which end at j; these correspond to active edges e incident to j in the graph H_s (and exist only if j itself is active). Let δ(j) denote the edges incident to j. We have shown that

    γ_{s,j}^out − γ_{s,j}^in = −η_s · Σ_{e ∈ δ(j) ∩ A_s} z*_e^s.   (16)

If j is not active, the RHS is 0, and so is α_{j,s}, so we are done. Therefore, assume that j is active. Now A_s contains all the edges incident to j, and so the RHS equals −η_s · R*_j^s. But then Corollary 2.6 implies that −η_s · R*_j^s = −w_j. Since α_{j,s} = w_j, we are done again. ◻

Coming back to inequality (15), we can assume that t ≥ t_j*. To see this, suppose t < t_j*. Then by Claim 2.9 the LHS of this constraint is the same as

    Σ_{s≥t_j*} α_{j,s} + Σ_{s≥t_j*} ( γ_{s,j}^out − γ_{s,j}^in ).

Since β_t ≥ β_{t_j*} (the set of unfinished jobs can only diminish as time goes on), (15) for time t follows from the corresponding statement for time t_j*. Therefore, we assume that t ≥ t_j*. We can also assume that t ≤ C_j, otherwise the LHS of this constraint is 0.

Claim 2.10. Let s ∈ [t_j*, C_j] be such that j is inactive at time s. Then α_{j,s} + γ_{s,j}^out − γ_{s,j}^in ≤ η_s · L*_j^s.

Proof. We know that α_{j,s} = 0. As in the proof of Claim 2.9, we only need to worry about those active edges e in H_s for which P_e either ends at j or begins at j. Since any edge incident to j as a vertex on the right side is inactive, we get (letting δ_L(j) denote the edges incident to j when we consider j on the left side)

    α_{j,s} + γ_{s,j}^out − γ_{s,j}^in = η_s · Σ_{e ∈ δ_L(j) ∩ A_s} z*_e^s ≤ η_s · L*_j^s,

because η_s ≥ 0 and L*_j^s = Σ_{e ∈ δ_L(j)} z*_e^s. ◻

Claim 2.11. Let s ∈ [t_j*, C_j] be such that j is active at time s. Then α_{j,s} + γ_{s,j}^out − γ_{s,j}^in ≤ η_s · L*_j^s.

Proof. The argument is very similar to the one in the previous claim. Since j is active, α_{j,s} = w_j. As before, we only need to worry about the active edges e for which P_e either ends or begins at j.
Any edge incident to j on the right side is active (note that there is only one such edge, namely the one joining j to its copy on the left side of H_s). The following inequality now follows as in the proof of Claim 2.10:

    α_{j,s} + γ_{s,j}^out − γ_{s,j}^in ≤ w_j + η_s · L*_j^s − η_s · R*_j^s.

The result now follows from Corollary 2.6. ◻

We are now ready to show that (15) holds. The above two claims show that the LHS of (15) is at most Σ_{s=t}^{C_j} η_s · L*_j^s. Note that for any such time s, the rate assigned to j is L*_j^s, and so we perform 2·L*_j^s amount of processing on j during this time slot. It follows that Σ_{s=t}^{C_j} L*_j^s ≤ p_j/2. Now Claim 2.7 shows that η_s ≤ w(U_s)/m ≤ w(U_t)/m, and so we get

    Σ_{s=t}^{C_j} η_s · L*_j^s ≤ p_j · w(U_t)/(2m) = p_j · β_t.

This shows that (15) is satisfied. We can now prove that our algorithm is constant-competitive.

Theorem 2.12. The algorithm B is 10-competitive.

Proof. We first argue about A. We have shown that the dual variables are feasible for (DLP), and so Claim 2.8 shows that the total completion time of A is at most 2·opt + Σ_j w_j·(chain_j + 2r_j), where opt denotes the optimal off-line objective value. Clearly, opt ≥ Σ_j w_j·r_j and opt ≥ Σ_j w_j·chain_j. This implies that A is 5-competitive. While going from A to B, the completion time of each job doubles. ◻

3 Minimizing Weighted Flow Time

We now consider the setting of minimizing the total weighted flow time, again in the non-clairvoyant setting. The setup is almost the same as in the completion-time case: the major change is that all jobs which depend on each other (i.e., belong to the same DAG in the "collection of DAGs" view) have the same release date. In the full version we show that if related jobs can be released over time then no competitive online algorithms are possible. As before, let J_t denote the jobs which are waiting at time t, i.e., which have been released but not yet finished, and let G_t be the union of all the DAGs induced by the jobs in J_t.
Again, let $I_t$ denote the minimal set of jobs in $J_t$, i.e., those which do not have a predecessor in $G_t$ and hence can be scheduled.

▶ Theorem 3.1. There exists an $O(1/\epsilon)$-approximation algorithm for non-clairvoyant DAG scheduling to minimize the weighted flow time on $m$ parallel machines, when there is a speedup of $2(1+\epsilon)$.

The rest of this section gives the proof of Theorem 3.1. The algorithm remains unchanged from §2 (we do not need the algorithm B now): we write the convex program (CP) as before, which assigns rates $\bar L^t_j$ to each job $j \in I_t$. The analysis again proceeds by writing a linear programming relaxation, and exhibiting a feasible dual solution. The LP is almost the same as (LP); only the objective changes, to
$$\sum_{j,t} w_j \cdot \frac{(t - r_j) \cdot x_{j,t}}{p_j}.$$
Hence, the dual is also almost the same as (DLP): the new dual constraint requires that for every job $j$ and time $t \ge r_j$:
(17) $\alpha_j + \sum_{s \ge t} \big(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}\big) \;\le\; \beta_t \cdot p_j + w_j (t - r_j).$
The variable $\beta_t := \frac{w(J_t)}{(1+\epsilon)m}$. Recall that the machines are allowed a $2(1+\epsilon)$-speedup. The definition of the $\gamma$ variables changes as follows. Let $(j' \to j)$ be an edge in the DAG $G_t$. Earlier we had considered paths $P_e$ containing $(j' \to j)$ only for the active edges $e$; now we include all edges. Moreover, we replace the multiplier $\beta_t$ by $\eta^t_j$, where $\eta^t_j := \frac{1}{m} \sum_{j' \in J_t : j' \rightsquigarrow j} w_{j'}$. In other words, we define
$$\gamma_{t, j' \to j} \;:=\; \eta^t_j \cdot \sum_{e :\, e \in H_t,\, (j' \to j) \in P_e} \bar z^t_e.$$
In the following sections, we show that these dual settings are enough to "pay for" the flow time of our solution (i.e., have a large objective function value), and also give a feasible lower bound (i.e., are feasible for the dual linear program).

3.2 The Dual Objective Function

We first show that $\sum_j \alpha_j - m \sum_t \beta_t$ is close to the total weighted flow-time of the jobs. The quantity $\mathrm{chain}_j$ is defined as before. Notice that $\mathrm{chain}_j$ is still a lower bound on the flow-time of job $j$ in the optimal schedule, because all jobs of a DAG are released simultaneously.
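To make the fairness objective concrete, here is a hedged sketch of the Eisenberg-Gale-type rate computation in its simplest special case, where the only constraint is the total machine capacity $m$. The actual program (CP) also carries the time-sharing constraints of §2 and is solved by the primal-dual scheme there; all names below are illustrative.

```python
# Eisenberg-Gale objective: maximize sum_j w_j * log(R_j) subject to
# sum_j R_j <= m. With only this capacity constraint, the optimum is the
# proportionally fair allocation R_j = m * w_j / w(J), in closed form.
def fair_rates(weights, m):
    total = sum(weights)
    return [m * w / total for w in weights]

weights = [3.0, 1.0, 2.0]
m = 2  # number of identical machines
rates = fair_rates(weights, m)

# The optimality condition equalizes w_j / R_j across jobs; this common
# value plays the role of the dual price beta (compare Corollary 2.6,
# where w_j = beta_s * R_j^s holds for active jobs).
prices = [w / r for w, r in zip(weights, rates)]
```

It is this kind of optimality condition that lets the dual-fitting proofs read off the duals instead of hand-crafting them.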
The following claim, whose proof is deferred to the full version, shows that the dual objective value is close to the weighted flow time of the algorithm.

▷ Claim 3.2. The total weighted flow-time is at most $\frac{2}{\epsilon} \big( \sum_j \alpha_j - m \sum_t \beta_t \big) + \sum_j w_j \cdot \mathrm{chain}_j$.

3.3 Checking Dual Feasibility

Now we need to check the feasibility of the dual constraint (17). In fact, we will show the following weaker version of that constraint:
(18) $\alpha_j + 2 \sum_{s \ge t} \big(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}\big) \;\le\; \beta_t \cdot p_j + 2 w_j (t - r_j).$
This suffices to within another factor of 2: indeed, scaling down the $\alpha$ and $\beta$ variables by another factor of 2 then gives dual feasibility, and loses only another factor of 2 in the objective function. We begin by bounding $\alpha_{j,s}$ in two different ways.

▶ Lemma 3.3. For any time $s \ge r_j$, we have $\alpha_{j,s} \le 2 w_j$.

Proof. Consider the second term in the definition of $\alpha_{j,s}$. This term contains $\sum_{j' \in J^{act}_s : j' \rightsquigarrow j} w_{j'}$. By Corollary 2.6, for any $j' \in J^{act}_s$ we have $w_{j'} = \bar R^s_{j'} \cdot \beta_s$. Therefore,
$$\sum_{j' \in J^{act}_s : j' \rightsquigarrow j} w_{j'} \;=\; \beta_s \cdot \sum_{j' \in J^{act}_s : j' \rightsquigarrow j} \bar R^s_{j'} \;\le\; \beta_s \cdot \sum_{j' \in J_s} \bar R^s_{j'}.$$
Now we can bound $\alpha_{j,s}$ by dropping the indicator on the first term to get
$$\alpha_{j,s} \;\le\; \frac{1}{m} \Big[ w_j \cdot \sum_{j' \in J_s : j' \rightsquigarrow j} \bar R^s_{j'} \;+\; \bar R^s_j \cdot \beta_s \cdot \sum_{j' \in J^{act}_s : j' \rightsquigarrow j} \bar R^s_{j'} \Big] \;\le\; \frac{w_j}{m} \Big[ \sum_{j' \in J_s} \bar R^s_{j'} + \sum_{j' \in J_s} \bar R^s_{j'} \Big],$$
the last inequality using Claim 2.3. Simplifying, $\alpha_{j,s} \le \frac{2 w_j}{m} \cdot \sum_{j'' \in I_s} \bar L^s_{j''} \le 2 w_j$. ◀

Here is a slightly different upper bound on $\alpha_{j,s}$.

▶ Lemma 3.4. For any time $s \ge r_j$, we have $\alpha_{j,s} \le 2 \eta^s_j \cdot \bar R^s_j$.

Proof. The second term in the definition of $\alpha_{j,s}$ is at most $\eta^s_j \cdot \bar R^s_j$, directly using the definition of $\eta^s_j$. For the first term, assume $j$ is active at time $s$, otherwise this term is $0$. Now Corollary 2.6 shows that $w_j = \beta_s \cdot \bar R^s_j$, so the first term can be bounded as follows:
$$\frac{w_j}{m} \cdot \sum_{j' \in J_s : j' \rightsquigarrow j} \bar R^s_{j'} \;=\; \frac{\bar R^s_j \cdot \beta_s}{m} \cdot \sum_{j' \in J_s : j' \rightsquigarrow j} \bar R^s_{j'} \;\overset{\text{(Claim 2.3)}}{\le}\; \frac{\bar R^s_j}{m} \cdot \sum_{j' \in J_s : j' \rightsquigarrow j} w_{j'} \;=\; \bar R^s_j \cdot \eta^s_j,$$
which completes the proof. ◀

To prove (18), we write $\alpha_j = \sum_{s=r_j}^{t-1} \alpha_{j,s} + \sum_{s \ge t} \alpha_{j,s}$, and use Lemma 3.3 to cancel the first summation with the term $2 w_j (t - r_j)$.
Hence, it remains to prove
(19) $\sum_{s \ge t} \alpha_{j,s} + 2 \sum_{s \ge t} \big(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}\big) \;\le\; \beta_t \cdot p_j.$
Let $t^*_j$ be the time at which the algorithm starts processing $j$. We first argue why we can ignore times $s < t^*_j$ on the LHS of (19).

▷ Claim 3.5. Let $s$ be a time satisfying $r_j \le s < t^*_j$. Then $\alpha_{j,s} + 2(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}) \le 0$.

Proof. While computing $\gamma^{out}_{s,j} - \gamma^{in}_{s,j}$, we only need to consider paths $P_e$ for edges $e$ in $H_s$ which have $j$ as an end-point. Since $j$ does not appear on the left side of $H_s$, this quantity is equal to $-\eta^s_j \cdot \bar R^s_j$. The result now follows from Lemma 3.4. ◁

So using Claim 3.5 in (19), it suffices to show
(20) $\sum_{s \ge \max\{t, t^*_j\}} \alpha_{j,s} + 2 \sum_{s \ge \max\{t, t^*_j\}} \big(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}\big) \;\le\; \beta_t \cdot p_j.$
Note that we still have $\beta_t$ on the right-hand side, even though the summation on the left is over times $s \ge \max\{t, t^*_j\}$. The proof of the following claim is deferred to the full version: for any time $s \ge \max\{t, t^*_j\}$, we have $\alpha_{j,s} + 2(\gamma^{out}_{s,j} - \gamma^{in}_{s,j}) \le 2(1+\epsilon) \cdot \beta_t \cdot \bar L^s_j$.

Hence, the left-hand side of (20) is at most $2(1+\epsilon) \cdot \beta_t \cdot \sum_{s \ge \max\{t, t^*_j\}} \bar L^s_j$. However, since job $j$ is assigned a rate of $\bar L^s_j$ and the machines run at speed $2(1+\epsilon)$, we get that this expression is at most $p_j \cdot \beta_t$, which is the right-hand side of (20). This proves the feasibility of the dual constraint (18).

Proof of Theorem 3.1. In the preceding §3.3 we proved that the variables $\alpha_j/2$, $\beta_t/2$ and $\gamma_{t, j' \to j}$ satisfy the dual constraint for the flow-time relaxation. Since $\sum_j (\alpha_j/2) - m \sum_t (\beta_t/2)$ is a feasible dual, it gives a lower bound on the cost of the optimal solution. Moreover, $\sum_j w_j \cdot \mathrm{chain}_j$ is another lower bound on the cost of the optimal schedule. Now using the bound on the weighted flow-time of our schedule given by Claim 3.2, this shows that we have an $O(1/\epsilon)$-approximation with $2(1+\epsilon)$-speedup. ◀

In the full version we show how to use a slightly different scheduling policy that prioritizes the last arriving jobs to reduce the speedup to $(1+\epsilon)$.



Naveen Garg, Anupam Gupta, Amit Kumar, Sahil Singla. Non-Clairvoyant Precedence Constrained Scheduling. LIPIcs - Leibniz International Proceedings in Informatics, 2019, 63:1-63:14. DOI: 10.4230/LIPIcs.ICALP.2019.63