Non-Clairvoyant Precedence Constrained Scheduling
ICALP, Track A: Algorithms, Complexity and Games
Naveen Garg, Computer Science and Engineering Department, Indian Institute of Technology Delhi, India
Anupam Gupta, Computer Science Department, Carnegie Mellon University, USA
Amit Kumar, Computer Science and Engineering Department, Indian Institute of Technology Delhi, India
Sahil Singla, Princeton University and Institute for Advanced Study, USA
We consider the online problem of scheduling jobs on identical machines, where jobs have precedence constraints. We are interested in the demanding setting where the job sizes are not known upfront, but are revealed only upon completion (the non-clairvoyant setting). Such precedence-constrained scheduling problems routinely arise in map-reduce and large-scale optimization. For minimizing the total weighted completion time, we give a constant-competitive algorithm. And for total weighted flow-time, we give an O(1/ε²)-competitive algorithm under (1+ε)-speed augmentation and a natural "no-surprises" assumption on release dates of jobs (which we show is necessary in this context). Our algorithm proceeds by assigning virtual rates to all waiting jobs, including the ones which are dependent on other uncompleted jobs. We then use these virtual rates to decide on the actual rates of minimal jobs (i.e., jobs which do not have dependencies and hence are eligible to run). Interestingly, the virtual rates are obtained by allocating time in a fair manner, using an Eisenberg-Gale-type convex program (which we can solve optimally using a primal-dual scheme). The optimality conditions of this convex program allow us to show dual-fitting proofs more easily, without having to guess and handcraft the duals. This idea of using fair virtual rates may have broader applicability in scheduling problems.

2012 ACM Subject Classification: Theory of computation → Online algorithms; Theory of computation → Scheduling algorithms
Keywords and phrases: Online algorithms; Scheduling; Primal-Dual analysis; Nash welfare

Funding This research was supported in part by NSF awards CCF-1536002, CCF-1540541, and
CCF-1617790, and the Indo-US Joint Center for Algorithms Under Uncertainty. Sahil Singla was
supported in part by the Schmidt Foundation.
1 Introduction

We consider the problem of online scheduling of jobs under precedence constraints. We seek to minimize the average weighted flow time of the jobs on multiple parallel machines, in the online non-clairvoyant setting. Formally, there are m identical machines, each capable of one unit of processing per unit of time. A set of n jobs arrives online. Each job has a processing requirement p_j and a weight w_j, and is released at some time r_j. If the job finishes at time C_j, its flow or response time is defined to be C_j − r_j. The goal is to give a preemptive schedule that minimizes the total (or, equivalently, the average) weighted flow-time ∑_{j∈[n]} w_j·(C_j − r_j). The main constraints of our model are the following: (i) the scheduling is done online, so the scheduler does not know of the jobs before they are released; (ii) the scheduler is non-clairvoyant: when a job arrives, the scheduler knows its weight but not its processing time p_j (it is only when the job finishes its processing that the scheduler knows the job is done, and hence knows p_j); and (iii) there are precedence constraints between jobs given by a partial order ([n], ≺), where j ≺ j′ means job j′ cannot be started until j is finished. Naturally, the partial order should respect release dates: if j ≺ j′ then r_j ≤ r_{j′}. (We will require a stronger assumption for some of our results.)
This model for constrained parallelism is a natural one, both in theory and in practice. In theory, this precedence-constrained (and non-clairvoyant!) scheduling model (with other objective functions) goes back to Graham's work on list scheduling [8]. In practice, most languages and libraries produce parallel code that can be modeled using precedence DAGs [20, 1, 9]. Often these jobs (i.e., units of processing) are distributed among some m workstations or servers, either in server farms or on the cloud, i.e., they use identical parallel machines.
1.1 Our Results and Techniques
Weighted Completion Time. We develop our techniques on the problem of minimizing the average weighted completion time ∑_j w_j·C_j. Our convex-programming approach gives us:

▶ Theorem 1.1. There is a 10-competitive deterministic online algorithm for minimizing the average weighted completion time on parallel machines with both release dates and precedences, in the online non-clairvoyant setting.
For this result, at each time t, the algorithm has to know only the partial order restricted to {j ∈ [n] : r_j ≤ t}, i.e., the jobs released by time t. The algorithmic idea is simple in hindsight: the algorithm looks at the minimal unfinished jobs (i.e., those that do not depend on any other unfinished jobs): call them I_t. If J_t is the set of (already released and) unfinished jobs at time t, then I_t ⊆ J_t. To figure out how to divide our processing among the jobs in I_t, we write a convex program that fairly divides the time among all jobs in the larger set J_t, such that (a) these jobs can "donate" their allocated time to some preceding jobs in I_t, and (b) the jobs in I_t do not get more than 1 unit of processing per time step.

For this fair allocation, we maximize the (weighted) Nash welfare ∑_{j∈J_t} w_j log R_j, where R_j is the virtual rate of processing given to job j ∈ J_t, regardless of whether it can currently be run (i.e., is in I_t). This tries to fairly distribute the virtual rates among the jobs [19], and can be solved using an Eisenberg-Gale-type convex program. (We can solve this convex program in our setting using a simple primal-dual algorithm; see the full version.) The proof of Theorem 1.1 proceeds by writing a linear-programming relaxation for the weighted completion time problem, and fitting a dual to it. Conveniently, the dual variables for the completion time LP naturally fall out of the dual (KKT) multipliers for the convex program!
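To see why maximizing the weighted Nash welfare spreads rates fairly, consider the degenerate case with no precedences, where every job is minimal and the only constraint is the total capacity m. There the optimum has a closed form: each job gets a rate proportional to its weight. The following small sketch is our illustration (not the paper's algorithm), with function names of our own choosing:

```python
import math

def nash_rates(weights, m):
    """Maximize sum_j w_j * log(R_j) subject to sum_j R_j <= m, R_j >= 0.
    By the KKT conditions (w_j / R_j is equal for all j at the optimum),
    the maximizer is R_j = m * w_j / W, where W is the total weight."""
    W = sum(weights)
    return [m * w / W for w in weights]

def nash_welfare(weights, rates):
    """The weighted Nash-welfare objective sum_j w_j * log(R_j)."""
    return sum(w * math.log(r) for w, r in zip(weights, rates))
```

For weights (3, 1) on a single machine (m = 1) this gives rates (0.75, 0.25), and any other split of the capacity has strictly smaller welfare; heavier jobs get more, but no job is starved.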
Weighted Flow Time. We then turn to the weighted flow-time minimization problem. We first observe that the problem has no competitive algorithm if there are jobs j that depend on jobs released before r_j. Indeed, if OPT ever has an empty queue while the algorithm is processing jobs, the adversary could give a stream of tiny new jobs, and we would be sunk. Hence we make an additional no-surprises assumption about our instance: when a job j is released, all the jobs having a precedence relationship to j are also released at the same time. In other words, the partial order is a collection of disjoint connected DAGs, where all jobs in each connected component have the same release date. A special case of this model has been studied in [20, 1], where each DAG is viewed as a "hyper-job" and there are no precedence constraints between different hyper-jobs. In this model, we show:
▶ Theorem 1.2. There is an O(1/ε²)-competitive deterministic non-clairvoyant online algorithm for the problem of minimizing the average weighted flow time on parallel machines with release dates and precedences, under the no-surprises and (1+ε)-speedup assumptions.

Interestingly, the algorithm for weighted flow-time is almost the same as for weighted completion time. In fact, exactly the same algorithm works for both the completion time and flow time cases, if we allow a speedup of (2+ε) for the latter. To get the (1+ε)-speedup algorithm, we give preference to the recently-arrived jobs, since they have a smaller current time-in-system and each unit of waiting proportionally hurts them more. This is along the lines of strategies like LAPS and WLAPS [7].
1.2 The Intuition
Consider the case of unit-weight jobs on a single machine. Without precedence constraints, the round-robin algorithm, which runs all jobs at the same rate, is O(1)-competitive for the flow-time objective with a 2-speed augmentation. Now consider precedences, and let the partial order be a collection of disjoint chains: only the first remaining job from each chain can be run at each time. We generalize round-robin to this setting by running all minimal jobs simultaneously, but at rates proportional to the lengths of the corresponding chains. We can show this algorithm is also O(1)-competitive with a 2-speed augmentation. While this is easy for chains and trees, let us now consider the case when the partial order is the union of general DAGs, where each DAG may have several minimal jobs. Even though the sum of the rates over all the minimal jobs in any particular DAG should be proportional to the number of jobs in this DAG, running all minimal jobs at equal rates does not work. (Indeed, if many jobs depend on one of these minimal jobs, and many fewer depend on the other minimal jobs in this DAG, we want to prioritize the former.)
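For the chains special case, the generalized round-robin above can be sketched in a few lines (an illustrative sketch with names of our own choosing; chains are given as lists of remaining jobs on a single machine):

```python
def chain_proportional_rates(chains, m=1):
    """Run the first remaining job of each chain at a rate proportional to
    the number of remaining (unit-weight) jobs in that chain, scaled so the
    rates together use the full capacity m."""
    total = sum(len(chain) for chain in chains)
    return [m * len(chain) / total for chain in chains]
```

For three chains with 3, 2 and 1 remaining jobs on a single machine, the minimal jobs run at rates 1/2, 1/3 and 1/6. For m > 1 one would additionally cap each rate at 1, which is exactly the throttling that the convex program below handles as a constraint.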
Instead, we use a convex program to find rates. Our approach assigns a "virtual rate" R_j to each job in the DAG (regardless of whether it is minimal or not). This virtual rate allows us to ensure that even though this job may not run, it can help some minimal jobs run at higher rates. This is done via an assignment problem in which these virtual rates get translated into actual rates for the minimal jobs. The virtual rates are then calculated using Nash fairness, which gives us max-min properties that are crucial for our analysis.
Analysis Challenges. In typical applications of the dual-fitting technique, the dual variable for each job encodes the increase in total flow-time caused by the arrival of this job. Using this notion turns out to create problems. Indeed, consider a minimal job of low weight which is running at a high rate (because a large number of jobs depend on it). The increase in overall flow-time because of its arrival is very large. However, the dual LP constraints require these dual variables to be bounded by the weights of their jobs, which now becomes difficult to ensure. To avoid this, we define the dual variables directly in terms of the virtual rates of the jobs, given by the convex program.

Having multiple machines instead of a single machine creates new problems. The actual rate assigned to any minimal job cannot exceed 1, and hence we have to throttle certain actual rates. Again the versatility of the convex program helps us, since we can add this as a constraint. Arguing about the optimal solution of such a convex program requires dealing with the suitable KKT conditions, from which we can infer many useful properties. We also show in the full version that the optimal solution corresponds to a natural "water-filling" based algorithm.
Finally, we obtain matching results for the case of (1+ε)-speed augmentation. Im et al. [12] gave a general-purpose technique to translate a round-robin-based algorithm into a LAPS-like algorithm. In our setting, it turns out that the LAPS-like policy needs to be run on the virtual rates of jobs. Analyzing this algorithm does not follow in a black-box manner (as prescribed by [12]), and we need to adapt our dual-fitting analysis suitably.
1.3 Related Work and Organization
Completion Time. Minimizing ∑_j w_j·C_j on parallel machines with precedence constraints has O(1)-approximations in the offline setting: Li [16] improves on [11, 18] to give a (3.387+ε)-approximation. For related machines, the precedence constraints make the problem much harder: there is an O(log m/log log m)-approximation [16] improving on a prior O(log m) result [4], and a hardness of ω(1) under certain complexity assumptions [3]. In the online setting, any offline algorithm for (a dual problem to) ∑_j w_j·C_j gives a clairvoyant online algorithm, losing O(1) factors [11]. Two caveats: it is unclear (a) how to make this algorithm non-clairvoyant, and (b) how to solve the (dual of the) weighted completion time problem with precedences in polynomial time.
Flow Time without Precedence. To minimize ∑_j w_j·(C_j − r_j), strong lower bounds are known for the competitive ratio of any online algorithm, even on a single machine [17]. Hence we use speed augmentation [14]. For the general setting of non-clairvoyant weighted flow-time on unrelated machines, Im et al. [13] showed that weighted round-robin with a suitable migration policy yields a (2+ε)-competitive algorithm using (1+ε)-speed augmentation. They gave a general-purpose technique, based on the LAPS scheduling policy, to convert any such round-robin-based algorithm into a (1+ε)-competitive algorithm while losing an extra 1/ε factor in the competitive ratio. Their analysis also uses a dual-fitting technique [2, 10]. However, they do not consider precedence constraints.
Flow Time with Precedence. Much less is known for flow-time problems with precedence constraints. For the offline setting on identical machines, [15] give O(1)-approximations with O(1)-speedup, even for general delay functions. In the current paper, we achieve a poly(1/ε)-approximation with (1+ε)-speedup for flow-time. Interestingly, [15] show that beating an n^{1−c}-approximation for any constant c ∈ [0, 1) requires a speedup of at least the optimal approximation factor of makespan minimization in the same machine environment. However, this lower bound requires different jobs with a precedence relationship to have different release dates, which is something our model disallows. (The full version gives another lower bound showing why we disallow such precedences in the online setting.)

In the online setting, [20] introduced the DAG model where each job is a directed acyclic graph (of tasks) released at some time, a job/DAG completes when all the tasks in it are finished, and we want to minimize the total unweighted flow-time. They gave a (2+ε)-speed O(κ/ε)-competitive algorithm, where κ is the largest antichain within any job/DAG. [1] show poly(1/ε)-competitiveness with (1+ε)-speedup, again in the non-clairvoyant setting. The case where jobs are entire DAGs, and not individual nodes within DAGs, is captured in our weighted model by putting zero weights on all original jobs, and adding a unit-weight zero-sized job for each DAG which depends on all jobs in the DAG. Assigning arbitrary weights to individual nodes within DAGs makes our problem quite non-trivial: we need to take into account the structure of the DAG to assign rates to jobs. Another model to capture parallelism and precedences uses speedup functions [6, 5, 7]; relating our model to this setting remains an open question.
Our work is closely related to Im et al. [12], who use a Nash fairness approach for completion-time and flow-time problems with multiple resources. While our approaches are similar, to the best of our understanding their approach does not immediately extend to the setting with precedences. Hence we have to introduce new ideas of using virtual rates (and being fair with respect to them), and of throttling the induced actual rates at 1. The analyses of [12] and our work are both based on dual-fitting; however, we need some new ideas for the setting with precedences.
Organization. The weighted completion time case is solved in §2. A (2+ε)-speedup result for weighted flow-time is in §3. In the full version we improve this to a (1+ε)-speedup. There we also show the need for the "no-surprises" assumption on release dates, how to solve the convex program using a "water-filling" based algorithm, and the missing proofs.
2 Minimizing Weighted Completion Time
In this section, we describe and analyze the scheduling algorithm for the problem of minimizing weighted completion time on parallel machines. Recall that the precedence constraints are given by a DAG G, and each job j has a release date r_j, processing size p_j and weight w_j.
2.1 The Scheduling Algorithm
We first assume that each of the m machines runs at rate 2 (i.e., it can perform 2 units of processing in unit time). We will show later how to remove this assumption (at a constant loss in the competitive ratio). We begin with some notation. We say that a job j is waiting at time t (with respect to a schedule) if r_j ≤ t, but j has not been processed to completion by time t. We use J_t to denote the set of waiting jobs at time t. Note that at time t, the algorithm gets to see the subgraph G_t of G which is induced by the jobs in J_t. We say that a job j is unfinished at time t if it is either waiting at time t, or its release date is at least t (and hence the algorithm does not even know about this job). Let U_t denote the set of unfinished jobs at time t. Clearly, J_t ⊆ U_t. At time t, the algorithm can only process those jobs in J_t which do not have a predecessor in G_t; denote these minimal jobs by I_t: they are independent of all other current jobs. For every time t, the scheduling algorithm needs to assign a rate to each job j ∈ I_t. We now describe how it decides on these rates.
Consider a time t. The algorithm considers a bipartite graph H_t = (I_t, J_t, E_t) with vertex set consisting of the minimal jobs I_t on the left and the waiting jobs J_t on the right. Since I_t ⊆ J_t, a job in I_t appears as a vertex on both sides of this bipartite graph. When there is no confusion, we slightly overload terminology by referring to a job as a vertex in H_t. The set of edges E_t is as follows: let j_l ∈ I_t and j_r ∈ J_t be vertices on the left and the right side respectively. Then (j_l, j_r) is an edge in E_t if and only if there is a directed path from j_l to j_r in the DAG G_t. The following convex program now computes the rate for each vertex in I_t. It has variables z_e^t for each edge e ∈ E_t. For each job j on the left side, i.e., for j ∈ I_t, define L_j^t as the sum of the z^t values of the edges incident to j. Similarly, define R_j^t as this sum for a job j ∈ J_t, i.e., on the right side. The objective function is the Nash bargaining objective function on the R_j^t values, which ensures that each waiting job gets some attention. In the full version we give a combinatorial algorithm to efficiently solve this convex program.
max ∑_{j∈J_t} w_j · ln R_j^t    (CP)

subject to
    L_j^t = ∑_{j′∈J_t : (j,j′)∈E_t} z_{jj′}^t    ∀ j ∈ I_t    (1)
    R_j^t = ∑_{j′∈I_t : (j′,j)∈E_t} z_{j′j}^t    ∀ j ∈ J_t    (2)
    L_j^t ≤ 1    ∀ j ∈ I_t    (3)
    ∑_{j∈I_t} L_j^t ≤ m    (4)
    z_e^t ≥ 0    ∀ e ∈ E_t    (5)
Let (z̄^t, L̄^t, R̄^t) be an optimal solution to the above convex program. We define the rate of a job j ∈ I_t to be L̄_j^t.
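The program (CP) is small enough to solve numerically with an off-the-shelf solver. The sketch below is our illustration (not the paper's primal-dual scheme; the function names and the choice of SciPy's SLSQP method are ours): it maximizes ∑_{j∈J_t} w_j ln R_j^t over the edge variables z_e^t subject to constraints (1)-(5).

```python
import numpy as np
from scipy.optimize import minimize

def solve_cp(edges, weights, I, J, m):
    """edges: list of (jl, jr) with jl in I (minimal jobs), jr in J (waiting
    jobs), meaning jl can 'work for' jr.  weights: dict job -> w_j.
    Returns (L, R): actual rates for jobs in I, virtual rates for jobs in J."""
    eps = 1e-9  # keeps log() finite while the solver explores the boundary

    def R(z, j):  # virtual rate of waiting job j, constraint (2)
        return sum(z[k] for k, (_, jr) in enumerate(edges) if jr == j)

    def L(z, j):  # actual rate of minimal job j, constraint (1)
        return sum(z[k] for k, (jl, _) in enumerate(edges) if jl == j)

    def neg_obj(z):  # negated Nash-welfare objective of (CP)
        return -sum(weights[j] * np.log(R(z, j) + eps) for j in J)

    cons = [{'type': 'ineq', 'fun': (lambda z, j=j: 1.0 - L(z, j))}
            for j in I]                                  # constraint (3)
    cons.append({'type': 'ineq', 'fun': lambda z: m - sum(z)})  # (4): sum L = sum z
    z0 = np.full(len(edges), min(1.0, m / max(1, len(edges))) / 2)  # interior start
    res = minimize(neg_obj, z0, bounds=[(0, None)] * len(edges),
                   constraints=cons, method='SLSQP')
    z = res.x
    return {j: L(z, j) for j in I}, {j: R(z, j) for j in J}
```

On the toy instance with minimal jobs a, b where a third job c waits behind a (edges (a,a), (b,b), (a,c)), job a ends up running at twice b's rate: it also "carries" c's virtual rate, matching the discussion of prioritizing heavily-depended-on minimal jobs.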
Although we have defined this as a continuous-time process, it is easy to check that the rates only change when a new job arrives or a job completes processing. Also observe that we have effectively combined the m machines into one in this convex program. But assuming that all events happen at integer times, we can translate the rate assignment to an actual schedule as follows. For a time slot [t, t+1], the total rate is at most m (using (4)), so we create m time slots [t, t+1]_i, one for each machine i, and iteratively assign each job j an interval of length L̄_j^t within these time slots. It is possible that a job may get assigned intervals in two different time slots, but the fact that L̄_j^t ≤ 1 means it will not be assigned the same time in two different time slots. Further, we will never exceed the slots, because of (4). Thus, we can process these jobs in the m time slots on the m parallel machines such that each job j gets processed for L̄_j^t amount of time and no job is processed concurrently on multiple machines. This completes the description of the algorithm; in this, we assume that we run the machines at twice the speed. Call this algorithm A.
The final algorithm B, which is only allowed to run the machines at speed 1, is obtained by running A in the background and setting B to be a slowed-down version of A. Formally, if A processes a job j on machine i at time t ∈ R_{≥0}, then B processes it at time 2t. This completes the description of the algorithm.
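The translation from rates to slots described above is essentially a wrap-around (McNaughton-style) packing; a short sketch, with names of our own choosing:

```python
def assign_slots(rates, m):
    """rates: dict job -> rate, with each rate <= 1 and sum of rates <= m
    (as guaranteed by constraints (3) and (4)).  Packs each job's rate into
    the m unit-length slots [t, t+1]_1, ..., [t, t+1]_m, wrapping to the
    next machine's slot when the current one fills up.  Because each rate
    is at most 1, the two pieces of a wrapped job never overlap in time."""
    schedule = []              # tuples (machine, start, end, job), times in [0, 1]
    machine, pos = 0, 0.0
    for job, rate in rates.items():
        remaining = rate
        while remaining > 1e-12:
            assert machine < m, "total rate exceeds m"
            take = min(remaining, 1.0 - pos)
            schedule.append((machine, pos, pos + take, job))
            remaining -= take
            pos += take
            if pos >= 1.0 - 1e-12:
                machine, pos = machine + 1, 0.0
    return schedule
```

For rates (0.6, 0.6, 0.8) on m = 2 machines, the second job is split across both machines, but its two pieces occupy disjoint sub-intervals of [t, t+1], so no job ever runs concurrently on two machines.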
2.2 A Time-Indexed LP Formulation
We use the dual-fitting approach to analyze the above algorithm. We write a time-indexed linear programming relaxation (LP) for the weighted completion time problem, and use the solutions to the convex program (CP) to obtain feasible primal and dual solutions for (LP) which differ by only a constant factor.

We divide time into integral time slots (assuming all quantities are integers). Therefore, the variable t will refer to integer times only. For every job j and time t, we have a variable x_{j,t} which denotes the volume of j processed during [t, t+1]. Note that this is defined only for t ≥ r_j. The LP relaxation is as follows:
min ∑_{j,t} w_j · t · x_{j,t} / p_j    (LP)

subject to
    ∑_{t≥r_j} x_{j,t} / p_j ≥ 1    ∀ j    (6)
    ∑_j x_{j,t} ≤ m    ∀ t    (7)
    ∑_{s≤t} x_{j,s} / p_j ≥ ∑_{s≤t} x_{j′,s} / p_{j′}    ∀ t, j ≺ j′    (8)
    x_{j,t} ≥ 0    ∀ j, t ≥ r_j
The following claim, whose proof is deferred to the full version, shows that this is a valid relaxation.

▷ Claim 2.1. Let opt denote the weighted completion time of an optimal offline policy (which knows the processing times of all the jobs). Then the optimal value of the LP relaxation is at most opt.
The relaxation (LP) has a large integrality gap. Observe that the LP just imagines the m machines to be a single machine with speed m. Therefore, (LP) has a large integrality gap for two reasons: (i) a job j can be processed concurrently on multiple machines, and (ii) suppose we have a long chain of jobs of equal size in the DAG G; then the LP allows us to process all these jobs at the same rate in parallel on multiple machines. We augment the LP lower bound with another quantity and show that the sum of these two lower bounds suffices.
A chain C in G is a sequence of jobs j_1, …, j_k such that j_1 ≺ j_2 ≺ … ≺ j_k. Define the processing time of C, p(C), as the sum of the processing times of the jobs in C. For a job j, define chain_j as the maximum of p(C) over all chains C ending in j. It is easy to see that ∑_j w_j·(r_j + chain_j) is a lower bound (up to a factor 2) on the objective of an optimal schedule.
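The quantities chain_j are computed by the standard longest-path dynamic program over the DAG in topological order; a small sketch (ours, for illustration):

```python
from collections import defaultdict

def chain_lengths(p, edges):
    """chain_j = max total processing time over chains j_1 < ... < j ending
    at j.  p: dict job -> processing time; edges: list of (u, v) meaning u
    precedes v.  Kahn-style topological order, then a longest-path DP."""
    preds, succs, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1
    order, frontier = [], [j for j in p if indeg[j] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    chain = {}
    for j in order:  # predecessors of j appear before j in the order
        chain[j] = p[j] + max((chain[u] for u in preds[j]), default=0)
    return chain
```

For p = {a: 2, b: 3, c: 1} with a ≺ c and b ≺ c, this gives chain_c = 3 + 1 = 4, witnessed by the chain (b, c).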
We now write down the dual of the LP relaxation above. We have dual variables α_j for every job j, β_t for every time t, and γ_{s,j≺j′} for every time s and precedence pair j ≺ j′:

max ∑_j α_j − m·∑_t β_t    (DLP)

    α_j + ∑_{s≥t} ( ∑_{j′ : j≺j′} γ_{s,j≺j′} − ∑_{j′ : j′≺j} γ_{s,j′≺j} ) ≤ w_j·t + p_j·β_t    ∀ j, t ≥ r_j    (9)
    α_j, β_t, γ_{s,j≺j′} ≥ 0

We write the dual constraint (9) in a more readable manner. For a job j and time s, let γ^in_{s,j} denote ∑_{j′≺j} γ_{s,j′≺j}, and define γ^out_{s,j} := ∑_{j′ : j≺j′} γ_{s,j≺j′} similarly. Then (9) reads

    α_j + ∑_{s≥t} ( γ^out_{s,j} − γ^in_{s,j} ) ≤ w_j·t + p_j·β_t    ∀ j, t ≥ r_j.
2.3 Properties of the Convex Program
We now prove certain properties of an optimal solution (z̄^t, L̄^t, R̄^t) to the convex program (CP). The first property, whose proof is deferred to the full version, is easy to see:

▷ Claim 2.2. If ∑_{j∈I_t} L̄_j^t < m, then L̄_j^t = 1 for all j ∈ I_t.
We now write down the KKT conditions for the convex program. (In fact, we can use (1) and (2) to replace L̄_j^t and R̄_j^t in the objective and the other constraints.) Then, letting θ_j^t ≥ 0, η_t ≥ 0 and δ_e^t ≥ 0 be the Lagrange multipliers corresponding to constraints (3), (4) and (5) respectively, we get:

    w_j / R̄_j^t = θ_{j′}^t + η_t − δ_e^t    ∀ e = (j′, j), j′ ∈ I_t, j ∈ J_t
    θ_j^t · (L̄_j^t − 1) = 0    ∀ j ∈ I_t
    η_t · (∑_{j∈I_t} L̄_j^t − m) = 0
    δ_e^t · z̄_e^t = 0    ∀ e ∈ E_t
▷ Claim 2.3. Consider a job j ∈ J_t on the right side of H_t. Then w_j ≥ R̄_j^t · η_t.

▷ Claim 2.4. Consider a job j ∈ J_t on the right side of H_t. Suppose j has a neighbor j′ ∈ I_t such that L̄_{j′}^t < 1 and z̄_{j′j}^t > 0. Then w_j = R̄_j^t · η_t.
A crucial notion is that of an active job:

▶ Definition 2.5 (Active Jobs). A job j ∈ J_t is active at time t if it has at least one neighbor in I_t (in the graph H_t) running at a rate strictly less than 1.

Let J_t^act denote the set of active jobs at time t. We can strengthen the above claim as follows.

▶ Corollary 2.6. Consider an active job j at time t. Then w_j = R̄_j^t · η_t.

▷ Claim 2.7. w(J_t^act)/m ≤ η_t ≤ w(J_t)/m.
2.4 Analysis via Dual Fitting
We analyze the algorithm A first. We define feasible dual variables for (DLP) such that the value of the dual objective function (along with the chain_j values that capture the maximum processing time over all chains ending in j) forms a lower bound on the weighted completion time of our algorithm. Intuitively, α_j would be the weighted completion time of j, and β_t would be 1/2m times the total weight of unfinished jobs at time t. Thus, ∑_j α_j − m·∑_t β_t would be about 1/2 times the total weighted completion time. This idea works as long as all the machines are busy at every point in time, the reason being that the primal LP essentially views the m machines as a single speed-m machine: we can generate enough dual lower bound if the rate of processing in each time slot is m. If all machines are not busy, we need to appeal to the lower bound given by the chain_j values.
We use the notation from the description of the algorithm. In the graph H_t, we assigned rates L̄_j^t to all the nodes j in I_t. Recall that a vertex j ∈ J_t on the right side of H_t is said to be active at time t if it has a neighbor j′ ∈ I_t for which L̄_{j′}^t < 1; otherwise, we say that j is inactive at time t. We say that an edge e = (j_l, j_r) ∈ E_t, where j_l ∈ I_t and j_r ∈ J_t, is active at time t if the vertex j_r is active. Let A_t denote the set of active edges in E_t. Let e = (j_l, j_r) be an edge in E_t. By definition, there is a path from j_l to j_r in G_t; we fix such a path P_e. As before, let C_j denote the completion time of job j. The dual variables are defined as follows:
For each job j and time t, we define quantities α_{j,t}; the dual variable α_j will equal ∑_{t≥0} α_{j,t}. Fix a job j. If t ∉ [r_j, C_j], we set α_{j,t} to 0. Now suppose j ∈ J_t. Consider the job j as a vertex in J_t (i.e., the right side) of the bipartite graph H_t. We set α_{j,t} to w_j if j is active at time t, and to 0 otherwise.

For each time t, we set β_t := (1/2m)·w(U_t) (recall that U_t is the set of unfinished jobs at time t).

We now need to define γ_{t,j′≺j}, where j′ ≺ j. If j or j′ does not belong to J_t, we set this variable to 0. So assume that j, j′ ∈ J_t (and so the edge (j′, j) lies in G_t). We define

    γ_{t,j′≺j} := η_t · ∑_{e : e∈A_t, (j′,j)∈P_e} z̄_e^t.

In other words, we consider all the active edges e in the graph H_t for which the corresponding path P_e contains (j′, j), and add up the fractional assignment z̄_e^t over all such edges.

This completes the description of the dual variables.
We first show that the objective function for (DLP) is close to the weighted completion time incurred by the algorithm. The proof is deferred to the full version.

▷ Claim 2.8. The total weighted completion time of the jobs under A is at most 2·(∑_j α_j − m·∑_t β_t) + ∑_j w_j·(chain_j + 2r_j).
We now argue the feasibility of the dual constraint (9). Consider a job j and a time t ≥ r_j. Since α_{j,s} ≤ w_j for all times s, we have ∑_{s<t} α_{j,s} ≤ w_j·t. Therefore, it suffices to show:

    ∑_{s≥t} α_{j,s} + ∑_{s≥t} ( γ^out_{s,j} − γ^in_{s,j} ) ≤ p_j·β_t.    (15)

Let t_j* be the first time t when the job j appears in the set I_t. This is also the first time when the algorithm starts processing j, because a job that enters I_t does not leave I_t before completion.
▷ Claim 2.9. For any time s in the range [r_j, t_j*), we have α_{j,s} + γ^out_{s,j} − γ^in_{s,j} = 0.

Proof. Fix such a time s. Note that j ∉ I_s. Thus j appears as a vertex on the right side of the bipartite graph H_s, but does not appear on the left side. Let e be an active edge in H_s such that the corresponding path P_e contains j as an internal vertex. Then z̄_e^s gets counted in both γ^out_{s,j} and γ^in_{s,j}. There cannot be such a path P_e which starts with j, because then j would need to be on the left side of the bipartite graph. There could be paths P_e which end with j; these correspond to active edges e incident to j in the graph H_s (this happens only if j itself is active). Let δ(j) denote the edges incident to j. We have shown that

    γ^out_{s,j} − γ^in_{s,j} = −η_s · ∑_{e∈δ(j)∩A_s} z̄_e^s.    (16)

If j is not active, the RHS is 0, and so is α_{j,s}, so we are done. Therefore, assume that j is active. Now A_s contains all the edges incident to j, and so the RHS equals −η_s · R̄_j^s. But then Corollary 2.6 implies that −η_s · R̄_j^s = −w_j. Since α_{j,s} = w_j, we are done again. ◁
Coming back to inequality (15), we can assume that t ≥ t_j*. To see this, suppose t < t_j*. Then by Claim 2.9 the LHS of this constraint is the same as

    ∑_{s≥t_j*} α_{j,s} + ∑_{s≥t_j*} ( γ^out_{s,j} − γ^in_{s,j} ).

Since β_{t_j*} ≤ β_t (the set of unfinished jobs can only diminish as time goes on), (15) for time t follows from the corresponding statement for time t_j*. Therefore, we assume that t ≥ t_j*. We can also assume that t ≤ C_j, otherwise the LHS of this constraint is 0.
▷ Claim 2.10. Let s ∈ [t_j*, C_j] be such that j is inactive at time s. Then α_{j,s} + γ^out_{s,j} − γ^in_{s,j} ≤ η_s · L̄_j^s.

Proof. We know that α_{j,s} = 0. As in the proof of Claim 2.9, we only need to worry about those active edges e in H_s for which P_e either ends at j or begins with j. Since any edge incident to j as a vertex on the right side is inactive, we get (letting δ(j) denote the edges incident to j, where we consider j on the left side)

    α_{j,s} + γ^out_{s,j} − γ^in_{s,j} = η_s · ∑_{e∈δ(j)∩A_s} z̄_e^s ≤ η_s · L̄_j^s,

because η_s ≥ 0 and L̄_j^s = ∑_{e∈δ(j)} z̄_e^s. ◁
▷ Claim 2.11. Let s ∈ [t_j*, C_j] be such that j is active at time s. Then α_{j,s} + γ^out_{s,j} − γ^in_{s,j} ≤ η_s · L̄_j^s.

Proof. The argument is very similar to the one in the previous claim. Since j is active, α_{j,s} = w_j. As before, we only need to worry about the active edges e for which P_e either ends or begins with j. Any edge which is incident to j on the right side (note that there will be only one such edge: the one joining j to its copy on the left side of H_s) is active. The following inequality now follows as in the proof of Claim 2.10:

    α_{j,s} + γ^out_{s,j} − γ^in_{s,j} ≤ w_j + η_s · L̄_j^s − η_s · R̄_j^s.

The result now follows from Corollary 2.6. ◁
We are now ready to show that (15) holds. The above two claims show that the LHS of (15) is at most ∑_{s=t}^{C_j} η_s · L̄_j^s. Note that for any such time s, the rate assigned to j is L̄_j^s, and so we perform 2·L̄_j^s amount of processing on j during this time slot. It follows that ∑_{s=t}^{C_j} L̄_j^s ≤ p_j/2. Now Claim 2.7 shows that η_s ≤ w(U_s)/m ≤ w(U_t)/m, and so we get

    ∑_{s=t}^{C_j} η_s · L̄_j^s ≤ (p_j/2) · w(U_t)/m = p_j · β_t.

This shows that (15) is satisfied. We can now prove that our algorithm is constant competitive.
▶ Theorem 2.12. The algorithm B is 10-competitive.

Proof. We first argue about A. We have shown that the dual variables are feasible for (DLP), and so Claim 2.8 shows that the total completion time of A is at most 2·opt + ∑_j w_j·(chain_j + 2r_j), where opt denotes the optimal offline objective value. Clearly, opt ≥ ∑_j w_j·r_j and opt ≥ ∑_j w_j·chain_j. This implies that A is 5-competitive. While going from A to B the completion time of each job doubles. ◀
3 Minimizing Weighted Flow Time
We now consider the setting of minimizing the total weighted flow time, again in the non-clairvoyant setting. The setup is almost the same as in the completion-time case; the major change is that all jobs which depend on each other (i.e., belong to the same DAG in the "collection of DAGs" view) have the same release date. In the full version we show that if related jobs can be released over time then no competitive online algorithms are possible. As before, let J_t denote the jobs which are waiting at time t, i.e., which have been released but not yet finished, and let G_t be the union of all the DAGs induced by the jobs in J_t. Again, let I_t denote the set of minimal jobs in J_t, i.e., those which do not have a predecessor in G_t and hence can be scheduled.
▶ Theorem 3.1. There exists an O(1/ε)-approximation algorithm for non-clairvoyant DAG scheduling to minimize the weighted flow time on m parallel machines, when there is a speedup of 2+ε.
The rest of this section gives the proof of Theorem 3.1. The algorithm remains unchanged from §2 (we do not need the algorithm B now): we write the convex program (CP) as before, which assigns rates L̄_j^t to each job j ∈ I_t. The analysis again proceeds by writing a linear programming relaxation and exhibiting a feasible dual solution. The LP is almost the same as (LP); only the objective changes, to

    min ∑_{j,t} w_j · (t − r_j) · x_{j,t} / p_j.

Hence, the dual is also almost the same as (DLP): the new dual constraint requires that for every job j and time t ≥ r_j,

    α_j + ∑_{s≥t} ( γ^out_{s,j} − γ^in_{s,j} ) ≤ β_t·p_j + w_j·(t − r_j).    (17)
The variable β_t := w(J_t) / ((1+ε)·m). Recall that the machines are allowed a 2(1+ε)-speedup.
The definition of the $\gamma$ variables changes as follows. Let $(j' \to j)$ be an edge in the DAG $G_t$. Earlier we had considered paths $P_e$ containing $(j' \to j)$ only for the active edges $e$; but now we include all edges. Moreover, we replace the multiplier $\beta_t$ by $\theta^t_j$, where $\theta^t_j := \frac{1}{m} \cdot \sum_{j' \in J_t :\, j' \succeq j} w_{j'}$. In other words, we define
$$\gamma_{t,\, j' \to j} \;:=\; \theta^t_j \cdot \sum_{e :\, e \in H_t,\ (j' \to j) \in P_e} (\cdots),$$
with the summand unchanged from §2.
In the following sections, we show that these dual settings are enough to "pay for" the flow time of our solution (i.e., have large objective function value), and also give a feasible lower bound (i.e., are feasible for the dual linear program).
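For intuition about the rates themselves: in the precedence-free special case, an Eisenberg–Gale-type program of this flavor reduces to maximizing $\sum_j w_j \log \bar{R}_j$ subject to $\sum_j \bar{R}_j \le m$, whose optimum is the proportional allocation $\bar{R}_j = m\,w_j / w(J_t)$, with a single "price" $w_j/\bar{R}_j = w(J_t)/m$ shared by all jobs. The sketch below shows only this special case; the precedence and path constraints of the actual program are omitted, and the function name is ours:

```python
def fair_virtual_rates(weights, m):
    """Precedence-free Eisenberg-Gale allocation: maximize
    sum_j w_j * log(R_j) subject to sum_j R_j <= m.

    The KKT conditions force a single price w_j / R_j across jobs,
    giving the proportional allocation R_j = m * w_j / total_weight.
    """
    total = sum(weights.values())
    return {j: m * w / total for j, w in weights.items()}

rates = fair_virtual_rates({"a": 3.0, "b": 1.0}, m=2)
# every job sees the same price w_j / R_j = total_weight / m
```

This uniform weight-to-rate price is the precedence-free analogue of the condition $w_j = \beta_s \cdot \bar{R}^s_j$ for active jobs used repeatedly in the analysis below.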
3.2 The Dual Objective Function

We first show that $\sum_j \alpha_j - m \sum_t \beta_t$ is close to the total weighted flow-time of the jobs. The quantity $\mathsf{chain}_j$ is defined as before. Notice that $\mathsf{chain}_j$ is still a lower bound on the flow-time of job $j$ in the optimal schedule because all jobs of a DAG are released simultaneously. The following claim, whose proof is deferred to the full version, shows that the dual objective value is close to the weighted flow time of the algorithm.

▷ Claim 3.2. The total weighted flow-time is at most $2\bigl(\sum_j \alpha_j - m \sum_t \beta_t\bigr) + \sum_j w_j \cdot \mathsf{chain}_j$.
3.3 Checking Dual Feasibility

Now we need to check the feasibility of the dual constraint (17). In fact, we will show the following weaker version of that constraint:
$$\alpha_j + 2\sum_{s \ge t} \bigl(\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s\bigr) \;\le\; \beta_t \cdot p_j + 2\,w_j\,(t - r_j). \qquad (18)$$
This suffices to within another factor of 2: indeed, scaling down the $\gamma$ and $\alpha$ variables by another factor of 2 then gives dual feasibility, and loses only another factor of 2 in the objective function. We begin by bounding $\eta_{j,s}$ in two different ways.

▶ Lemma 3.3. For any time $s \ge r_j$, we have $\eta_{j,s} \le 2 w_j$.

Proof. Consider the second term in the definition of $\eta_{j,s}$. This term contains $\sum_{j' \in J^{\mathrm{act}}_s :\, j' \to j} w_{j'}$. By Corollary 2.6, for any $j' \in J^{\mathrm{act}}_s$ we have $w_{j'} = \bar{R}^s_{j'} \cdot \beta_s$. Therefore,
$$\sum_{j' \in J^{\mathrm{act}}_s :\, j' \to j} w_{j'} \;=\; \beta_s \cdot \sum_{j' \in J^{\mathrm{act}}_s :\, j' \to j} \bar{R}^s_{j'} \;\le\; \beta_s \cdot \sum_{j' \in J_s} \bar{R}^s_{j'}.$$
Now we can bound $\eta_{j,s}$ by dropping the indicator on the first term to get
$$\eta_{j,s} \;\le\; \frac{1}{m}\Bigl[\, w_j \cdot \sum_{j' \in J_s :\, j' \succeq j} \bar{R}^s_{j'} \;+\; \bar{R}^s_j \cdot \beta_s \cdot \sum_{j' \in J^{\mathrm{act}}_s :\, j' \to j} \bar{R}^s_{j'} \,\Bigr] \;\le\; \frac{1}{m}\, w_j \Bigl[\, \sum_{j' \in J_s} \bar{R}^s_{j'} + \sum_{j' \in J_s} \bar{R}^s_{j'} \,\Bigr],$$
the last inequality using Claim 2.3. Simplifying, $\eta_{j,s} \le \frac{2 w_j}{m} \cdot \sum_{j'' \in I_s} R^s_{j''} = 2 w_j$. ◀

Here is a slightly different upper bound on $\eta_{j,s}$.

▶ Lemma 3.4. For any time $s \ge r_j$, we have $\eta_{j,s} \le 2\,\theta^s_j \cdot \bar{R}^s_j$.

Proof. The second term in the definition of $\eta_{j,s}$ is at most $\theta^s_j \cdot \bar{R}^s_j$, directly using the definition of $\theta^s_j$. For the first term, assume $j$ is active at time $s$, otherwise this term is $0$. Now Corollary 2.6 shows that $w_j = \beta_s \cdot \bar{R}^s_j$, so the first term can be bounded as follows:
$$\frac{w_j}{m} \cdot \sum_{j' \in J_s :\, j' \succeq j} \bar{R}^s_{j'} \;=\; \frac{\bar{R}^s_j \cdot \beta_s}{m} \cdot \sum_{j' \in J_s :\, j' \succeq j} \bar{R}^s_{j'} \;\overset{\text{(Claim 2.3)}}{\le}\; \frac{\bar{R}^s_j}{m} \cdot \sum_{j' \in J_s :\, j' \succeq j} w_{j'} \;=\; \bar{R}^s_j \cdot \theta^s_j,$$
which completes the proof. ◀

To prove (18), we write $\alpha_j = \sum_{s = r_j}^{t-1} \eta_{j,s} + \sum_{s \ge t} \eta_{j,s}$, and use Lemma 3.3 to cancel the first summation with the term $2 w_j (t - r_j)$. Hence, it remains to prove
$$\sum_{s \ge t} \eta_{j,s} + 2 \sum_{s \ge t} \bigl(\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s\bigr) \;\le\; \beta_t \cdot p_j. \qquad (19)$$
Let $t^*_j$ be the time at which the algorithm starts processing $j$. We first argue why we can ignore times $s < t^*_j$ on the LHS of (19).

▷ Claim 3.5. Let $s$ be a time satisfying $r_j \le s < t^*_j$. Then $\eta_{j,s} + 2\bigl(\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s\bigr) \le 0$.

Proof. While computing $\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s$, we only need to consider paths $P_e$ for edges $e$ in $H_s$ which have $j$ as an endpoint. Since $j$ does not appear on the left side of $H_s$, this quantity is equal to $-\theta^s_j \cdot \bar{R}^s_j$. The result now follows from Lemma 3.4. ◁

So using Claim 3.5 in (19), it suffices to show
$$\sum_{s \ge \max\{t,\, t^*_j\}} \eta_{j,s} + 2 \sum_{s \ge \max\{t,\, t^*_j\}} \bigl(\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s\bigr) \;\le\; \beta_t \cdot p_j. \qquad (20)$$
Note that we still have $\beta_t$ on the right-hand side, even though the summation on the left is over times $s \ge \max\{t, t^*_j\}$. The proof of the following claim is deferred to the full version.

▷ Claim 3.6. Let $s$ be a time satisfying $s \ge \max\{t, t^*_j\}$. Then $\eta_{j,s} + 2\bigl(\gamma^{\mathrm{out},j}_s - \gamma^{\mathrm{in},j}_s\bigr) \le 2(1+\varepsilon)\,\beta_t \cdot R^s_j$.

Hence, the left-hand side of (20) is at most $2(1+\varepsilon)\,\beta_t \cdot \sum_{s \ge \max\{t,\, t^*_j\}} R^s_j$. However, since job $j$ is assigned a rate of $R^s_j$ and the machines run at speed $2(1+\varepsilon)$, we get that this expression is at most $p_j \cdot \beta_t$, which is the right-hand side of (20). This proves the feasibility of the dual constraint (18).
Proof of Theorem 3.1. In the preceding §3.3 we proved that the variables $\alpha_j/2$, $\beta_t/2$ and $\gamma_{t,\, j' \to j}$ satisfy the dual constraint for the flow-time relaxation. Since $\sum_j (\alpha_j/2) - m \sum_t (\beta_t/2)$ is a feasible dual, it gives a lower bound on the cost of the optimal solution. Moreover, $\sum_j w_j \cdot \mathsf{chain}_j$ is another lower bound on the cost of the optimal schedule. Now using the bound on the weighted flow-time of our schedule given by Claim 3.2, this shows that we have an $O(1/\varepsilon)$-approximation with $2(1+\varepsilon)$-speedup. ◀

In the full version we show how to use a slightly different scheduling policy that prioritizes the last arriving jobs to reduce the speedup to $(1+\varepsilon)$.
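To make the flow-time objective concrete, the following toy simulator runs a proportional-rate schedule nonclairvoyantly: job sizes are used only to detect completions, never to set rates. This is our illustration, not the paper's algorithm; it handles only independent jobs released at time $0$, and it assumes the proportional rate of every alive job stays below the machine speed (so no rate cap is needed):

```python
def weighted_flow_time(jobs, m, speed=1.0):
    """Simulate a nonclairvoyant proportional-rate schedule for
    independent jobs released at time 0 and return the total
    weighted flow time (= sum of w_j * completion_j here).

    jobs: {name: (weight, size)}; sizes are used only to detect
    completions, mimicking nonclairvoyance.
    Each alive job runs at rate speed * m * w_j / W(t), where W(t)
    is the total weight of alive jobs (assumed <= machine speed).
    """
    rem = {j: p for j, (w, p) in jobs.items()}   # remaining sizes
    t, total = 0.0, 0.0
    alive = set(rem)
    while alive:
        W = sum(jobs[j][0] for j in alive)
        rate = {j: speed * m * jobs[j][0] / W for j in alive}
        # advance to the next completion under the current rates
        dt, nxt = min((rem[j] / rate[j], j) for j in alive)
        t += dt
        for j in alive:
            rem[j] -= dt * rate[j]
        alive.discard(nxt)
        total += jobs[nxt][0] * t   # release at 0: flow = completion
    return total
```

Rates are recomputed only at completion events, since between completions the alive set, and hence every rate, is constant.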