Matroid Coflow Scheduling
ICALP, Track C: Foundations of Networks and Multi-Agent Systems: Models, Algorithms and Information Management

Manish Purohit, Google, Mountain View, USA
Sungjin Im, University of California at Merced, USA
Kirk Pruhs, University of Pittsburgh, PA, USA
Benjamin Moseley, Carnegie Mellon University, Pittsburgh, PA, USA
Abstract
We consider the matroid coflow scheduling problem, where each job is comprised of a set of flows and the family of sets of flows that can be scheduled at any time forms a matroid. Our main result is a polynomial-time algorithm that yields a 2-approximation for the objective of minimizing the weighted completion time. This result is tight assuming P ≠ NP. As a byproduct we also obtain the first (2 + ε)-approximation algorithm for the preemptive concurrent open shop scheduling problem.

2012 ACM Subject Classification Theory of computation → Scheduling algorithms

Funding Sungjin Im: Supported in part by NSF grants CCF-1409130 and CCF-1617653. Benjamin Moseley: Supported in part by a Google Research Award and NSF grants CCF-1617724, CCF-1733873 and CCF-1725543. Kirk Pruhs: Supported in part by NSF grants CCF-1421508 and CCF-1535755, and an IBM Faculty Award.

Acknowledgements We thank the anonymous reviewers for their thorough reviews and many helpful suggestions.
Keywords and phrases Coflow Scheduling; Concurrent Open Shop; Matroid Scheduling

1 Introduction
Coflows were introduced in [5] as: "We propose coflows, a networking abstraction to express
the communication requirements of prevalent data parallel programming paradigms. Coflows
make it easier for the applications to convey their communication semantics to the network,
which in turn enables the network to better optimize common communication patterns."
Data parallel application frameworks such as MapReduce [9] and Spark [31] have a unique
processing pattern that interleaves local computation with communication across machines.
Due to the size of the large data sets processed, communication often tends to be a bottleneck
in the performance of these platforms and the coflow model abstracts out this bottleneck.
Theoretical work on coflow scheduling has primarily focused on the switch model (also called
the matching model), where the underlying network is assumed to have full-bisection bandwidth
and the set of flows that can be scheduled at any time step is restricted to form a matching.
While there are several reasonable formulations/models of scheduling coflows, the following
will be convenient for our purposes. The input consists of a collection J of jobs, where each
job j ∈ J is comprised of a set U_j of tasks (also called flows), a nonnegative integer weight w_j,
and a release time r_j. Each task e ∈ U_j has a processing requirement p_e. For example,
in the setting of a network supporting MapReduce [9] computations, each job could be a
MapReduce job, and a task/flow could represent a required communication within a shuffle
phase of a job. Let U = ∪_{j∈J} U_j be the collection of all tasks. Further, the input contains
a downward-closed set system M = (U, I). Here I ⊆ 2^U, and elements of I are called the
independent sets of M. Conceptually, a collection of tasks is independent (and in I) if they
can be simultaneously scheduled by the network. A feasible output is a schedule σ that
schedules all the flows. That is, for each integer time t, σ specifies a collection σ_t of tasks
processed/scheduled at time t. In order to be feasible, σ must satisfy the conditions that:
every task e ∈ U is scheduled for p_e time steps, and
at each time t, the scheduled tasks/flows σ_t are in I.
A job j completes at the first time C_j such that every task in U_j has been scheduled fully.
The objective is to minimize the total weighted completion time of the jobs, that is, to
minimize Σ_j w_j C_j.
In this paper, we consider coflow scheduling when the set system M forms a matroid.
The starting point for our investigations is the question whether there is an algorithm to
effectively schedule coflows that involve aggregating information, stored at various locations
in a network, to a common sink location. Such gathering communication patterns were
identified as common in [5]. We model aggregation communications by assuming that for
each job j, Uj is a collection of locations in the network where the units of information
needed for job j are stored. It is natural to define the independent sets to be locations that
can simultaneously be routed to the sink without violating any capacity constraint of the
network. In this case, M is a matroid, and more specifically, a gammoid. Note that the
symmetric problem, of disseminating data from a fixed location to various locations in the
network, is also common, and essentially equivalent to the aggregation problem.
The matroid coflow scheduling problem as defined here also naturally captures a number
of wellstudied scheduling problems.
Parallel Identical Machines Scheduling: Each job j has a single task. The matroid
M = (U, I) is the uniform matroid of rank m, i.e., any set of m jobs can be scheduled in
parallel.
(Preemptive) Concurrent Open Shop Scheduling: In the concurrent open shop scheduling
problem, each job j comprises m tasks, one on each machine, i.e., U_j = {t_{ij}}_{i=1}^m. Task
t_{ij} needs to be scheduled for time p_{ij}, and the job is completed when all its tasks are
completed. To model this setting, consider T_i = {t_{ij}}_{j=1}^n to be the set of all tasks that
need to be scheduled on machine i. M is a partition matroid that ensures that a set S of tasks
is independent if and only if |S ∩ T_i| ≤ 1 for each machine i.
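The partition matroid in the concurrent open shop example above can be captured by a simple independence oracle. The following is a minimal sketch; the function and variable names are ours, not from the paper:

```python
def is_independent(tasks, machine_of):
    """Partition-matroid oracle for concurrent open shop: a set of tasks
    is independent iff it contains at most one task per machine."""
    used = set()
    for t in tasks:
        m = machine_of[t]
        if m in used:        # two tasks compete for the same machine
            return False
        used.add(m)
    return True

# Tasks t11 and t12 both live on machine 1, so they cannot run together.
machine_of = {"t11": 1, "t12": 1, "t21": 2}
print(is_independent({"t11", "t21"}, machine_of))  # True
print(is_independent({"t11", "t12"}, machine_of))  # False
```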
1.1 Our Contributions
We first consider coflow scheduling with unit length tasks when M is a matroid. Our main
result is:

▶ Theorem 1. There is a deterministic polynomial-time algorithm for coflow scheduling with
unit length tasks, when M is a matroid, that is 2-approximate with respect to the objective of
minimizing total weighted completion time.
We note that Theorem 1 can be extended to the case that tasks may have arbitrary
processing times, albeit at a slight loss in the approximation factor.
▶ Theorem 2. There is a deterministic polynomial-time algorithm for coflow scheduling with
arbitrary length tasks, when M is a matroid, that is (2 + ε)-approximate with respect to the
objective of minimizing total weighted completion time, for any constant ε > 0.
As with all the approximation results for coflow scheduling in the literature, our algorithm
is based on rounding a natural time-indexed linear program. Intuitively, the rounding extracts
a deadline C*_j for each job j. This time is roughly 1/λ times later than the first time when
every task in U_j has been scheduled at least to the extent λ in the solution to LP, where
the value of λ is randomly chosen. The expected value of C*_j is shown to be at most twice
the fractional completion time for j in the solution to LP; this "stretching" (also called
slow-motion) idea has been used in other scheduling contexts [12, 22, 27]. This can be
viewed as deriving from the LP a fractional schedule where each job j is fully completed by
time C*_j. Then, we observe that the problem of scheduling tasks to meet the C*_j deadlines
can be expressed as a matroid intersection problem. As the matroid intersection polytope
is integral [26], one can find an integral schedule meeting these deadlines. Finally, by
derandomizing the random choice of λ, we derive our main theorem.
The approximation guarantee in Theorem 1 is tight assuming P ≠ NP. This is because
it is NP-hard to approximate the total weighted completion time for concurrent open shop
(even with unit sized tasks) within a factor of 2 − ε [23], and this problem is a special case of
matroid coflow scheduling, where the matroid is a partition matroid. Somewhat surprisingly,
even for concurrent open shop scheduling with release times, the previous best known
approximation factor was 3 [10, 17]. (See also additional discussion in [2].) Thus, Theorem 2
immediately yields an improved approximation algorithm for preemptive concurrent open
shop with arbitrary release times.
▶ Corollary 3. There is a deterministic, polynomial-time (2 + ε)-approximation algorithm
for the preemptive concurrent open shop scheduling problem when jobs have arbitrary release
times, for any constant ε > 0. If all the release times and processing requirements are
polynomially bounded, then the approximation guarantee improves to 2.
We believe our primary technical contribution is the high-level approach of reducing a
weighted completion time scheduling problem to a deadline-constrained scheduling problem.
Our approach of first extracting a deadline for each job from the LP solution and then finding
an integer schedule that meets those deadlines can be viewed as a strict generalization of
processing jobs in increasing order of their completion times derived from the LP, which
has been a very common rounding tool in the scheduling literature; e.g., [21, 28, 2]. Our
novel approach allows us to handle the matroid constraint, which we believe is natural and
quite general.
1.2 Related Results
Most of the theoretical/algorithmic work on coflows has been on matching coflows [20, 16,
15, 2, 1]. These results essentially abstract out the network by modeling it as
an n-by-n switch, or equivalently a complete bipartite graph, and by modeling supportable
flows by matchings in the graph. This is well motivated in practice as the networks in many
data centers are hierarchical, with higher network elements having higher capacities. Thus a
matching between servers at leaves of the network is a not unreasonable approximation of
a communication supportable by the network. We note that matching coflows correspond
to coflows in our framework when the set system M is an intersection of two partition
matroids. The first constant-factor (16.54) approximation for coflow scheduling in this model was
given in [20]. Currently the best known approximation ratios are 5 when jobs may have
variable release times, and 4 when all jobs arrive at time 0 [2, 29], respectively. Note that
the 2-approximation algorithms claimed in [18] and [11] are both flawed; see [2] and [11] for
discussion of the flaws.
Jahanjou et al. [14] consider several problems where there is an underlying network with
capacities on the edges. If the tasks are paths in the network, and I consists of collections of
paths that don't collectively violate any edge capacity, then their work gives an algorithm
for producing a fractional schedule (which is equivalent to time being continuous) that is
O(1)-approximate with respect to total weighted completion time. If the tasks are (source,
sink) pairs in the network, and I consists of collections of (source, sink) pairs that can be
simultaneously routed without violating any edge capacity, then their work gives an algorithm
for producing a fractional unsplittable schedule that is O(log E / log log E)-approximate with
respect to total weighted completion time, together with a matching hardness result; here, E
is the number of edges. Our work is not comparable to theirs since different constraints are
addressed and our focus is on integer schedules, in contrast to their focus on fractional schedules.
Coflow scheduling is a generalization of the classical concurrent open shop scheduling
problem [3, 4, 10, 17, 19, 23, 30]. Several 2-approximation algorithms were shown [4, 10, 17]
via LP rounding. Matching hardness results were shown in [3, 23]. When jobs have different
release times, the same LP relaxations yielded 3-approximations [10, 17]. Later, [19] gave a
simple greedy algorithm that matches the best approximation ratio when all jobs arrive at
time 0. Recently, [2] gave a combinatorial 3-approximation via a primal-dual analysis when
jobs have nonuniform release times.
Coflow scheduling has been actively studied within the networking community; some
examples include [5, 6, 7, 18, 32].
1.3 Organization
The rest of the paper is organized as follows. In Section 2 we give some basic definitions and
notation. In Section 3 we give the linear programming formulation. In Section 4 we explain
how to round a solution to the linear program. In Section 5 we discuss the derandomization.
In Section 6, we discuss the extension to tasks with variable processing times.
2 Definitions and Notations
We first consider the matroid coflow scheduling problem with unit length tasks. We will
discuss three types of schedules and two types of objectives. In a discrete-time schedule,
time is divided into unit length intervals (also called time slots), and the
schedule specifies the set of jobs processed during each time slot. We let time slot t refer
to the interval of time (t − 1, t]. In an integer discrete-time schedule, at each time slot t,
an independent set in the matroid is scheduled. In a fractional discrete-time schedule, at
each time slot t, a convex combination of independent sets from the matroid is scheduled.
In other words, in such a fractional schedule, the set of tasks scheduled at time slot t
can be expressed as Σ_{S∈I} λ_S 1_S, where Σ_{S∈I} λ_S = 1, and 1_S is the characteristic vector
corresponding to independent set S ∈ I. A valid feasible solution is restricted to be an
integer discrete-time schedule. On the other hand, during our analysis, we will also consider
continuous schedules. A continuous schedule specifies an independent set of tasks to be
scheduled at each instantaneous time τ (as opposed to during a unit-length time slot).
The completion time C_j of a job j is the first time when all tasks in U_j have been
completed. We let q_e(t) : [0, T] → {0, 1} denote an indicator function defined for each task
e ∈ U, where q_e(t) = 1 if and only if task e (more precisely, an independent set including e)
is scheduled at time t in σ. We let Q_e(t) = ∫_{τ=0}^t q_e(τ) dτ denote the extent to which task e is
scheduled by time t. Let C̃_j(v) denote the first time when every task in U_j has been scheduled
to extent at least v. The fractional completion time of job j is then C̃_j = ∫_{v=0}^1 C̃_j(v) dv. We
will use cost(LP) to denote the optimum objective of the LP, which we will describe soon.
3 Linear Program
In this section we give a linear programming formulation LP of our matroid coflow problem
when tasks have unit lengths. Let x_{j,t} be an indicator variable that specifies whether job j
completes at time t. For a task e ∈ U_j, let y_{e,t} be an indicator variable that specifies whether
task e is assigned to time slot t. Let ρ(S) be the rank function of the matroid.¹ Let T = |U|
be an upper bound on the time by which all tasks can be completed. The formulation of
LP is then:

LP:   min Σ_{j∈J} w_j Σ_{t∈[T]} t · x_{j,t}

s.t.  Σ_t x_{j,t} = 1                        ∀j ∈ J,                               (1)
      Σ_{s≤t} y_{e,s} ≥ Σ_{s≤t} x_{j,s}      ∀j ∈ J, ∀e ∈ U_j, ∀t ∈ [T],           (2)
      Σ_{e∈S} y_{e,t} ≤ ρ(S)                 ∀S ⊆ U, ∀t ∈ [T],                     (3)
      y_{e,t} = 0                            ∀j ∈ J, ∀e ∈ U_j, ∀t ∈ [r_j − 1],     (4)
      x, y ≥ 0                                                                     (5)
Constraint (1) ensures that every job is scheduled. Constraint (2) ensures that all tasks
of a job j are scheduled to at least the extent that j is completed by time t. Constraint (3)
ensures that at any time step t, the set of tasks assigned to t forms an independent set in the
given matroid. Constraint (3) is the only constraint set that can potentially have
superpolynomial size. However, for each fixed time t, the constraint set is just a polymatroid, and
therefore admits an efficient separation oracle [8, 24, 13]. In case there are arrival/release
times, constraint (4) ensures that no tasks in U_j are processed before j's release time r_j. The
objective of LP is the fractional weighted completion time.

Note that a solution to LP can be viewed as a fractional discrete schedule. We will use
X_{j,t} := Σ_{s≤t} x_{j,s} to denote the extent to which job j has been processed by time t, and use
Y_{e,t} := Σ_{s≤t} y_{e,s} to denote the extent to which task e has been processed by time t.
4 Rounding
In this section, we show how to round an optimal solution to LP to obtain a 2-approximate
integral (discrete) schedule. For each job j and v ∈ (0, 1], define C̄_j(v) = (1/x_{j,t})(v − X_{j,t−1}) +
(t − 1) if v ∈ (X_{j,t−1}, X_{j,t}], t ∈ [T]. Intuitively, C̄_j(v) is a linear interpolation of the discrete
times when job j is partially completed. We set a deadline C*_j = ⌈(1/λ) C̄_j(λ)⌉ for each job j,
where λ ∈ (0, 1] is randomly drawn according to the probability density function f(v) = 2v.
A key portion of the analysis is to show that the expected value of each w_j C*_j is at most
twice the contribution of job j to the LP objective.

To analyze the expected value of C*_j, we construct several schedules from the LP solution.
In Subsection 4.1, we will show how to convert a solution of LP to a continuous schedule σ.
In Subsection 4.2 we show how to convert σ into a stretched schedule σ_λ, which is another
continuous schedule parameterized by λ ∈ (0, 1]. Finally, in Subsection 4.3 we will show how
to convert this continuous schedule into a (discrete-time) integer schedule with the same cost.
We note that we construct the schedules in Subsections 4.1 and 4.2 only for the sake of analysis.
That is, we can obtain a 2-approximate integral discrete schedule using only the rounding
algorithm in Subsection 4.3 with the deadlines {C*_j}_j.

¹ ρ(S) is defined as max_{S′⊆S : S′∈I} |S′|.
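The deadline extraction described above can be sketched in a few lines; this is a sketch under the paper's definitions, and the helper names are ours:

```python
import math

def cbar(x, v):
    """C-bar_j(v) = (v - X_{j,t-1})/x_{j,t} + (t - 1) for v in (X_{j,t-1}, X_{j,t}]:
    linear interpolation of the LP completion variables x = [x_{j,1}, ..., x_{j,T}]."""
    X = 0.0
    for t, xt in enumerate(x, start=1):
        if xt > 0 and v <= X + xt + 1e-12:
            return (v - X) / xt + (t - 1)
        X += xt
    return float(len(x))

def deadline(x, lam):
    """C*_j = ceil(C-bar_j(lambda) / lambda)."""
    return math.ceil(cbar(x, lam) / lam)

x = [0.5, 0.5]              # job half-completes at t = 1 and at t = 2
print(cbar(x, 0.5))         # 1.0
print(deadline(x, 0.5))     # 2
```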
4.1 Constructing the Continuous Schedule σ
We construct a continuous schedule σ from the solution to LP. For each time t, we first
decompose {y_{e,t}}_{e∈U} into a convex combination Σ_{S∈I} λ_S 1_S of independent sets.² To create
σ, this convex combination is "smeared" across all instantaneous times during (t − 1, t]. That
is, in σ each independent set S is scheduled for λ_S(τ_2 − τ_1) time units during each infinitesimal
time interval (τ_1, τ_2] ⊆ (t − 1, t]. This is formalized in Proposition 4. In Lemma 5 we show
that the first time when a job j is scheduled to extent v in σ is at most C̄_j(v). In Lemma 6
we show that the fractional weighted completion time of σ is a bit less than the objective
value of the solution to LP. This is because any processing of job j done during (t − 1, t]
has no effect on the LP objective until time t, whereas it can have an effect on j's fractional
weighted completion time in σ during (t − 1, t], before time t.

▶ Proposition 4. Consider the schedule σ. For any integer t ∈ [T] and (τ_1, τ_2] ⊆ (t − 1, t],
we have ∫_{τ=τ_1}^{τ_2} q_e(τ) dτ = y_{e,t}(τ_2 − τ_1).
▶ Lemma 5. Consider the schedule σ. For any j and v ∈ (0, 1],

C̃_j(v) ≤ C̄_j(v) := (1/x_{j,t})(v − X_{j,t−1}) + (t − 1)   if v ∈ (X_{j,t−1}, X_{j,t}], t ∈ [T],

and C̃_j(0) = 0.
Proof. By definition, we have C̃_j(0) = 0, so let us assume that v > 0. We first show that
C̃_j(X_{j,t}) = t. Due to constraint (2), Y_{e,t} ≥ X_{j,t} for all e ∈ U_j. Thus, by construction
of σ, all tasks in U_j are processed to extent at least X_{j,t} by time t, i.e., Q_e(t) ≥ X_{j,t}, meaning
that C̃_j(X_{j,t}) ≤ t. We also have that C̃_j(X_{j,t}) ≥ t since we know by the optimality of
the LP solution that Y_{e,t} = X_{j,t} for some e ∈ U_j, and therefore Q_e(t) = X_{j,t}. Thus, we have
C̃_j(X_{j,t}) = t = C̄_j(X_{j,t}).

Now consider an arbitrary v ∈ (0, 1]. Let t ∈ [T] be such that v ∈ (X_{j,t−1}, X_{j,t}]. Then, it
follows that x_{j,t} ≠ 0. Thus, from the above argument, we have C̄_j(X_{j,t}) = t. Let t_v := C̄_j(v)
for notational convenience. We want to show C̃_j(v) ≤ t_v. By Proposition 4 and the construction
of σ, the extent to which e is processed by time t_v is

Q_e(t_v) = Y_{e,t−1} + y_{e,t}(t_v − (t − 1)) = Y_{e,t−1} + (y_{e,t}/x_{j,t})(v − X_{j,t−1}).

First, if y_{e,t} ≥ x_{j,t}, we immediately have Q_e(t_v) ≥ v + Y_{e,t−1} − X_{j,t−1} ≥ v due to constraint
(2). Otherwise, since (1/x_{j,t})(v − X_{j,t−1}) ≤ 1, fixing the value of Y_{e,t} = Y_{e,t−1} + y_{e,t}, the
right-hand side decreases when we increase y_{e,t}. Therefore, we have Q_e(t_v) ≥ Y_{e,t−1} − (x_{j,t} −
y_{e,t}) + (x_{j,t}/x_{j,t})(v − X_{j,t−1}) = v + Y_{e,t} − X_{j,t} ≥ v, again due to constraint (2). Hence, we have
Q_e(t_v) ≥ v for all e ∈ U_j, which immediately yields C̃_j(v) ≤ t_v. ◀
▶ Lemma 6. Σ_{j∈J} w_j ∫_{v=0}^1 C̃_j(v) dv = cost(LP) − Σ_{j∈J} w_j/2.

Proof. It suffices to show that ∫_{v=0}^1 C̃_j(v) dv = Σ_{t∈[T]} t · x_{j,t} − 1/2, since summing this
equation over all j ∈ J, multiplied by their weights w_j, yields the lemma. Indeed,

∫_{v=0}^1 C̃_j(v) dv = Σ_{t∈[T]} ∫_{v=X_{j,t−1}}^{X_{j,t}} C̃_j(v) dv = Σ_{t∈[T]} (x_{j,t}/2 + (t − 1) x_{j,t}) = Σ_{t∈[T]} t · x_{j,t} − 1/2,

where the last equality follows from constraint (1). ◀
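The identity at the heart of this proof, ∫_0^1 C̄_j(v) dv = Σ_t t·x_{j,t} − 1/2, can be checked numerically; the sketch below uses an arbitrary vector x of our choosing:

```python
def frac_completion(x):
    """Integrate C-bar_j piece by piece: over (X_{j,t-1}, X_{j,t}] the function
    rises linearly from t - 1 to t, contributing x_t/2 + (t - 1) * x_t."""
    return sum(xt / 2 + (t - 1) * xt for t, xt in enumerate(x, start=1))

x = [0.2, 0.0, 0.5, 0.3]                                   # sums to 1
lhs = frac_completion(x)
rhs = sum(t * xt for t, xt in enumerate(x, start=1)) - 0.5
print(abs(lhs - rhs) < 1e-12)                              # True
```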
4.2 Constructing the Stretched Schedule σ_λ

To construct σ_λ from σ, we "stretch" the schedule σ by a factor of 1/λ. More precisely, if
an independent set S is scheduled in σ during an infinitesimal interval (τ_1, τ_2], the same
independent set is scheduled in σ_λ during (τ_1/λ, τ_2/λ]. In Lemma 7 we show that σ_λ
completes job j by time C*_j = ⌈C̄_j(λ)/λ⌉. In Lemma 8 we upper bound the expected value of
Σ_j w_j C*_j by twice cost(LP).
▶ Lemma 7. The schedule σ_λ completes every job j by time C*_j.

Proof. Lemma 5 shows that C̃_j(v) ≤ C̄_j(v) for all v ∈ (0, 1], meaning that every task in U_j
is completed to extent v by time C̄_j(v) in σ. Thus, in the stretched schedule σ_λ, every job j
completes by time C̄_j(λ)/λ, for any value of λ ∈ (0, 1]. ◀
▶ Lemma 8. E[Σ_{j∈J} w_j C*_j] ≤ 2 cost(LP).

Proof. First note that

Σ_{j∈J} w_j E[C̄_j(λ)/λ] = Σ_{j∈J} w_j ∫_{v=0}^1 (C̄_j(v)/v) · (2v) dv = 2 Σ_{j∈J} w_j ∫_{v=0}^1 C̄_j(v) dv.   (6)

Thus, we have

E[Σ_{j∈J} w_j C*_j] = E[Σ_{j∈J} w_j ⌈(1/λ) C̄_j(λ)⌉] ≤ E[Σ_j w_j (1/λ) C̄_j(λ)] + Σ_j w_j
                   = 2 Σ_j w_j ∫_{v=0}^1 C̄_j(v) dv + Σ_j w_j
                   = 2 (cost(LP) − Σ_j w_j/2) + Σ_j w_j
                   = 2 cost(LP),

where the third equality follows from the same computation as in the proof of Lemma 6. ◀
4.3 Constructing a Discrete Integer Schedule
Let ŷ_{e,t} denote how much task e is processed during the time interval (t − 1, t]. In other
words, task e appears in ŷ_{e,t} units of the independent sets scheduled in σ_λ during the time
interval. Then, {ŷ_{e,t}}_{e∈U, t∈[T]} satisfies the following:
1. For all j ∈ J and e ∈ U_j, Σ_{t∈[C*_j]\[r_j−1]} ŷ_{e,t} = 1; and
2. For all S ⊆ U and for all t ∈ [T], Σ_{e∈S} ŷ_{e,t} ≤ ρ(S),
where the second holds true since {ŷ_{e,t}}_{e∈U} can be expressed as a convex combination of
independent sets scheduled during the time interval (t − 1, t], and therefore lies in the matroid
polytope. We now interpret {ŷ_{e,t}} as a fractional point in the intersection of two matroid
polytopes. We create the following two matroids. The new universe U′ is defined as
U′ := {(e, t) | t ∈ [T], j ∈ J, e ∈ U_j s.t. r_j ≤ t ≤ C*_j}. The first matroid M_1 is a partition
matroid that forces choosing at most one element of {(e, t)}_t for each e ∈ U. Intuitively,
this ensures that no task is scheduled more than once across times. The second matroid
ensures that the elements scheduled at each time t form an independent set in I. The following
lemma formally defines the second matroid and shows that it is indeed a matroid.
▶ Lemma 9. Define I_2 ⊆ 2^{U′} such that S′ ⊆ U′ is in I_2 if and only if for every t ∈ [T],
{e | (e, t) ∈ S′} ∈ I. Then, M_2 = (U′, I_2) is a matroid.

Proof. Let I_2 denote the family of independent sets of M_2. It is straightforward to see
that I_2 is downward closed. Thus, it suffices to show that for any A′, B′ ∈ I_2 such
that |A′| < |B′|, there exists (e, t) ∈ B′ \ A′ such that A′ ∪ {(e, t)} ∈ I_2. Let U′_t :=
{(e, t) | j ∈ J, e ∈ U_j s.t. r_j ≤ t ≤ C*_j} denote the subset of U′ restricted to time t. Consider
any fixed A′, B′ ∈ I_2 such that |A′| < |B′|. Then, consider any fixed time t* such that
|A′ ∩ U′_{t*}| < |B′ ∩ U′_{t*}|; such a time t* must exist since {U′_t}_t partitions U′. Then, for some
(e*, t*) ∈ (B′ ∩ U′_{t*}) \ (A′ ∩ U′_{t*}), it must be the case that {e*} ∪ {e | (e, t*) ∈ A′ ∩ U′_{t*}} ∈ I.
This is because B′ has more elements than A′ that are paired up with the fixed time t*, and
therefore the set of elements appearing in A′ ∩ U′_{t*} remains independent with some e* added.
Further, for any other time t, the elements appearing in the pairs of A′ associated with t
remain unchanged, and therefore remain in I. ◀
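For intuition, the construction of M_2 can be sanity-checked by brute force on a tiny universe. The sketch below uses toy data of our own; the base matroid I here is a trivial partition matroid allowing at most one task per time:

```python
from itertools import chain, combinations

def is_matroid(universe, indep):
    """Brute-force check of the matroid axioms for a family given by an
    independence oracle `indep` (only feasible for tiny universes)."""
    subsets = chain.from_iterable(combinations(universe, k)
                                  for k in range(len(universe) + 1))
    fam = [frozenset(s) for s in subsets if indep(frozenset(s))]
    if frozenset() not in fam:
        return False
    for A in fam:                                   # downward closure
        if any(not indep(A - {x}) for x in A):
            return False
    for A in fam:                                   # exchange axiom
        for B in fam:
            if len(A) < len(B) and not any(indep(A | {x}) for x in B - A):
                return False
    return True

# M2 over U' = {(e, t)}: S' is independent iff, for every t, the e's paired
# with t form an independent set in the base matroid (here: at most one).
U2 = {("a", 1), ("b", 1), ("a", 2), ("b", 2)}
def indep2(S):
    return all(sum(1 for (e, t) in S if t == tt) <= 1 for tt in (1, 2))

print(is_matroid(U2, indep2))   # True
```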
Then, it is easy to see that {ŷ_{e,t}} is a point that lies in the intersection of the polymatroids
defined by M_1 and M_2. Further, {ŷ_{e,t}} belongs to the base polymatroid of M_1; so
we have Σ_{(e,t)∈U′} ŷ_{e,t} = |U|. Since the matroid intersection polytope is well known to be
integral [26], meaning that every vertex is an integer point, a maximum independent set in
the intersection of M_1 and M_2 must have |U| elements. Further, we can find such a maximum
independent set in polynomial time. To recap, we have found S′ ⊆ U′ that is a base of M_1
and is independent in M_2. This set S′ immediately gives the desired integer schedule where
{e | (e, t) ∈ S′} is scheduled at each time t. Indeed, due to S′ being a base of M_1, every task
in U_j is scheduled exactly once during the time interval [r_j, C*_j]. Further, S′ being independent
in M_2 ensures that the set of tasks scheduled at each time forms an independent set in I.
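When M is a partition matroid with unit capacities (the concurrent open shop case), this matroid intersection step specializes to bipartite matching between tasks and (machine, slot) pairs. A minimal sketch, with names and the toy instance of our own choosing:

```python
def schedule_with_deadlines(tasks):
    """tasks: id -> (machine, release, deadline). Returns id -> slot, or None
    if no schedule meets every deadline. Augmenting-path bipartite matching:
    each task needs one (machine, slot) with release <= slot <= deadline."""
    owner = {}                          # (machine, slot) -> task id
    def try_place(e, seen):
        m, r, d = tasks[e]
        for t in range(r, d + 1):
            if (m, t) in seen:
                continue
            seen.add((m, t))
            # take a free slot, or evict the current owner if it can move
            if (m, t) not in owner or try_place(owner[(m, t)], seen):
                owner[(m, t)] = e
                return True
        return False
    for e in tasks:
        if not try_place(e, set()):
            return None
    return {e: t for (m, t), e in owner.items()}

# Tasks "a" and "b" share machine 1; "b" must run in slot 1.
tasks = {"a": (1, 1, 2), "b": (1, 1, 1), "c": (2, 1, 2)}
print(schedule_with_deadlines(tasks))
```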
5 Derandomization
In this section, we discuss how to derandomize the choice of λ ∈ (0, 1], which was used to
compute the deadlines for the jobs. This will complete the proof of Theorem 1. Let us first
define step values. We say that v ∈ (0, 1] is a step value if Σ_{s≤t} x_{j,s} = v for some j ∈ J
and integer t ∈ [T]; in other words, exactly a v fraction of some job j is completed by some
integer time in the LP solution. Let V denote the set of all step values; 1 ∈ V by definition.
Note that |V| is polynomially bounded in the input size, as the number of variables x_{j,t}
we consider in LP is at most |J| · |U|.

Recall that in Lemma 8 we showed E[Σ_j w_j C*_j] ≤ 2 cost(LP) when C*_j := ⌈(1/λ) C̄_j(λ)⌉. This
implies there exists a certain value of λ ∈ (0, 1] such that Σ_j w_j C*_j ≤ 2 cost(LP). For the
purpose of derandomization, it suffices to find λ such that Σ_j w_j C̄_j(λ)/λ ≤ 2 Σ_j w_j ∫_{v=0}^1 C̄_j(v) dv;
the equality is shown in equation (6) in expectation.

Towards this end, we aim to find λ ∈ (0, 1] that minimizes Σ_j w_j C̄_j(λ)/λ. Suppose λ
were set to a value v ∈ (v_1, v_2], where v_1 and v_2 are two adjacent step values in V. Consider
any fixed job j. Let t ∈ [T] be such that v ∈ (X_{j,t−1}, X_{j,t}]. By definition of step values, we
have (v_1, v_2] ⊆ (X_{j,t−1}, X_{j,t}]. Thus, we have C̄_j(v)/v = (1/x_{j,t})(1 − X_{j,t−1}/v) + (t − 1)/v. This becomes
a linear function in z over [1/v_2, 1/v_1) if we set z = 1/v. Therefore, we get a piecewise linear
function g(z) by summing over all jobs multiplied by their weights and considering all pairs
of adjacent step values in V. We set λ to the inverse of the value of z that achieves the
global minimum, which can be found in polynomial time.
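The derandomization just described can be sketched as follows; the helper names are ours, and cbar recomputes the interpolation C̄_j from the LP variables:

```python
def cbar(x, v):
    # C-bar_j(v) = (v - X_{j,t-1})/x_{j,t} + (t - 1) on (X_{j,t-1}, X_{j,t}]
    X = 0.0
    for t, xt in enumerate(x, start=1):
        if xt > 0 and v <= X + xt + 1e-12:
            return (v - X) / xt + (t - 1)
        X += xt
    return float(len(x))

def best_lambda(xs, ws):
    """Deterministic lambda: Sum_j w_j * C-bar_j(v)/v is piecewise linear and
    continuous in z = 1/v, so its minimum over (0, 1] is attained at a step
    value; evaluating the objective at every step value therefore suffices.
    xs: list of per-job LP vectors x_j; ws: the corresponding job weights."""
    steps = sorted({round(sum(x[:t + 1]), 12)
                    for x in xs for t in range(len(x))} - {0.0})
    return min(steps, key=lambda v: sum(w * cbar(x, v) / v
                                        for x, w in zip(xs, ws)))
```

For example, with a single unit-weight job having x = [0.5, 0.5], the candidate step values are 0.5 and 1, and both attain the same objective value 2.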
6 Arbitrary Processing Times
In this section we show how to extend Theorem 1 to allow tasks with arbitrary processing
times, with a loss of a (1 + ε) factor in the approximation ratio for any arbitrary constant
ε > 0. In this setting, each task e has an arbitrary integer size p_e, and the task e completes
when p_e independent sets including e have been scheduled. As before, at each time we can schedule
a set of tasks that is independent in the given matroid, and a job completes when all its
tasks complete.
6.1 Compact Linear Program
We first describe our new compact LP relaxation. Let T := Σ_e p_e + max_j r_j, which is clearly
an upper bound on the maximum time we need to consider. We define a set of times T
that consists of polynomially many time steps. First, let T include every job's arrival time.
Next, let T include all times appearing in {⌊(1 + ε)^i⌋}_{0≤i≤⌈log_{1+ε} T⌉+1}. In words, T includes
exponentially increasing time steps, growing by a factor of (1 + ε) starting from 1, but includes no
times greater than (1 + ε)² T. Let t_1 = 1, t_2, . . . , t_k, . . . , t_{K+1} denote the (integer) times in T
in increasing order. Let I_i := [t_i, t_{i+1}) where i ∈ [K]. The idea is to rewrite LP compactly as
follows, by replacing time-indexed variables with interval-indexed variables:

      min Σ_{j∈J} w_j Σ_{i∈[K]} (t_{i+1} − 1) · x_{j,i}

s.t.  Σ_{i∈[K]} (t_{i+1} − t_i) y_{e,i} = p_e     ∀e ∈ U,                                        (7)
      Σ_{i′≤i} y_{e,i′}/p_e ≥ Σ_{i′≤i} x_{j,i′}   ∀j ∈ J, ∀e ∈ U_j, ∀i ∈ [K],                    (8)
      Σ_{e∈S} y_{e,i} ≤ ρ(S)                      ∀S ⊆ U, ∀i ∈ [K],                              (9)
      y_{e,i} = 0                                 ∀j ∈ J, ∀e ∈ U_j, ∀i ∈ [K] s.t. t_{i+1} ≤ r_j  (10)
Here, variable x_{j,i} can be viewed as the average fraction of job j that completes per
unit time during I_i; so, when the job j completes during I_i for the first time, we have
Σ_{i′≤i} x_{j,i′} = 1. Likewise, y_{e,i} has an analogous meaning for each task e, but it denotes the
average number of units of task e that are processed per unit time during I_i. Constraint (7) ensures that
all tasks complete eventually. Constraint (9) ensures that the average vector representing how
much each task is processed per unit time during I_i lies in the polymatroid. Constraint (10)
enforces that no tasks in U_j are processed before j's arrival time; this is possible since T
includes all jobs' arrival times. Before explaining constraint (8), we explain the objective. If
all intervals {I_i} were of unit length, the objective would be exactly the fractional total
weighted completion time. However, to make the LP compact, when job j completes by an
x_{j,i} fraction during interval I_i, we pretend that the fraction completes at the end of I_i,
i.e., at t_{i+1} − 1. Thus, we overestimate the fractional objective; but since times in I_i differ
by at most a (1 + ε) factor, our overestimate is by a factor of at most (1 + ε). Finally, we
discuss constraint (8), which caps each job's (cumulative) processed fraction at the analogous
quantity of each task of the job, which is measured as how much the task has been processed
divided by its processing time. We also note that this compact LP admits the same separation
oracle as the one for LP.
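The construction of the time set T above can be sketched as follows (names are ours):

```python
import math

def time_grid(arrivals, T, eps):
    """Endpoints for the compact LP: every arrival time, plus the geometric
    grid floor((1+eps)^i) for 0 <= i <= ceil(log_{1+eps} T) + 1."""
    top = math.ceil(math.log(T) / math.log(1 + eps)) + 1
    grid = {math.floor((1 + eps) ** i) for i in range(top + 1)}
    return sorted(set(arrivals) | grid)

print(time_grid(arrivals=[1, 4], T=10, eps=0.5))
# [1, 2, 3, 4, 5, 7, 11, 17]
```

Note that the grid has O(log_{1+ε} T) = O((log T)/ε) points, which together with the |J| arrival times keeps the number of intervals, and hence LP variables, polynomial.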
6.2 Rounding
As before, we seek to round the optimal LP solution. Recall that we first obtained C*_j :=
⌈(1/λ) C̄_j(λ)⌉ and then found an integer schedule that completes every job j before C*_j. We observe that
the first procedure poses no issue. This is because we can interpret the solution to our compact
LP as a solution to LP. To see this, when a task e is processed by some amount, pretend that
there exist p_e different tasks of unit size and that they are processed equally, each by a 1/p_e
fraction of that amount. Thus, we can compute C̄_j(v) efficiently for any value of v ∈ (0, 1]. The
derandomization can be done similarly.
6.3 Finding an Integer Schedule
It now remains to find an integer schedule meeting the discovered deadlines {C*_j}_{j∈J}. We
use essentially the same idea of reducing the problem to finding an integer solution in the
intersection of two matroids. However, this reduction requires some careful modifications to
be implemented in polynomial time. Also, we will aim to complete every job j by (1 + O(ε))C*_j,
meeting the deadline slightly loosely.

The main idea is to use the fact that the continuous schedule σ_λ meeting the deadlines {C*_j}
only changes polynomially many times. This is because the continuous schedule σ before the
stretching is identical at all times during each of the intervals (0, t_1 − 1], (t_1 − 1, t_2], . . . , (t_{K−1} −
1, t_K]; these intervals are stretched into (0, (t_1 − 1)/λ], ((t_1 − 1)/λ, t_2/λ], . . . , ((t_{K−1} −
1)/λ, t_K/λ], respectively. We split the interval including the time T′ = |U|²/ε² into two, the
left one ending at |U|²/ε² and the right one starting at |U|²/ε². Here, assume that 1/ε is
an integer. We also add the time C̄_j(λ)/λ for every j ∈ J and split the intervals accordingly.
To simplify the notation, we recycle the notation I_i. By reindexing the resulting intervals
and merging some initial intervals, we have I_0 := (0, T′], I_1, I_2, . . . , I_{K′}. We say that an
interval is small if its starting time or ending time is not a power of (1 + ε) divided by λ;
more precisely, ((t_{i−1} − 1)/λ, t_i/λ] is small if t_{i−1} or t_i is not a power of (1 + ε) divided by λ.
Note that there are at most 4|J| + 4 ≤ 8|J| ≤ 8|U| small intervals, since each job's arrival
time and deadline together can create at most 4 small intervals; the extra four come from
time 0, the final time, and T′.
For each interval I_i, let Q_e(I_i) denote the amount of task e processed during I_i, which
can be easily computed in polynomial time. For each interval, we will construct an integer
schedule that schedules each task at least as much as the continuous schedule σ_λ does, without using
too many time steps compared to the interval's length; more precisely, the integer schedule
will process at least ⌈Q_e(I_i)⌉ units of task e. We categorize the intervals into three groups.
Depending on the category to which each interval belongs, we construct an integer schedule
differently or give a different upper bound on the length of the integer schedule. At the end,
we will concatenate the constructed integer schedules in increasing order of times. In the
following, |I| denotes I's length.
The first interval, $I_0 = (0, T_0]$. Using the same idea we used for handling unit-sized tasks, we find an integer schedule that processes at least $\lfloor Q_e(I_0) \rfloor$ units of each task $e$, meeting all job deadlines no greater than $T_0$. Note that $I_0$ has polynomial length; thus, the desired integer schedule can be computed in polynomial time. Then, we greedily schedule, for one unit of time each, every task $e$ such that $Q_e(I_0)$ is not an integer. Note that such a task $e$ has not completed by time $T_0$, so the task (more precisely, the job to which the task belongs) has deadline at least $T_0$. Therefore, we will be able to charge the extra delay of at most $U$ to the corresponding job's deadline directly.
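The patching step above can be sketched in a few lines (a minimal illustration with hypothetical names, not the paper's code): given an integral schedule covering $\lfloor Q_e \rfloor$ for every task, append one unit-length slot per task whose continuous amount is fractional, so each task reaches $\lceil Q_e \rceil$ with total added delay at most the number of tasks $U$.

```python
import math

def patch_to_ceilings(floor_schedule, processed):
    """Append one unit-time slot per task whose continuous amount is
    fractional, so every task e receives ceil(processed[e]) units.
    `floor_schedule` lists one task per unit time step (illustrative)."""
    schedule = list(floor_schedule)
    for e, q in processed.items():
        if q != math.floor(q):   # fractional tasks get one extra slot
            schedule.append(e)
    return schedule
```

For example, with amounts {a: 2.5, b: 3.0, c: 0.2} and a floor schedule covering a twice and b three times, patching adds one slot each for a and c, so all three ceilings are met with 2 extra time steps.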
$I_i$ that is not small, for $i \ge 1$. We seek to construct an integer schedule of length $(1+O(\epsilon))|I_i|$. Towards this end, we do the following. We divide the interval into $\lceil \frac{|I_i|}{U/\epsilon} \rceil$ subintervals of length $U/\epsilon$; there can be at most one subinterval of smaller length, which we handle later. Next, for each subinterval of length $U/\epsilon$, we try to schedule $\lceil \frac{U/\epsilon}{|I_i|} Q_e(I_i) \rceil$ units of each task $e$. Since the length is polynomial in $U$, we can find an integer schedule of length $U/\epsilon + 1$ that schedules $\lfloor \frac{U/\epsilon}{|I_i|} Q_e(I_i) \rfloor$ units of each task $e$. By additionally scheduling one task per unit time, we can schedule $\lceil \frac{U/\epsilon}{|I_i|} Q_e(I_i) \rceil$ units of each task $e$ in $U/\epsilon + 1 + U \le (U/\epsilon)(1 + 2\epsilon)$ time steps. Here, our integer schedule's length is at most $(1+2\epsilon)$ times the subinterval's length $U/\epsilon$. This integer schedule is repeated $\lfloor \frac{|I_i|}{U/\epsilon} \rfloor$ times. We now handle the smaller subinterval of length less than $U/\epsilon$. Using a similar argument, we can process at least as many units of each task as the continuous schedule does, using at most $U/\epsilon + 1 + U \le 2U/\epsilon$ time steps. Here we use the fact that $I_i$ has length significantly greater than $U$. To see this, suppose we had not added job arrival times, deadlines, or $T_0$ in the process of creating the intervals. Then the intervals preceding $I_i$ have lengths decreasing geometrically by a factor of $(1+\epsilon)$. Using this observation, we can argue that $I_i$'s length is at least $\epsilon/2$ times $I_i$'s starting time. Since $I_i$'s starting time is greater than $T_0$, we have that $I_i$'s length is at least $(\epsilon/2) \cdot T_0 = (\epsilon/2) \cdot (U^2/\epsilon^2) = U^2/(2\epsilon)$. So, we can charge the number of time steps spent handling the smaller subinterval, which is at most $2U/\epsilon$, to the length of $I_i$. From all these arguments, we can construct an integer schedule of length at most $(1+6\epsilon)|I_i|$.
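The constants above can be tracked explicitly; the following is a sketch of the arithmetic, under the mild additional assumption $U \ge 1/\epsilon$ (which holds at the scales considered here):

```latex
% One full subinterval: base schedule plus one unit per fractional task.
\frac{U}{\epsilon} + 1 + U \;\le\; \frac{U}{\epsilon} + 2U
  \;=\; \frac{U}{\epsilon}\,(1 + 2\epsilon).

% Summed over the at most |I_i|/(U/\epsilon) full subintervals, this
% contributes at most (1 + 2\epsilon)\,|I_i| time steps.

% The leftover subinterval costs at most 2U/\epsilon steps, and since
% |I_i| \ge U^2/(2\epsilon), for U \ge 1/\epsilon we get
\frac{2U}{\epsilon} \;=\; \frac{4}{U} \cdot \frac{U^2}{2\epsilon}
  \;\le\; 4\epsilon\,|I_i|.

% Altogether, the integer schedule has length at most
(1 + 2\epsilon)\,|I_i| + 4\epsilon\,|I_i| \;=\; (1 + 6\epsilon)\,|I_i|.
```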
$I_i$ that is small, for $i \ge 1$. We seek to construct an integer schedule of length $(1+O(\epsilon))|I_i| + 2U/\epsilon$. The idea is the same as for the intervals that are not small; the only difference is that we cannot charge the extra time steps spent handling the smaller subinterval, which number at most $2U/\epsilon$, to the length of $I_i$. Thus, we simply use this upper bound on the length of our integer schedule.
As mentioned before, we concatenate the integer schedules originating from $I_0, I_1, \ldots, I_{K'}$ in this order to obtain the final schedule. It now remains to show that each job completes by time $(1+O(\epsilon))C^*_j$. We already showed that our integer schedule completes every job $j$ before its deadline $C^*_j$ if the deadline is smaller than $T_0$. For any other job $j$, it must be the case that $\bar C_j(\alpha)/\alpha$ is greater than $T_0$. Let $I_i$ be the interval including $\bar C_j(\alpha)/\alpha$. Due to the way the intervals are constructed, $\bar C_j(\alpha)/\alpha$ must be equal to $I_i$'s finish time. Our goal is to show that we complete $j$ not too late compared to $I_i$'s finish time; that is, we want to show that the total length of the integer schedules originating from $I_0, I_1, \ldots, I_i$ is at most $(1+O(\epsilon)) \sum_{i' \le i} |I_{i'}|$. Indeed, the total length is at most
$$|I_0| + U + \sum_{i' \in [i]:\, I_{i'} \text{ is small}} \bigl((1+O(\epsilon))|I_{i'}| + 2U/\epsilon\bigr) + \sum_{i' \in [i]:\, I_{i'} \text{ is not small}} (1+O(\epsilon))|I_{i'}|$$
$$\le \sum_{i'=0}^{i} (1+O(\epsilon))|I_{i'}| + U + (2U/\epsilon) \cdot (8U) \;\le\; \sum_{i'=0}^{i} (1+O(\epsilon))|I_{i'}| + O(\epsilon)|I_0|.$$
Here, the first inequality follows from the fact that there are at most $8U$ small intervals, as argued above. The second inequality is immediate from $|I_0| = T_0 = U^2/\epsilon^2$. Therefore, we have shown that each job completes by time $(1+O(\epsilon))C^*_j$, which establishes that our final schedule's objective is at most $(1+O(\epsilon))$ times the compact LP's optimum. Since we showed that the compact LP lower bounds the optimum up to a factor of $(1+\epsilon)$, we obtain a $2(1+\epsilon)$-approximate schedule for arbitrary $\epsilon > 0$ by scaling $\epsilon$ appropriately.
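As a quick numeric sanity check of the last charging step (illustrative only, with the constants read off above), the additive slack $U + (2U/\epsilon)\cdot 8U$ is indeed an $O(\epsilon)$ fraction of $|I_0| = U^2/\epsilon^2$:

```python
def slack_ratio(U, eps):
    """Ratio of the additive slack U + (2U/eps) * 8U to |I_0| = U^2/eps^2.
    The charging argument needs this ratio to be O(eps)."""
    slack = U + (2 * U / eps) * (8 * U)
    return slack / (U ** 2 / eps ** 2)
```

For $U = 100$ the ratio is about $16\epsilon$: roughly 1.6 at $\epsilon = 0.1$ and 0.16 at $\epsilon = 0.01$, shrinking linearly with $\epsilon$ as required.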
[1] Saksham Agarwal, Shijin Rajakrishnan, Akshay Narayan, Rachit Agarwal, David Shmoys, and Amin Vahdat. Sincronia: Near-optimal network design for coflows. In SIGCOMM '18, pages 16-29. ACM, 2018.
[2] Saba Ahmadi, Samir Khuller, Manish Purohit, and Sheng Yang. On scheduling coflows. In IPCO, pages 13-24. Springer, 2017.
[3] Nikhil Bansal and Subhash Khot. Inapproximability of hypergraph vertex cover and applications to scheduling problems. In ICALP, pages 250-261. Springer, 2010.
[4] Zhi-Long Chen and Nicholas G. Hall. Supply chain scheduling: Conflict and cooperation in assembly systems. Operations Research, 55(6):1072-1089, 2007.
[5] Mosharaf Chowdhury and Ion Stoica. Coflow: A networking abstraction for cluster applications. In ACM Workshop on Hot Topics in Networks, pages 31-36. ACM, 2012.
[6] Mosharaf Chowdhury and Ion Stoica. Efficient coflow scheduling without prior knowledge. In SIGCOMM, pages 393-406. ACM, 2015.
[7] Mosharaf Chowdhury, Yuan Zhong, and Ion Stoica. Efficient coflow scheduling with Varys. In SIGCOMM '14, pages 443-454, New York, NY, USA, 2014. ACM.
[8] William H. Cunningham. Testing membership in matroid polyhedra. Journal of Combinatorial Theory, Series B, 36(2):161-188, 1984.
[9] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008.
[10] Naveen Garg, Amit Kumar, and Vinayaka Pandit. Order scheduling models: Hardness and algorithms. In FSTTCS, pages 96-107. Springer, 2007.
[11] Sungjin Im and Manish Purohit. A tight approximation for coflow scheduling for minimizing total weighted completion time. CoRR, abs/1707.04331, 2017. arXiv:1707.04331.
[12] Sungjin Im, Maxim Sviridenko, and Ruben van der Zwaan. Preemptive and non-preemptive generalized min sum set cover. Mathematical Programming, 145(1-2):377-401, 2014.
[13] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM, 48(4):761-777, 2001.
[14] Hamidreza Jahanjou, Erez Kantor, and Rajmohan Rajaraman. Asymptotically optimal approximation algorithms for coflow scheduling. In SPAA, pages 45-54. ACM, 2017.
[15] In LATIN, pages 669-682. Springer, 2018.
[16] Samir Khuller and Manish Purohit. Brief announcement: Improved approximation algorithms for scheduling coflows. In SPAA, pages 239-240. ACM, 2016.
[17] Joseph Y.-T. Leung, Haibing Li, and Michael Pinedo. Scheduling orders for multiple product types to minimize total weighted completion time. Discrete Applied Mathematics, 155(8):945-970, 2007.
[18] S. Luo, H. Yu, Y. Zhao, S. Wang, S. Yu, and L. Li. Towards practical and near-optimal coflow scheduling for data center networks. IEEE Transactions on Parallel and Distributed Systems, 27(11):3366-3380, 2016.
[19] Monaldo Mastrolilli, Maurice Queyranne, Andreas S. Schulz, Ola Svensson, and Nelson A. Uhan. Minimizing the sum of weighted completion times in a concurrent open shop. Operations Research Letters, 38(5):390-395, 2010.
[20] Zhen Qiu, Cliff Stein, and Yuan Zhong. Minimizing the total weighted completion time of coflows in datacenter networks. In Symposium on Parallel Algorithms and Architectures, pages 294-303. ACM, 2015.
[21] Maurice Queyranne and Andreas S. Schulz. Approximation bounds for a general class of precedence constrained parallel machine scheduling problems. SIAM Journal on Computing, 35(5):1241-1253, 2006.
[22] Maurice Queyranne and Maxim Sviridenko. A (2+ε)-approximation algorithm for the generalized preemptive open shop problem with minsum objective. Journal of Algorithms, 45(2):202-212, 2002.
[23] Sushant Sachdeva and Rishi Saket. Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover. In IEEE Conference on Computational Complexity, pages 219-229. IEEE, 2013.
[24] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346-355, 2000.
[25] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency, volume 24. Springer Science & Business Media, 2003.
[26] Alexander Schrijver. Matroid intersection. In Combinatorial Optimization: Polyhedra and Efficiency, volume 24, chapter 41. Springer Science & Business Media, 2003.
[27] Andreas S. Schulz and Martin Skutella. Random-based scheduling: New approximations and LP lower bounds. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 119-133. Springer, 1997.
[28] Andreas S. Schulz and Martin Skutella. Scheduling unrelated machines by randomized rounding. SIAM Journal on Discrete Mathematics, 15(4):450-469, 2002.
[29] Mehrnoosh Shafiee and Javad Ghaderi. An improved bound for minimizing the total weighted completion time of coflows in datacenters. arXiv preprint, 2017. arXiv:1704.08357.
[30] Guoqing Wang and T.C. Edwin Cheng. Customer order scheduling to minimize total weighted completion time. Omega, 35(5):623-626, 2007.
[31] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In USENIX HotCloud, 2010.
[32] Yangming Zhao, Kai Chen, Wei Bai, Minlan Yu, Chen Tian, Yanhui Geng, Yiming Zhang, Dan Li, and Sheng Wang. RAPIER: Integrating routing and scheduling for coflow-aware data center networks. In INFOCOM, pages 424-432. IEEE, 2015.