Matroid Coflow Scheduling

LIPICS - Leibniz International Proceedings in Informatics, Jul 2019

We consider the matroid coflow scheduling problem, where each job comprises a set of flows and the family of sets that can be scheduled at any time forms a matroid. Our main result is a polynomial-time algorithm that yields a 2-approximation for the objective of minimizing the weighted completion time. This result is tight assuming P != NP. As a by-product we also obtain the first (2+epsilon)-approximation algorithm for the preemptive concurrent open shop scheduling problem.


Manish Purohit, Google, Mountain View, USA
Sungjin Im, University of California at Merced, USA
Kirk Pruhs, University of Pittsburgh, PA, USA
Benjamin Moseley, Carnegie Mellon University, Pittsburgh, PA, USA

Category: ICALP, Track C: Foundations of Networks and Multi-Agent Systems: Models, Algorithms and Information Management

2012 ACM Subject Classification: Theory of computation → Scheduling algorithms

Keywords and phrases: Coflow Scheduling; Concurrent Open Shop; Matroid Scheduling

Funding: Sungjin Im: Supported in part by NSF grants CCF-1409130 and CCF-1617653. Benjamin Moseley: Supported in part by a Google Research Award and NSF grants CCF-1617724, CCF-1733873 and CCF-1725543. Kirk Pruhs: Supported in part by NSF grants CCF-1421508 and CCF-1535755, and an IBM Faculty Award.

Acknowledgements: We thank the anonymous reviewers for their thorough reviews and many helpful suggestions.

1 Introduction

Coflows were introduced in [5] as: "We propose coflows, a networking abstraction to express the communication requirements of prevalent data parallel programming paradigms. Coflows make it easier for the applications to convey their communication semantics to the network, which in turn enables the network to better optimize common communication patterns." Data parallel application frameworks such as MapReduce [9] and Spark [31] have a unique processing pattern that interleaves local computation with communication across machines.
Due to the size of the data sets processed, communication often tends to be a bottleneck in the performance of these platforms, and the coflow model abstracts out this bottleneck. Theoretical work on coflow scheduling has primarily focused on the switch model (also called the matching model), where the underlying network is assumed to have full bisection bandwidth and the set of flows that can be scheduled at any time step is restricted to form a matching.

While there are several reasonable formulations of coflow scheduling, the following will be convenient for our purposes. The input consists of a collection $J$ of jobs, where each job $j \in J$ comprises a set $U_j$ of tasks (also called flows), a non-negative integer weight $w_j$, and a release time $r_j$. Each task $e \in U_j$ has a processing requirement $p_e$. For example, in the setting of a network supporting MapReduce [9] computations, each job could be a MapReduce job, and a task/flow could represent a required communication within a shuffle phase of a job. Let $U = \bigcup_{j \in J} U_j$ be the collection of all tasks. Further, the input contains a downward-closed set system $M = (U, \mathcal{I})$. Here $\mathcal{I} \subseteq 2^U$, and the elements of $\mathcal{I}$ are called the independent sets of $M$. Conceptually, a collection of tasks is independent (and in $\mathcal{I}$) if the tasks can be scheduled simultaneously by the network.

A feasible output is a schedule $\sigma$ that schedules all the flows. That is, for each integer time $t$, $\sigma$ specifies a collection $\sigma_t$ of tasks processed at time $t$. To be feasible, $\sigma$ must satisfy two conditions: every task $e \in U$ is scheduled for $p_e$ time steps, and at each time $t$ the set $\sigma_t$ of scheduled tasks is in $\mathcal{I}$. A job $j$ completes at the first time $C_j$ such that every task in $U_j$ has been fully scheduled. The objective is to minimize the total weighted completion time of the jobs, that is, to minimize $\sum_j w_j C_j$. In this paper, we consider coflow scheduling when the set system $M$ forms a matroid.
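To make the problem definition concrete, the following is a minimal Python sketch of the input, the feasibility conditions, and the objective. The names `Job`, `weighted_completion_time`, and `at_most_two` are illustrative assumptions, not notation from the paper; the independence test is passed in as an oracle, and a slot $t$ is treated as admissible for a job exactly when $t \geq r_j$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass
class Job:
    name: str
    weight: int             # w_j
    release: int            # r_j: first slot in which the job's tasks may run
    tasks: Dict[str, int]   # task id e -> processing requirement p_e


def weighted_completion_time(
    jobs: List[Job],
    schedule: List[Iterable[str]],
    is_independent: Callable[[FrozenSet[str]], bool],
) -> int:
    """Validate an integer schedule and return sum_j w_j * C_j.

    schedule[t - 1] is the set of tasks processed in time slot t = 1, 2, ...
    """
    owner = {e: job for job in jobs for e in job.tasks}
    remaining = {e: p for job in jobs for e, p in job.tasks.items()}
    completion = {job.name: 0 for job in jobs}
    for t, slot in enumerate(schedule, start=1):
        if not is_independent(frozenset(slot)):
            raise ValueError(f"slot {t} is not an independent set")
        for e in slot:
            job = owner[e]
            if t < job.release:
                raise ValueError(f"task {e} runs before its release time")
            if remaining[e] == 0:
                raise ValueError(f"task {e} is scheduled more than p_e times")
            remaining[e] -= 1
            if remaining[e] == 0:
                # A job completes when its last task completes.
                completion[job.name] = max(completion[job.name], t)
    if any(remaining.values()):
        raise ValueError("some task is not fully scheduled")
    return sum(job.weight * completion[job.name] for job in jobs)


# Toy instance (hypothetical): uniform matroid of rank 2, i.e. any set of
# at most two tasks may be processed in the same slot.
def at_most_two(tasks: FrozenSet[str]) -> bool:
    return len(tasks) <= 2


jobs = [
    Job("A", weight=2, release=1, tasks={"a1": 1, "a2": 1}),
    Job("B", weight=1, release=1, tasks={"b1": 2}),
]
total = weighted_completion_time(jobs, [{"a1", "a2"}, {"b1"}, {"b1"}], at_most_two)
```

In this toy instance, finishing job A first gives completion times 1 and 3; interleaving the two jobs finishes both at time 2, which is worse for the weighted objective because A carries the larger weight.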
The starting point for our investigations is the question of whether there is an algorithm that effectively schedules coflows that aggregate information, stored at various locations in a network, to a common sink location. Such gathering communication patterns were identified as common in [5]. We model aggregation communications by assuming that for each job $j$, $U_j$ is a collection of locations in the network where the units of information needed for job $j$ are stored. It is natural to define the independent sets to be sets of locations that can simultaneously be routed to the sink without violating any capacity constraint of the network. In this case, $M$ is a matroid, and more specifically, a gammoid. Note that the symmetric problem, of disseminating data from a fixed location to various locations in the network, is also common and essentially equivalent to the aggregation problem.

The matroid coflow scheduling problem as defined here also naturally captures a number of well-studied scheduling problems.

Parallel Identical Machines Scheduling: Each job $j$ has a single task. The matroid $M = (U, \mathcal{I})$ is the uniform matroid of rank $m$, i.e., any set of at most $m$ jobs can be scheduled in parallel.

(Preemptive) Concurrent Open Shop Scheduling: In the concurrent open shop scheduling problem, each job $j$ comprises $m$ tasks, one on each machine, i.e., $U_j = \{t_{ij}\}_{i=1}^{m}$. Task $t_{ij}$ needs to be scheduled for time $p_{ij}$, and the job is completed when all its tasks are completed. To model this setting, let $T_i = \{t_{ij}\}_{j=1}^{n}$ be the set of all tasks that need to be scheduled on machine $i$. $M$ is a partition matroid in which a set $S$ of tasks is independent if and only if $|S \cap T_i| \leq 1$ for each machine $i$.

1.1 Our Contributions

We first consider coflow scheduling with unit length tasks when $M$ is a matroid. Our main result is:

Theorem 1.
There is a deterministic polynomial-time algorithm for coflow scheduling with unit length tasks, when $M$ is a matroid, that is 2-approximate with respect to the objective of minimizing total weighted completion time.

We note that Theorem 1 can be extended to the case where tasks may have arbitrary processing times, albeit at a slight loss in the approximation factor.

Theorem 2. There is a deterministic polynomial-time algorithm for coflow scheduling with arbitrary length tasks, when $M$ is a matroid, that is $(2 + \epsilon)$-approximate with respect to the objective of minimizing total weighted completion time, for any constant $\epsilon > 0$.

As with all the approximation results for coflow scheduling in the literature, our algorithm is based on rounding a natural time-indexed linear program. Intuitively, the rounding extracts a deadline $C_j^\lambda$ for each job $j$. This time is roughly $1/\lambda$ times later than the first time when every task in $U_j$ has been scheduled at least to the extent $\lambda$ in the solution to LP. Here the value of $\lambda$ is randomly chosen. The expected value of $C_j^\lambda$ is shown to be at most twice the fractional completion time for $j$ in the solution to LP; this "stretching" (also called slow-motion) idea has been used in other scheduling contexts [12, 22, 27]. This can be viewed as deriving from the LP a fractional schedule where each job $j$ is fully completed by time $C_j^\lambda$. Then, we observe that the problem of scheduling tasks to meet the deadlines $C_j^\lambda$ can be expressed as a matroid intersection problem. As the matroid intersection polytope is integral [26], one can find an integral schedule meeting these deadlines. Finally, by derandomizing the random choice of $\lambda$, we derive our main theorem.

The approximation guarantee in Theorem 1 is tight assuming $P \neq NP$. This is because it is NP-hard to approximate the total weighted completion time for concurrent open shop (even with unit sized tasks) within a factor of $2 - \epsilon$
[23], and this problem is a special case of matroid coflow scheduling, where the matroid is a partition matroid. Somewhat surprisingly, even for concurrent open shop scheduling with release times, the previous best known approximation factor was 3 [10, 17]. (See also additional discussion in [2].) Thus, Theorem 2 immediately yields an improved approximation algorithm for preemptive concurrent open shop with arbitrary release times.

Corollary 3. There is a deterministic, polynomial-time $(2 + \epsilon)$-approximation algorithm for the preemptive concurrent open shop scheduling problem when jobs have arbitrary release times, for any constant $\epsilon > 0$. If all the release times and processing requirements are polynomially bounded, then the approximation guarantee improves to 2.

We believe our primary technical contribution is the high-level approach of reducing a weighted completion time scheduling problem to a deadline-constrained scheduling problem. Our approach of first extracting a deadline for each job from the LP solution and then finding an integer schedule that meets those deadlines can be viewed as a strict generalization of processing jobs in increasing order of their completion times derived from the LP, which has been a very common rounding tool in the scheduling literature; e.g., [21, 28, 2]. Our novel approach allows us to handle the matroid constraint, which we believe is natural and quite general.

1.2 Related Results

Most of the theoretical/algorithmic work on coflows has been on matching coflows [20, 16, 15, 2, 1]. These results essentially abstract out the network by modeling it as an $n$-by-$n$ switch, or equivalently a complete bipartite graph, and by modeling supportable flows by matchings in the graph. This is well motivated in practice, as the networks in many data centers are hierarchical, with higher network elements having higher capacities.
Thus a matching between servers at the leaves of the network is a reasonable approximation of a communication supportable by the network. We note that matching coflows correspond to coflows in our framework when the set system $M$ is an intersection of two partition matroids. The first constant-factor approximation (with ratio 16.54) for coflow scheduling in this model was given in [20]. Currently the best known approximation ratios are 5 when jobs may have variable release times, and 4 when all jobs arrive at time 0 [2, 29]. Note that the 2-approximation algorithms claimed in [18] and [11] are both flawed; see [2] and [11] for the discussion of the flaws.

Jahanjou et al. [14] consider several problems where there is an underlying network with capacities on the edges. If the tasks are paths in the network, and $\mathcal{I}$ consists of collections of paths that do not collectively violate any edge capacity, then their work gives an algorithm for producing a fractional schedule (which is equivalent to time being continuous) that is $O(1)$-approximate with respect to total weighted completion time. If the tasks are (source, sink) pairs in the network, and $\mathcal{I}$ consists of collections of (source, sink) pairs that can be simultaneously routed without violating any edge capacity, then their work gives an algorithm for producing a fractional unsplittable schedule that is $O(\log E / \log \log E)$-approximate with respect to total weighted completion time, together with a matching hardness result; here, $E$ is the number of edges. Our work is not comparable to theirs, since different constraints are addressed and our focus is on integer schedules, in contrast to theirs on fractional schedules.

Coflow scheduling is a generalization of the classical concurrent open shop scheduling problem [3, 4, 10, 17, 19, 23, 30]. Several 2-approximation algorithms were shown [4, 10, 17] via LP rounding. Matching hardness results were shown in [3, 23].
When jobs have different release times, the same LP relaxations yielded 3-approximations [10, 17]. Later, [19] gave a simple greedy algorithm that matches the best approximation ratio when all jobs arrive at time 0. Recently, [2] gave a combinatorial 3-approximation via a primal-dual analysis when jobs have non-uniform release times. Coflow scheduling has been actively studied within the networking community; some examples include [5, 6, 7, 18, 32].

1.3 Organization

The rest of the paper is organized as follows. In Section 2 we give some basic definitions and notation. In Section 3 we give the linear programming formulation. In Section 4 we explain how to round a solution to the linear program. In Section 5 we discuss the derandomization. In Section 6 we discuss the extension to tasks with variable processing times.

2 Definitions and Notations

We first consider the matroid coflow scheduling problem with unit length tasks. We will discuss three types of schedules and two types of objectives. In a discrete-time schedule, time is divided into unit length intervals (also called time slots), and the schedule specifies the set of tasks processed during each time slot. We let time slot $t$ refer to the interval of time $(t - 1, t]$. In an integer discrete-time schedule, at each time slot $t$, an independent set in the matroid is scheduled. In a fractional discrete-time schedule, at each time slot $t$, a convex combination of independent sets from the matroid is scheduled. In other words, in such a fractional schedule, the set of tasks scheduled at time slot $t$ can be expressed as $\sum_{S \in \mathcal{I}} \lambda_S 1_S$, where $\sum_{S \in \mathcal{I}} \lambda_S = 1$ and $1_S$ is the characteristic vector corresponding to the independent set $S \in \mathcal{I}$. A valid feasible solution is restricted to be an integer discrete-time schedule.

On the other hand, during our analysis, we will also consider continuous schedules. A continuous schedule specifies an independent set of tasks to be scheduled at each instantaneous time $\tau$
(as opposed to during a unit-length time slot). The completion time $C_j$ of a job $j$ is the first time when all tasks in $U_j$ have been completed. We let $q_e(t) : [0, T] \to \{0, 1\}$ denote an indicator function defined for each task $e \in U$, where $q_e(t) = 1$ if and only if task $e$ (more precisely, an independent set including $e$) is scheduled at time $t$ in $\sigma$. We let $Q_e(t) = \int_{\tau=0}^{t} q_e(\tau)\,d\tau$ denote the extent to which task $e$ is scheduled by time $t$. Let $\tilde{C}_j(v)$ denote the first time when every task in $U_j$ has been scheduled to extent at least $v$. The fractional completion time of job $j$ is then $\tilde{C}_j = \int_{v=0}^{1} \tilde{C}_j(v)\,dv$. We will use cost(LP) to denote the optimum objective of the LP, which we will describe soon.

3 Linear Program

In this section we give a linear programming formulation LP of our matroid coflow problem when tasks have unit lengths. Let $x_{j,t}$ be an indicator variable that specifies whether job $j$ completes at time $t$. For a task $e \in U_j$, let $y_{e,t}$ be an indicator variable that specifies whether task $e$ is assigned to time slot $t$. Let $\rho(S)$ be the rank function of the matroid (see footnote 1). Let $T = |U|$ be an upper bound on the time by which all tasks can be completed. The formulation of LP is then:

$$
\begin{aligned}
\text{LP:}\quad \min\ & \sum_{j \in J} w_j \sum_{t \in [T]} t \cdot x_{j,t} \\
\text{s.t.}\ & \sum_{t \in [T]} x_{j,t} = 1 && \forall j \in J && (1) \\
& \sum_{s \leq t} y_{e,s} \geq \sum_{s \leq t} x_{j,s} && \forall j \in J,\ \forall e \in U_j,\ \forall t \in [T] && (2) \\
& \sum_{e \in S} y_{e,t} \leq \rho(S) && \forall S \subseteq U,\ \forall t \in [T] && (3) \\
& y_{e,t} = 0 && \forall j \in J,\ \forall e \in U_j,\ \forall t \in [r_j - 1] && (4) \\
& x, y \geq 0 && && (5)
\end{aligned}
$$

Constraint (1) ensures that every job is scheduled. Constraint (2) ensures that all tasks of a job $j$ are scheduled to at least the extent that $j$ is completed by time $t$. Constraint (3) ensures that at any time step $t$, the set of tasks assigned to $t$ forms an independent set in the given matroid. Constraint (3) is the only constraint set that can potentially have superpolynomial size. However, for each fixed time $t$, the constraints describe a polymatroid and therefore admit an efficient separation oracle [8, 24, 13].
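As an illustration of the rank function and of constraint (3), here is a small Python sketch. It computes the rank of a set greedily from an independence oracle (for a matroid, any greedily built maximal independent subset has maximum size) and then checks constraint (3) for one time slot by brute force over all subsets. The brute-force check is exponential and is only meant for tiny examples; it is not the polynomial-time separation oracle cited above. The helper names and the machine-labelled task ids are assumptions made for this example.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set


def rank(subset: Iterable[str], is_independent: Callable[[FrozenSet[str]], bool]) -> int:
    """Size of a maximum independent subset of `subset`, built greedily."""
    basis: list = []
    for e in subset:
        if is_independent(frozenset(basis + [e])):
            basis.append(e)
    return len(basis)


def violated_set(
    y_t: Dict[str, float],
    is_independent: Callable[[FrozenSet[str]], bool],
    tol: float = 1e-9,
) -> Optional[Set[str]]:
    """Return some S with sum_{e in S} y_t[e] > rank(S), or None if none exists.

    Brute force over all non-empty subsets: illustration only.
    """
    ground = sorted(y_t)
    for k in range(1, len(ground) + 1):
        for subset in combinations(ground, k):
            if sum(y_t[e] for e in subset) > rank(subset, is_independent) + tol:
                return set(subset)
    return None


# Partition matroid of concurrent open shop: at most one task per machine.
# Task ids are written "machine:job" purely for this example.
def one_task_per_machine(tasks: FrozenSet[str]) -> bool:
    machines = [e.split(":")[0] for e in tasks]
    return len(machines) == len(set(machines))


feasible_y = {"m1:a": 0.5, "m1:b": 0.5, "m2:a": 1.0}
infeasible_y = {"m1:a": 0.7, "m1:b": 0.6, "m2:a": 0.2}
```

Here `infeasible_y` assigns a total of 1.3 units to machine `m1` in a single slot, so the set of the two `m1` tasks (rank 1) is reported as violated.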
When there are arrival/release times, constraint (4) ensures that no tasks in $U_j$ are processed before $j$'s release time $r_j$. The objective of LP is the fractional weighted completion time. Note that a solution to LP can be viewed as a fractional discrete schedule. We will use $X_{j,t} := \sum_{s \leq t} x_{j,s}$ to denote the extent to which job $j$ has been processed by time $t$, and $Y_{e,t} := \sum_{s \leq t} y_{e,s}$ to denote the extent to which task $e$ has been processed by time $t$.

Footnote 1: $\rho(S)$ is defined as $\max_{S' \subseteq S : S' \in \mathcal{I}} |S'|$.

4 Rounding

In this section, we show how to round an optimal solution to LP to obtain a 2-approximate integral (discrete) schedule. For each job $j$ and $v \in (0, 1]$, define

$$\bar{C}_j(v) = \frac{1}{x_{j,t}} (v - X_{j,t-1}) + (t - 1) \quad \text{if } v \in (X_{j,t-1}, X_{j,t}],\ t \in [T].$$

Intuitively, $\bar{C}_j(v)$ is a linear interpolation of the discrete times when job $j$ is partially completed. We set a deadline $C_j^\lambda = \lceil \frac{1}{\lambda} \bar{C}_j(\lambda) \rceil$ for each job $j$, where $\lambda \in (0, 1]$ is randomly drawn according to the probability density function $f(v) = 2v$. A key portion of the analysis is to show that the expected value of each $w_j C_j^\lambda$ is at most twice the contribution of job $j$ to the LP objective.

To analyze the expected value of $C_j^\lambda$, we construct several schedules from the LP solution. In Subsection 4.1, we show how to convert a solution of LP to a continuous schedule $\sigma$. In Subsection 4.2 we show how to convert $\sigma$ into a stretched schedule $\sigma^\lambda$, which is another continuous schedule parameterized by $\lambda \in (0, 1]$. Finally, in Subsection 4.3 we show how to convert this continuous schedule into a (discrete-time) integer schedule with the same cost. We note that we construct the schedules in Subsections 4.1 and 4.2 only for the sake of analysis. That is, we can obtain a 2-approximate integral discrete schedule using only the rounding algorithm in Subsection 4.3 with the deadlines $\{C_j^\lambda\}_j$.

4.1 Constructing the Continuous Schedule $\sigma$

We construct a continuous schedule $\sigma$ from the solution to LP.
For each time $t$, we first decompose $\{y_{e,t}\}_{e \in U}$ into a convex combination $\sum_{S \in \mathcal{I}} \lambda_S 1_S$ of independent sets (see footnote 2). To create $\sigma$, this convex combination is "smeared" across all instantaneous times during $(t - 1, t]$. That is, in $\sigma$ each independent set $S$ is scheduled for $\lambda_S (\tau_2 - \tau_1)$ time units during each infinitesimal time interval $(\tau_1, \tau_2] \subseteq (t - 1, t]$. This is formalized in Proposition 4. In Lemma 5 we show that the first time when a job $j$ is scheduled to extent $v$ in $\sigma$ is at most $\bar{C}_j(v)$. In Lemma 6 we show that the fractional weighted completion time of $\sigma$ is a bit less than the objective value of the solution to LP. This is because any processing of job $j$ done during $(t - 1, t]$ has no effect on the LP objective until time $t$, whereas it can affect $j$'s fractional weighted completion time in $\sigma$ during $(t - 1, t]$, before time $t$.

Footnote 2: This is possible because $\{y_{e,t}\}_e$ lies in the polymatroid associated with the matroid rank function $\rho$ due to constraint (3). It is well-known that this polymatroid is equivalent to the independent set polytope of the matroid, meaning that $\{y_{e,t}\}_e$ can be expressed as a convex combination of characteristic vectors of some independent sets. For more details, see Chapter 44 of [25].

Proposition 4. Consider the schedule $\sigma$. For any integer $t \in [T]$ and $(\tau_1, \tau_2] \subseteq (t - 1, t]$, we have $\int_{\tau=\tau_1}^{\tau_2} q_e(\tau)\,d\tau = y_{e,t} (\tau_2 - \tau_1)$.

Lemma 5. Consider the schedule $\sigma$. For any $j$ and $v \in (0, 1]$,

$$\tilde{C}_j(v) \leq \bar{C}_j(v) =: \frac{1}{x_{j,t}} (v - X_{j,t-1}) + (t - 1) \quad \text{if } v \in (X_{j,t-1}, X_{j,t}],\ t \in [T],$$

and $\tilde{C}_j(0) = 0$.

Proof. By definition, we have $\tilde{C}_j(0) = 0$, so let us assume that $v > 0$. We first show that $\tilde{C}_j(X_{j,t}) = t$. Due to constraint (2), $Y_{e,t} \geq X_{j,t}$ for all $e \in U_j$. Thus, by construction of $\sigma$, all tasks in $U_j$ are processed to extent at least $X_{j,t}$ by time $t$, i.e., $Q_e(t) \geq X_{j,t}$, meaning that $\tilde{C}_j(X_{j,t}) \leq t$. We also have $\tilde{C}_j(X_{j,t}) \geq t$, since we know by the optimality of the LP solution that $Y_{e,t} = X_{j,t}$ for some $e \in U_j$, and therefore $Q_e(t) = X_{j,t}$. Thus, we have $\tilde{C}_j(X_{j,t}) = t = \bar{C}_j(X_{j,t})$.

Now consider an arbitrary $v \in (0, 1]$. Let $t \in [T]$ be such that $v \in (X_{j,t-1}, X_{j,t}]$. Then, it follows that $x_{j,t} \neq 0$. Thus, from the above argument, we have $\tilde{C}_j(X_{j,t}) = t$. Let $t_v := \bar{C}_j(v)$ for notational convenience. We want to show $\tilde{C}_j(v) \leq t_v$. By Proposition 4 and the construction of $\sigma$, the extent to which $e$ is processed by time $t_v$ is

$$Q_e(t_v) = Y_{e,t-1} + y_{e,t} (t_v - (t - 1)) = Y_{e,t-1} + \frac{y_{e,t}}{x_{j,t}} (v - X_{j,t-1}).$$

First, if $y_{e,t} \geq x_{j,t}$, we immediately have $Q_e(t_v) \geq v + Y_{e,t-1} - X_{j,t-1} \geq v$ due to constraint (2). Otherwise, since $\frac{1}{x_{j,t}} (v - X_{j,t-1}) \leq 1$, fixing the value of $Y_{e,t} = Y_{e,t-1} + y_{e,t}$, the right-hand side decreases when we increase $y_{e,t}$. Therefore, we have

$$Q_e(t_v) \geq Y_{e,t-1} - (x_{j,t} - y_{e,t}) + \frac{x_{j,t}}{x_{j,t}} (v - X_{j,t-1}) = v + Y_{e,t} - X_{j,t} \geq v,$$

again due to constraint (2). Hence, we have $Q_e(t_v) \geq v$ for all $e \in U_j$, which immediately yields $\tilde{C}_j(v) \leq t_v$. ∎

Lemma 6. $\sum_{j \in J} w_j \int_{v=0}^{1} \bar{C}_j(v)\,dv = \text{cost(LP)} - \sum_{j \in J} w_j / 2$.

Proof. It suffices to show that $\int_{v=0}^{1} \bar{C}_j(v)\,dv = \sum_{t \in [T]} t \cdot x_{j,t} - 1/2$, since summing this equation over all $j \in J$ multiplied by their weight $w_j$ yields the lemma. Integrating the definition of $\bar{C}_j(v)$ piece by piece gives

$$
\begin{aligned}
\int_{v=0}^{1} \bar{C}_j(v)\,dv &= \sum_{t \in [T]} \int_{v=X_{j,t-1}}^{X_{j,t}} \left( \frac{1}{x_{j,t}} (v - X_{j,t-1}) + (t - 1) \right) dv \\
&= \sum_{t \in [T]} \left( \frac{x_{j,t}}{2} + (t - 1)\, x_{j,t} \right) \\
&= \sum_{t \in [T]} t \cdot x_{j,t} - \frac{1}{2},
\end{aligned}
$$

where the last equality follows from constraint (1). ∎

4.2 Constructing the Stretched Schedule $\sigma^\lambda$

To construct $\sigma^\lambda$ from $\sigma$ we "stretch" the schedule $\sigma$ by a factor of $1/\lambda$. More precisely, if an independent set $S$ is scheduled in $\sigma$ during an infinitesimal interval $(\tau_1, \tau_2]$, the same independent set is scheduled in $\sigma^\lambda$ during $(\tau_1/\lambda, \tau_2/\lambda]$. In Lemma 7 we show that $\sigma^\lambda$ completes job $j$ by time $C_j^\lambda = \lceil \bar{C}_j(\lambda)/\lambda \rceil$. In Lemma 8 we upper bound the expected cost $\sum_j w_j C_j^\lambda$ by twice cost(LP).

Lemma 7. The schedule $\sigma^\lambda$ completes every job $j$ by time $C_j^\lambda$.

Proof. Lemma 5 shows that $\tilde{C}_j(v) \leq \bar{C}_j(v)$ for all $v \in (0, 1]$, meaning that every task in $U_j$ is completed to extent $v$ by time $\bar{C}_j(v)$ in $\sigma$. Thus, in the stretched schedule $\sigma^\lambda$, every job $j$ completes by time $\bar{C}_j(\lambda)/\lambda$, for any value of $\lambda \in (0, 1]$. ∎

Lemma 8. $E[\sum_{j \in J} w_j C_j^\lambda] \leq 2\,\text{cost(LP)}$.

Proof. First note that

$$\sum_{j \in J} w_j E[\bar{C}_j(\lambda)/\lambda] = \sum_{j \in J} w_j \int_{v=0}^{1} \frac{\bar{C}_j(v)}{v} \cdot (2v)\,dv = 2 \sum_{j \in J} w_j \int_{v=0}^{1} \bar{C}_j(v)\,dv. \qquad (6)$$

Thus, we have

$$
\begin{aligned}
E\Big[\sum_{j \in J} w_j C_j^\lambda\Big] &= E\Big[\sum_{j \in J} w_j \Big\lceil \frac{1}{\lambda} \bar{C}_j(\lambda) \Big\rceil\Big] \leq E\Big[\sum_{j \in J} w_j \frac{1}{\lambda} \bar{C}_j(\lambda)\Big] + \sum_{j} w_j \\
&= 2 \sum_{j} w_j \int_{v=0}^{1} \bar{C}_j(v)\,dv + \sum_{j} w_j \\
&= 2 \Big( \text{cost(LP)} - \sum_{j} w_j / 2 \Big) + \sum_{j} w_j \\
&= 2\,\text{cost(LP)},
\end{aligned}
$$

where the third line uses Lemma 6. ∎

4.3 Constructing a Discrete Integer Schedule

Let $y^\lambda_{e,t}$ denote how much task $e$ is processed in $\sigma^\lambda$ during the time interval $(t - 1, t]$. In other words, task $e$ appears in $y^\lambda_{e,t}$ units of independent sets scheduled in $\sigma^\lambda$ during the time interval. Then, $\{y^\lambda_{e,t}\}_{e \in U, t \in [T]}$ satisfies the following:

1. For all $j \in J$ and $e \in U_j$, $\sum_{t \in [C_j^\lambda] \setminus [r_j - 1]} y^\lambda_{e,t} = 1$; and
2. For all $S \subseteq U$ and for all $t \in [T]$, $\sum_{e \in S} y^\lambda_{e,t} \leq \rho(S)$,

where the second holds since $\{y^\lambda_{e,t}\}_{e \in U}$ can be expressed as a convex combination of independent sets scheduled during the time interval $(t - 1, t]$ and therefore lies in the matroid polytope.

We now interpret $\{y^\lambda_{e,t}\}$ as a fractional point in the intersection of two matroid polytopes. We create the following two matroids. The new universe $U'$ is defined as $U' := \{(e, t) \mid t \in [T],\ j \in J,\ e \in U_j \text{ s.t. } r_j \leq t \leq C_j^\lambda\}$. The first matroid $M_1$ is a partition matroid that forces us to choose at most one element out of $\{(e, t)\}_t$, for each $e \in U$. Intuitively, this ensures that no task is scheduled more than once across times. The second matroid ensures that the elements scheduled at each time $t$ form an independent set in $\mathcal{I}$. The following lemma formally defines the second matroid and shows that it is indeed a matroid.

Lemma 9. Define $\mathcal{I}_2 \subseteq 2^{U'}$ such that $S' \subseteq U'$ is in $\mathcal{I}_2$ if and only if for any $t \in [T]$, $\{e \mid (e, t) \in S'\} \in \mathcal{I}$. Then, $M_2 = (U', \mathcal{I}_2)$ is a matroid.

Proof. Recall that $\mathcal{I}_2$ denotes the family of independent sets of $M_2$. It is straightforward to see that $\mathcal{I}_2$ is downward closed. Thus, it suffices to show that for any $A', B' \in \mathcal{I}_2$ such that $|A'| < |B'|$, there exists $(e, t) \in B' \setminus A'$ such that $A' \cup \{(e, t)\} \in \mathcal{I}_2$. Let $U'_t := \{(e, t) \mid j \in J,\ e \in U_j \text{ s.t. } r_j \leq t \leq C_j^\lambda\}$ denote the subset of $U'$ restricted to time $t$. Consider any fixed $A', B' \in \mathcal{I}_2$ such that $|A'| < |B'|$. Then, consider any fixed time $t^*$
such that $|A' \cap U'_{t^*}| < |B' \cap U'_{t^*}|$; such a time $t^*$ must exist since $\{U'_t\}_t$ partitions $U'$. Then, for some $(e^*, t^*) \in (B' \cap U'_{t^*}) \setminus (A' \cap U'_{t^*})$, it must be the case that $\{e^*\} \cup \{e \mid (e, t^*) \in A' \cap U'_{t^*}\} \in \mathcal{I}$. This is because $B'$ has more elements than $A'$ that are paired up with the fixed time $t^*$, and therefore the set of elements appearing in $A' \cap U'_{t^*}$ remains independent with some $e^*$ added. Further, for any other time $t$, the set of elements appearing in the pairs of $A'$ associated with $t$ remains unchanged and is therefore in $\mathcal{I}$. ∎

Then, it is easy to see that $\{y^\lambda_{e,t}\}$ is a point that lies in the intersection of the polymatroids defined by $M_1$ and $M_2$. Further, $\{y^\lambda_{e,t}\}$ belongs to the base polymatroid of $M_1$, so we have $\sum_{(e,t) \in U'} y^\lambda_{e,t} = |U|$. Since the matroid intersection polytope is well-known to be integral [26], meaning that every vertex is an integer point, a maximum independent set in the intersection of $M_1$ and $M_2$ must have $|U|$ elements. Further, we can find such a maximum independent set in polynomial time.

To recap, we have found $S' \subseteq U'$ that is a base of $M_1$ and is independent in $M_2$. This set $S'$ immediately gives the desired integer schedule, where $\{e \mid (e, t) \in S'\}$ is scheduled at each time $t$. Indeed, since $S'$ is a base of $M_1$, every task in $U_j$ is scheduled exactly once during the time interval $[r_j, C_j^\lambda]$. Further, $S'$ being independent in $M_2$ ensures that the set of tasks scheduled at each time forms an independent set in $\mathcal{I}$.

5 Derandomization

In this section, we discuss how to derandomize the choice of $\lambda \in (0, 1]$, which was used to compute the deadlines for the jobs. This will complete the proof of Theorem 1. Let us first define step values. We say that $v \in (0, 1]$ is a step value if $\sum_{s \leq t} x_{j,s} = v$ for some $j \in J$ and integer $t \in [T]$; in other words, exactly a $v$ fraction of some job $j$ is completed by some integer time in the LP solution. Let $V$ denote the set of all step values; $1 \in V$ by definition.
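The quantities used by the rounding (the interpolated completion time of Section 4, the resulting deadlines, and the step values just defined) are simple to compute from the $x$ variables of an LP solution. The following Python sketch illustrates this under the assumption that the solution is given as a dictionary mapping each job to the list of its values $x_{j,1}, \ldots, x_{j,T}$, stored as exact fractions. All function names and the toy solution are hypothetical.

```python
from fractions import Fraction
from math import ceil, sqrt
import random


def interpolated_time(x_j, v):
    """Linear interpolation of job j's completion: the piece with
    X_{j,t-1} < v <= X_{j,t} gives (v - X_{j,t-1}) / x_{j,t} + (t - 1)."""
    prefix = Fraction(0)  # X_{j,t-1}
    for t, x_jt in enumerate(x_j, start=1):
        if x_jt > 0 and prefix < v <= prefix + x_jt:
            return (v - prefix) / x_jt + (t - 1)
        prefix += x_jt
    raise ValueError("v must lie in (0, 1] and x_j must sum to 1")


def deadline(x_j, lam):
    """Deadline of job j for parameter lam: ceil(interpolated_time / lam)."""
    return ceil(interpolated_time(x_j, lam) / lam)


def step_values(x):
    """All positive prefix sums X_{j,t} over jobs j and times t, sorted."""
    values = set()
    for x_j in x.values():
        prefix = Fraction(0)
        for x_jt in x_j:
            prefix += x_jt
            if prefix > 0:
                values.add(prefix)
    return sorted(values)


def sample_lambda(rng=random):
    """Draw lambda in (0, 1] with density f(v) = 2v by inverting the CDF v^2."""
    return sqrt(1.0 - rng.random())


# Toy LP solution (hypothetical): job A is half done by time 1 and finished
# by time 2; job B is a quarter done by time 2 and finished by time 3.
x = {
    "A": [Fraction(1, 2), Fraction(1, 2)],
    "B": [Fraction(0), Fraction(1, 4), Fraction(3, 4)],
}
lam = Fraction(1, 2)
deadlines = {j: deadline(x_j, lam) for j, x_j in x.items()}
```

For `lam = 1/2`, job A reaches extent 1/2 at interpolated time 1, so its deadline is 2; job B reaches extent 1/2 at interpolated time 7/3, so its deadline is $\lceil 14/3 \rceil = 5$.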
Note that $|V|$ is polynomially bounded in the input size, as the number of variables $x_{j,t}$ we consider in LP is at most $|J| \cdot |U|$.

Recall that in Lemma 8 we showed $E[\sum_j w_j C_j^\lambda] \leq 2\,\text{cost(LP)}$ when $C_j^\lambda := \lceil \frac{1}{\lambda} \bar{C}_j(\lambda) \rceil$. This implies that there exists a value of $\lambda \in (0, 1]$ such that $\sum_j w_j C_j^\lambda \leq 2\,\text{cost(LP)}$. For the purpose of derandomization, it suffices to find $\lambda$ such that $\sum_j w_j \bar{C}_j(\lambda)/\lambda \leq 2 \sum_j w_j \int_{v=0}^{1} \bar{C}_j(v)\,dv$; equation (6) shows that this holds with equality in expectation. Towards this end, we aim to find $\lambda \in (0, 1]$ that minimizes $\sum_j w_j \bar{C}_j(\lambda)/\lambda$.

Suppose $\lambda$ was set to a value $v \in (v_1, v_2]$, where $v_1$ and $v_2$ are two adjacent step values in $V$. Consider any fixed job $j$. Let $t \in [T]$ be such that $v \in (X_{j,t-1}, X_{j,t}]$. By definition of step values, we have $(v_1, v_2] \subseteq (X_{j,t-1}, X_{j,t}]$. Thus, we have

$$\frac{\bar{C}_j(v)}{v} = \frac{1}{x_{j,t}} \left( 1 - \frac{X_{j,t-1}}{v} \right) + \frac{t - 1}{v}.$$

This becomes a linear function in $z$ over $[1/v_2, 1/v_1)$ if we set $z = 1/v$. Therefore, we get a piecewise linear function $g(z)$ by summing over all jobs multiplied by their weights and considering all pairs of adjacent step values in $V$. We set $\lambda$ to the inverse of the value of $z$ that achieves the global minimum, which can be found in polynomial time.

6 Arbitrary Processing Times

In this section we show how to extend Theorem 1 to allow tasks with arbitrary processing times, with a loss of a $(1 + \epsilon)$ factor in the approximation ratio for an arbitrary constant $\epsilon > 0$. In this setting, each task $e$ has an arbitrary integer size $p_e$, and the task $e$ completes when $p_e$ independent sets including $e$ are scheduled. As before, at each time we can schedule a set of tasks that is independent in the given matroid, and a job completes when all its tasks complete.

6.1 Compact Linear Program

We first describe our new compact LP relaxation. Let $T := \sum_e p_e + \max_j r_j$, which is clearly an upper bound on the maximum time we need to consider. We define a set of times $\mathcal{T}$ that consists of polynomially many time steps. First, let $\mathcal{T}$ include every job's arrival time.
Next, let $\mathcal{T}$ include all times appearing in $\{\lfloor (1 + \epsilon)^i \rfloor\}_{0 \leq i \leq \lceil \log_{1+\epsilon} T \rceil + 1}$. In words, $\mathcal{T}$ includes time steps that increase exponentially by a factor of $(1 + \epsilon)$ starting from 1, but includes no times greater than $(1 + \epsilon)^2 T$. Let $t_1 = 1, t_2, \ldots, t_k, \ldots, t_{K+1}$ denote the (integer) times in $\mathcal{T}$ in increasing order. Let $I_i := [t_i, t_{i+1})$ where $i \in [K]$. The idea is to rewrite LP compactly as follows, by replacing time-indexed variables with interval-indexed variables.

$$
\begin{aligned}
\min\ & \sum_{j \in J} w_j \sum_{i \in [K]} (t_{i+1} - 1) \cdot x_{j,i} \\
\text{s.t.}\ & \sum_{i \in [K]} (t_{i+1} - t_i)\, y_{e,i} = p_e && \forall e \in U && (7) \\
& \sum_{i' \leq i} y_{e,i'} / p_e \geq \sum_{i' \leq i} x_{j,i'} && \forall j \in J,\ \forall e \in U_j,\ \forall i \in [K] && (8) \\
& \sum_{e \in S} y_{e,i} \leq \rho(S) && \forall S \subseteq U,\ \forall i \in [K] && (9) \\
& y_{e,i} = 0 && \forall j \in J,\ \forall e \in U_j,\ \forall i \in [K] \text{ s.t. } t_{i+1} \leq r_j && (10)
\end{aligned}
$$

Here, variable $x_{j,i}$ can be viewed as the average fraction of job $j$ that completes per unit time during $I_i$; so, when the job $j$ completes during $I_i$ for the first time, we have $\sum_{i' \leq i} x_{j,i'} = 1$. Likewise, $y_{e,i}$ has an analogous meaning for each task $e$, but it denotes the average number of units of task $e$ processed per unit time during $I_i$. Constraint (7) ensures that all tasks complete eventually. Constraint (9) ensures that the average vector representing how much each task is processed per unit time during $I_i$ lies in the polymatroid. Constraint (10) enforces that no tasks in $U_j$ are processed before $j$'s arrival time; this is possible since $\mathcal{T}$ includes all jobs' arrival times.

Before explaining constraint (8), we explain the objective. If all intervals $\{I_i\}$ were of unit length, the objective would be exactly the fractional total weighted completion time. However, to make the LP compact, when an $x_{j,i}$ fraction of job $j$ completes during interval $I_i$, we pretend that the fraction completes at the end of $I_i$, i.e., at time $t_{i+1} - 1$. Thus, we overestimate the fractional objective; but since times in $I_i$ differ by at most a $(1 + \epsilon)$ factor, our overestimate is by a factor of at most $(1 + \epsilon)$.
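The set of interval endpoints described above (all release times together with the rounded-down powers of $1 + \epsilon$) can be generated directly. The Python sketch below is one way to do so; the function names are hypothetical, release times are assumed to be positive integers so that the smallest grid point is 1, and floating-point logarithms are used, which is adequate for an illustration but could be off by one index for inputs where $T$ is an exact power of $1 + \epsilon$.

```python
from math import ceil, floor, log
from typing import List, Sequence, Tuple


def time_grid(releases: Sequence[int], processing: Sequence[int], eps: float) -> List[int]:
    """Sorted grid: release times plus floor((1 + eps)^i) for
    0 <= i <= ceil(log_{1+eps} T) + 1, where T = sum(p_e) + max(r_j)."""
    horizon = sum(processing) + max(releases)          # T
    top = ceil(log(horizon, 1 + eps)) + 1              # largest exponent used
    points = {floor((1 + eps) ** i) for i in range(top + 1)}
    points.update(releases)
    return sorted(points)


def intervals(grid: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open intervals [t_i, t_{i+1}) between consecutive grid points."""
    return list(zip(grid, grid[1:]))


# Toy input (hypothetical): two release times, three tasks, eps = 0.5.
grid = time_grid(releases=[1, 6], processing=[3, 4, 2], eps=0.5)
pieces = intervals(grid)
```

With these inputs $T = 15$, the exponents range over $0, \ldots, 8$, and the grid is `[1, 2, 3, 5, 6, 7, 11, 17, 25]`; the point 6 is present only because it is a release time, and the largest point 25 stays below $(1 + \epsilon)^2 T = 33.75$.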
Finally, we discuss constraint (8), which caps each job's (cumulative) processed fraction at the analogous quantity of each task of the job, measured as how much the task has been processed divided by its processing time. We also note that this compact LP admits the same separation oracle as the one for LP.

6.2 Rounding

As before, we seek to round the optimal LP solution. Recall that we first obtained $C_j^\lambda := \lceil \frac{1}{\lambda} \bar{C}_j(\lambda) \rceil$ and then found an integer schedule that completes every job $j$ by $C_j^\lambda$. We observe that the first procedure poses no issue. This is because we can interpret the solution to our compact LP as a solution to LP. To see this, when a task $e$ is processed by a $\delta$ amount, pretend that there exist $p_e$ different tasks of unit size and that each is processed by a $\delta/p_e$ amount. Thus, we can compute $\bar{C}_j(v)$ efficiently for any value of $v \in (0, 1]$. The derandomization can be done similarly.

6.3 Finding An Integer Schedule

It now remains to find an integer schedule meeting the discovered deadlines $\{C_j^\lambda\}_{j \in J}$. We use essentially the same idea of reducing the problem to finding an integer solution in the intersection of two matroids. However, this reduction requires some careful modifications to be implemented in polynomial time. Also, we will aim to complete every job $j$ by $(1 + O(\epsilon)) C_j^\lambda$, meeting the deadline slightly loosely.

The main idea is to use the fact that the continuous schedule $\sigma^\lambda$ meeting the deadlines $\{C_j^\lambda\}$ only changes polynomially many times. This is because the continuous schedule $\sigma$ before the stretching is identical at all times during each of the intervals $(0, t_1 - 1], (t_1 - 1, t_2], \ldots, (t_{K-1} - 1, t_K]$; these intervals are stretched into $(0, (t_1 - 1)/\lambda], ((t_1 - 1)/\lambda, t_2/\lambda], \ldots, ((t_{K-1} - 1)/\lambda, t_K/\lambda]$, respectively. We split the interval including the time $T' = |U|^2/\epsilon^2$ into two, the left one ending at $|U|^2/\epsilon^2$ and the right one starting at $|U|^2/\epsilon^2$. Here, assume that $1/\epsilon$ is an integer. We also add time $\bar{C}_j(\lambda)/\lambda$ for every $j \in J$ and split the intervals accordingly. To simplify the notation, we recycle the notation $I_i$. By reindexing the resulting intervals and merging some initial intervals, we have $I_0 := (0, T'], I_1, I_2, \ldots, I_{K'}$.

We say that an interval is small if its starting time or ending time is not a power of $(1 + \epsilon)$ divided by $\lambda$; more precisely, $((t_{i-1} - 1)/\lambda, t_i/\lambda]$ is small if $t_{i-1}$ or $t_i$ is not a power of $(1 + \epsilon)$ divided by $\lambda$. Note that there are at most $4|J| + 4 \leq 8|J| \leq 8|U|$ small intervals, since each job's arrival time and deadline together can create at most 4 small intervals; the extra four come from time 0, the final time, and $T'$.

For each interval $I_i$, let $Q_e(I_i)$ denote the amount of task $e$ processed during $I_i$, which can be easily computed in polynomial time. For each interval, we will construct an integer schedule that schedules each task as much as the continuous schedule $\sigma^\lambda$ does without using too many time steps compared to the interval's length; more precisely, the integer schedule will process at least $\lceil Q_e(I_i) \rceil$ units of task $e$. We categorize the intervals into three groups. Depending on the category to which each interval belongs, we construct an integer schedule differently or give a different upper bound on the length of the integer schedule. At the end, we will concatenate the constructed integer schedules in increasing order of times. In the following, $|I|$ denotes the length of $I$.

The first interval, $I_0 = (0, T']$. Using the same idea we used for handling unit-sized tasks, we find an integer schedule that processes at least $\lfloor Q_e(I_0) \rfloor$ units of each task $e$, meeting all job deadlines no greater than $T'$. Note that $I_0$ has polynomial length; thus, the desired integer schedule can be computed in polynomial time. Then, we can greedily schedule, one per unit time, each task $e$ such that $Q_e(I_0)$ is not an integer. Note that such a task $e$ has not completed by time $T'$, so the task (more precisely, the job to which the task belongs) has deadline at least $T'$.
Therefore, we will be able to charge the extra delay of at most $|U|$ to the corresponding job's deadline directly.

$I_i$ that is not small, for $i \ge 1$. We seek to construct an integer schedule of length $(1 + O(\epsilon))|I_i|$. Towards this end, we do the following. Suppose we divide the interval into $\lceil \frac{|I_i|}{|U|/\epsilon} \rceil$ subintervals of length $|U|/\epsilon$; there can be at most one subinterval of a smaller length and we will handle it later. Next, for each subinterval of length $|U|/\epsilon$, we try to schedule $\lceil \frac{|U|/\epsilon}{|I_i|} Q_e(I_i) \rceil$ units of each task $e$. Since the length is polynomial in $|U|$, we can find an integer schedule of length $|U|/\epsilon + 1$ that schedules $\lfloor \frac{|U|/\epsilon}{|I_i|} Q_e(I_i) \rfloor$ units of each task $e$. By scheduling one task per unit time, we can schedule $\lceil \frac{|U|/\epsilon}{|I_i|} Q_e(I_i) \rceil$ units of each task $e$ in $|U|/\epsilon + 1 + |U| \le (|U|/\epsilon) \cdot (1 + 2\epsilon)$ time steps. Here, our integer schedule's length is at most $(1 + 2\epsilon)$ times the subinterval's length, $|U|/\epsilon$. This integer schedule is repeated $\lfloor \frac{|I_i|}{|U|/\epsilon} \rfloor$ times.

We now handle the smaller subinterval of length less than $|U|/\epsilon$. Using a similar argument, we can process more units of each task than the continuous schedule, using at most $|U|/\epsilon + 1 + |U| \le 2|U|/\epsilon$ time steps. Here we use the fact that $I_i$ has length significantly greater than $|U|$. To see this, suppose we had not added job arrival times, deadlines or $T'$ in the process of creating the intervals. Then the intervals preceding $I_i$ have exponentially decreasing lengths by a factor of $(1 + \epsilon)$. Using this observation, we can argue that $I_i$'s length is at least $\epsilon/2$ times $I_i$'s starting time. Since $I_i$'s starting time is greater than $T'$, we have that $I_i$'s length is at least $(\epsilon/2) \cdot T' = (\epsilon/2) \cdot (|U|^2/\epsilon^2) = |U|^2/(2\epsilon)$. So, we can charge the number of time steps spent to handle the smaller subinterval, which is at most $2|U|/\epsilon$, to the length of $I_i$. From all these arguments, we can construct an integer schedule of length at most $(1 + 6\epsilon)|I_i|$.

$I_i$ that is small, for $i \ge 1$.
We seek to construct an integer schedule of length $(1 + O(\epsilon))|I_i| + 2|U|/\epsilon$. The idea is the same as for the intervals that are not small. The only difference is that we cannot charge the extra time steps we spend to handle the smaller subinterval, which is at most $2|U|/\epsilon$, to the length of $I_i$. Thus, we just use this upper bound on the length of our integer schedule.

As mentioned before, we concatenate the integer schedules originating from $I_0, I_1, \ldots, I_{K'}$ in this order to obtain the final schedule. It now remains to show that each job completes by time $(1 + O(\epsilon))C_j^*$. We already showed that our integer schedule completes every job $j$ before its deadline $C_j^*$ if the deadline is smaller than $T'$. For any other job $j$, it must be the case that $\tilde{C}_j(\lambda)/\lambda$ is greater than $T'$. Let $I_i$ be the interval including $\tilde{C}_j(\lambda)/\lambda$. Due to the way the intervals are constructed, $\tilde{C}_j(\lambda)/\lambda$ must be equal to $I_i$'s finish time. Our goal is to show that we complete $j$ not too late compared to $I_i$'s finish time. That is, we want to show that the total length of the integer schedules originating from $I_0, I_1, \ldots, I_i$ is at most $(1 + O(\epsilon)) \sum_{i' \le i} |I_{i'}|$. Indeed, the total length is at most

$$
\begin{aligned}
& |I_0| + |U| + \sum_{i' \in [i]:\, I_{i'} \text{ is small}} \big( (1 + O(\epsilon))|I_{i'}| + 2|U|/\epsilon \big) + \sum_{i' \in [i]:\, I_{i'} \text{ is not small}} (1 + O(\epsilon))|I_{i'}| \\
&\quad \le \sum_{i'=0}^{i} (1 + O(\epsilon))|I_{i'}| + |U| + (2|U|/\epsilon) \cdot (8|U|) \\
&\quad \le \sum_{i'=0}^{i} (1 + O(\epsilon))|I_{i'}| + O(\epsilon)|I_0|.
\end{aligned}
$$

Here, the first inequality follows from the fact that there are at most $8|U|$ small intervals, as argued above. The second inequality is immediate from $|I_0| = T' = |U|^2/\epsilon^2$. Therefore, we have shown that each job completes by time $(1 + O(\epsilon))C_j^*$, which establishes that our final schedule's objective is at most $(1 + O(\epsilon))$ times the compact LP's optimum. Since we showed that the compact LP's optimum is at most $(1 + \epsilon)$ times the optimum, we obtain a $2(1 + \epsilon)$-approximate schedule for arbitrary $\epsilon > 0$ by scaling appropriately.
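The additive terms in the length bounds of this section can be sanity-checked with exact rational arithmetic. The following Python sketch is only an illustration and is not part of the algorithm: the function names are hypothetical, and the explicit constant 17 is one valid choice worked out here for the $O(\epsilon)|I_0|$ term, not a constant stated in the text. It assumes $|U| \ge 1$ and $0 < \epsilon \le 1$.

```python
from fractions import Fraction


def first_interval_length(num_tasks: int, eps: Fraction) -> Fraction:
    """T' = |U|^2 / eps^2, the length of the first interval I_0."""
    return Fraction(num_tasks) ** 2 / eps ** 2


def subinterval_schedule_length(num_tasks: int, eps: Fraction) -> Fraction:
    """Time steps used per full subinterval: |U|/eps + 1 + |U|."""
    return Fraction(num_tasks) / eps + 1 + num_tasks


def additive_overhead(num_tasks: int, eps: Fraction) -> Fraction:
    """|U| + (2|U|/eps) * (8|U|): the extra |U| steps after I_0 plus
    2|U|/eps extra steps for each of at most 8|U| small intervals."""
    return num_tasks + (2 * Fraction(num_tasks) / eps) * (8 * num_tasks)


def check_bounds(num_tasks: int, eps: Fraction) -> bool:
    """Check the two arithmetic inequalities for |U| >= 1 and 0 < eps <= 1."""
    n = Fraction(num_tasks)
    # |U|/eps + 1 + |U| <= (|U|/eps) * (1 + 2*eps)
    sub_ok = subinterval_schedule_length(num_tasks, eps) <= (n / eps) * (1 + 2 * eps)
    # |U| + 16|U|^2/eps <= 17 * eps * T'  (17 is an illustrative constant)
    overhead_ok = additive_overhead(num_tasks, eps) <= (
        17 * eps * first_interval_length(num_tasks, eps)
    )
    return sub_ok and overhead_ok


if __name__ == "__main__":
    for n in (1, 5, 40):
        for inv_eps in (1, 2, 10):
            print(n, inv_eps, check_bounds(n, Fraction(1, inv_eps)))
```

Both checks reduce to elementary facts: the first holds whenever $|U| \ge 1$, and the second holds because $|U| \le |U|^2/\epsilon$ when $\epsilon \le 1 \le |U|$, so that $|U| + 16|U|^2/\epsilon \le 17|U|^2/\epsilon = 17\epsilon \cdot T'$.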

Sungjin Im, Benjamin Moseley, Kirk Pruhs, Manish Purohit. Matroid Coflow Scheduling, LIPICS - Leibniz International Proceedings in Informatics, 2019, 145:1-145:14, DOI: 10.4230/LIPIcs.ICALP.2019.145