Logo

7. Linear Programming and Reductions

7.1 Introduction

For a linear programming problem, we are provided a set of nn variables, to which we must assign real values such that they

  1. Satisfy a provided set of linear equations/inequalities
  2. Maximize/minimize a given linear objective function

We can draw the linear constraints on an nn-dimensional hyperplane. The optimum solution, if it exists, will be located at a vertex of the nn-dimensional polygon. This should naturally make sense; the optimum should make at least one of the inequalities reach its extremity.

Occasionally, the linear program can have no optimum, though. This occurs when the linear program is infeasible, i.e. the constraints are unsatisfiable, or unbounded, when it is possible to achieve arbitrarily small/large objective values.

Solving Linear Programs

All linear programs can be solved by the simplex method. Essentially, this algorithm starts at an arbitrary vertex, and greedily traverses to an adjacent vertex with a better objective value, until it reaches a vertex whose direct neighbors all have worse or equal objective value. Essentially, it hill-climbs up to the optimal vertex.

Why does local optimality imply global optimality? That is, why is the greedy algorithm correct? Consider the profit line passing through the vertex. Since all the vertex's neighbors lie below this line, the rest of the feasible polygon must also lie below this line because the polygon is convex. Simply consider the intersection of two inequalities in a 2D plane. The result must be a convex vertex.

Convexity
The notion of a "convex" polygon in terms of its angles does not extend well to higher dimensions. However, the formal definition of convexity extends to higher dimensions: a convex region is a set of points SS such that x,yS\forall x,y\in S, the line segment xyx-y is entirely contained within the region.

Notably, all linear programs (LPs) can be reduced to one another via simple transformations. This is intuitively evidenced by how all LPs can be solved by the simplex algorithm.

Variants of LP

A general linear program has several degrees of freedom

  1. It can be either a maximization or minimization problem
  2. Its constraints can be equations and/or inequalities
  3. The variables are often restricted to be nonnegative, but they can also be unrestricted in sign

Additionally, we discuss the simple transformations that can reduce all LP options to one another.

  1. To transform a maximization problem to a minimization problem, or vice versa, negate all coefficients of the objective function.
    1. To turn an inequality constraint i=1naixib\sum_{i=1}^{n}a_{i}x_{i}\leq b into an equation, we introduce a new variable ss and replace the inequality with the equation and inequality pair i=1naixi+s=b\sum_{i=1}^{n}a_{i}x_{i}+s=b and s0s\geq0. This is called the slack variable for the inequality.
    2. To change an equality into an inequality, simple rewrite ax=bax=b into the pair of inequalities axbax\geq b and axbax\leq b.
  2. To transform a variable xx unrestricted in sign, we introduce two nonnegative variables x+,x0x^{+},x^{-}\geq0 and replace xx with x+xx^{+}-x^{-}. Thus, xx can take on any real value by adjusting these variables.

Applying these transformations enables us to reduce any LP to the standard form, which are described by the following constraints.

  1. All variables are nonnegative
  2. All constraints are equations
  3. The objective function should be minimized

7.2 Flows in Networks

Maximum Flow

In this problem, we are provided a directed/undirected graph GG in which each edge has a capacity cec_{e}. The goal is to find the maximum flow that can be sent from a source node ss to sink node tt such that the capacities of each edge are not exceeded. More formally, we define a flow within a network to be a set of variables fef_{e} for each edge ee in the network, satisfying the following properties.

  1. It doesn't violate edge capacities, i.e. 0fece, eE0\leq f_{e}\leq c_{e},\ \forall e\in E.
  2. For all nodes uu besides ss and tt, the amount of flow entering uu equals the amount of flow exiting uu. That is, flow is conserved.

Then, the size of a flow is the total quantity sent from ss to tt, which, by the conservation principle, is the quantity of flow leaving ss. The maximum possible size across all valid flows is the maximum flow.

Notice how these constraints are just a linear program with several equations and an objective function (amount of flow leaving ss) to be maximized.

Simplex Solution

Since maximum flow is an LP, we can apply the simple simplex solution to this problem! Of course, at the moment, we really only have a geometric intuition regarding the notion of an optimal solution. We won't formalize the generalized simplex algorithm yet. But, it turns out the maximum flow problem has a fairly concise interpretation of the simplex algorithm.

  1. Start with zero flow.
  2. Repeatedly choose an appropriate path from ss to tt, known as an augmenting path, and increase the flow along the edges of this path as much as possible.*

As you may notice, this algorithm seems remarkably like a greedy algorithm! This should make sense, intuitively. Recall that the simplex algorithm is really just a greedy hill-climbing algorithm in the geometric interpretation.

You might notice that * for the second step though. There's a slight complication with a method; this greedy design is actually not always optimal, and is not the simplex algorithm for maximum flow. Instead, the simplex algorithm introduces one detail that makes it optimal: it allows new paths to cancel existing flows.

Thus, the simplex algorithm, for each iteration, finds a path from sts\to t whose edges (u,v)(u,v) can be categorized as one of two types.

  1. (u,v)(u,v) in the original network, and is not yet at full capacity. Then, this edge can handle up to cuvfuvc_{uv}-f_{uv} additional units of flow.
  2. The reverse edge (v,u)(v,u) is in the original network, and there is some flow along it. Then, this edge can handle up to fvuf_{vu} additional units of flow.

These opportunities to increase flow can be imagined as a slightly modified graph GfG^{f}, known as a residual network, which represent the remaining flow that can be added along an edge. Thus, simplex is really just choosing an sts\to t path in this residual network.

Note that this algorithm is actually known as the Ford-Fulkerson method.

Minimum Cut

Maximum flow is actually the dual of another graph problem, minimum cut. The minimum cut problem presents the same graph as a maximum flow problem, except the goal is not to maximize flow, but minimize the total capacity of all edges from LL to RR, where L,RL,R are two disjoint groups of vertices such that sLs \in L and tRt\in R. This is known as an (s,t)(s,t)-cut; thus, the minimum cut problem seeks to minimize the "size" of the (s,t)(s,t)-cut.

Dual?
For an LP problem, its dual is a problem with equivalent solution, but phrased as the reverse optimization for the objective function. For maximum flow, it seeks to find the maximum size of a flow from ss to tt. For minimum cut, its dual, it seeks to find the minimum total capacity across an (s,t)(s,t)-cut. It's that simple!

In fact, this gives rise to a theorem known as

Max-Flow Min-Cut Theorem
The size of the maximum flow in a network equals the capacity of the smallest (s,t)(s,t)-cut.

Efficiency

Though we have outlined the simplex algorithm for maximum flow, we have yet to discuss exactly how efficient it is.

Certainly, we can run a DFS or BFS algorithm on the residual graph for each iteration to find a valid path. But, how many times will this repeat? If we suppose all edges in the original network have integer capacities C\leq C, then we can prove via induction that each iteration will increase the flow by an integer amount (1\geq1). This is actually known as the

Integral Flow Theorem
If every capacity in the network is an integer, then the size of the maximum flow is an integer, and there exists a maximum flow such that each individual edge flow is integral.

As a direct result of this theorem, we can show that the algorithm will take at most CC iterations, and with a runtime of E\lvert E \rvert from DFS/BFS, the total runtime is O(CE)O(C\lvert E \rvert). But, we can do better!

Indeed, if the flows are chosen poorly, the simplex algorithm can take O(CE)O(C\lvert E \rvert) steps. However, it turns out using BFS to choose the flow each iteration actually bounds the number of iterations by O(VE)O(\lvert V \rvert \lvert E \rvert). Thus, the total runtime is O(VE2)O(\lvert V \rvert \lvert E \rvert ^{2}).

Edmonds-Karp Algorithm

The Edmonds-Karp algorithm, as described in the previous section, is just the simplex algorithm with BFS applied to the residual graph each iteration to identify a new flow. The key idea of using BFS is that, every time an augmenting path is found, one of the edges will become saturated. This edge may be revisited in the future, but, since BFS will always find paths in increasing order of edges, the number of edges between ss and this edge will increase by at least one each time it is revisited in an augmenting path. An augmenting path must, before visiting this edge, traverse at least one vertex, ss, and can traverse at most V1\lvert V \rvert-1 vertices (all vertices except tt). Therefore, this edge will reappear in an augmenting path at most O(V)O(\lvert V \rvert) times. Thus, considering all edges, the number of augmenting paths that will be found by this algorithm is bounded by O(VE)O(\lvert V \rvert \lvert E \rvert). Since BFS has a runtime of O(E)O(\lvert E \rvert), the total runtime is O(VE2)O(\lvert V \rvert \lvert E \rvert ^{2}).

\square

As a bonus, the Edmond-Karp algorithm works for even graphs with irrational edge capacities.

7.3 Bipartite Matching

Consider a bipartite graph whose two sets are equally sized and denoted AA and BB. The Bipartite Matching problem considers if it's possible to form a matching of pairs between nodes in AA and BB such that every pair of nodes has an edge between them. This is known as a perfect matching, in graph theoretic terminology.

This problem can actually be reduced to a maximum flow problem! Create a source node ss with outgoing edges to all nodes in AA and a new sink node tt with incoming edges from all nodes in BB. Then, assign each edge a capacity of 11. Then, there exists a perfect matching     \iff the network has a flow whose size equals the number of pairs (which is the same as A\lvert A \rvert or B\lvert B \rvert). There is one nuance to this reduction, though; the Integral Flow Theorem guarantees that, if the network has a maximal flow satisfying the above condition, then there exists a valid perfect matching since each individual edge flow is integral.

7.4 Duality

duality.png

7.5 Zero-Sum Games

A zero-sum game is a game with one or more players in which the expected payout for each player is 00. For instance, rock-paper-scissors (RPS) is a simple example of a zero-sum game.

For a zero-sum game, we can actually represent the situation with an LP. Consider two players AA and BB playing RPS. Each player gains 11 point for a win, 1-1 for a loss, and 00 for a draw. Both players, of course, seek to maximize their score. In particular, each player's goal also implies that they seek to minimize their opponent's score. WLOG, let us focus on AA's score. Then, AA will play to maximize AA's score, and BB will play to minimize AA's score.W

Well, if AA chooses to play completely randomly, then it turns out that BB can force an expected score of 00 for AA, no matter what AA does. Conversely, if BB chooses to play completely randomly, then it turns out that AA can force an expected score of 00 for BB. In other words, neither player can hope to increase/decrease the expected score of AA beyond 00. Thus, the best each player can do is play randomly, with an expected score of 00.

This remains the same even if one of AA or BB announces their strategy first! WLOG, let AA announce their strategy first. Intuitively, one would expect this to favor BB, then. However, if both players play optimally (i.e. AA chooses the random strategy), BB cannot take advantage of this, and thus the expected score is still 00. Remarkably, this is a consequence of linear programming duality, as you may have noticed from the maximization/minimization parallel. In essence, AA, in their choice of strategy, is maximizing the minimum guaranteed expected average for themselves, while BB is minimizing the maximum guaranteed average of AA's score. In other words, AA is choosing the strategy such that it guarantees an expected score x\geq x, where xx is maximal, regardless of BB's choice of strategy. BB is choosing a strategy such that it guarantees AA's expected score is y\leq y, where yy is minimal, regardless of AA's choice of strategy.

Interestingly enough, in zero-sum games, x=yx=y, i.e. there is a uniquely defined optimal strategy (across both players). This can be generalized to arbitrary games, and essentially illustrates the existence of mixed strategies that are optimal for both players and achieve the same value. This result is known as the minimax theorem, and is described by the equation

maxx miny i,jGijxiyi=miny maxx i,jGijxiyi\underset{ \mathbf{x} }{ \mathrm{max} }\ \underset{ \mathbf{y} }{ \mathrm{min} }\ \underset{ i,j }{ \sum }G_{ij}x_{i}y_{i} = \underset{ \mathbf{y} }{ \mathrm{min} }\ \underset{ \mathbf{x} }{ \mathrm{max} }\ \underset{ i,j }{ \sum }G_{ij}x_{i}y_{i}

Note that when writing the LP for, WLOG, player AA, one should seek to maximize some variable aa with constraints describing the idea that, no matter what strategy player BB picks, AA's expected score is a\geq a. For instance, consider the zero-sum game described by the following matrix

[xyzx211y020z332]\begin{bmatrix} & x & y & z \\ x & -2 & 1 & 1 \\ y & 0 & -2 & 0 \\ z & 3 & 3 & -2 \end{bmatrix}

Where the rows describe AA's choices and the columns describe BB's choices, and MijM_{ij} describe the score AA receives if AA picks choice ii and BB picks choice jj. Then, our LP has an objective function of max a\mathrm{max}\ a and constraints of

2x+3zax2y+3zax2zax+y+z=1\begin{align*} -2x+3z&\geq a \\ x-2y+3z&\geq a \\ x-2z&\geq a \\ x+y+z &= 1 \end{align*}

Make sure this makes sense to you before moving on :)

Writing LPs for Zero Sum Games

This is a bit of tricky topic for me personally, which I've come back to write about while studying for the CS 170 final exam. In short, just remember the following. Player AA is always looking to maximize some quantity zz, where zz is the minimum (bounded by \leq constraints) score that player BB is able to induce. This involves writing a zz\leq constraint for each possibility for player BB, to ensure none of the possibilities can allow BB to induce a score less than zz. The analogue is true for player BB, in which they are looking to minimize some quantity bounded by \geq constraints for each possibility for player AA.

7.6 The Simplex Algorithm

Finally, we describe the generalized simplex algorithm for solving LP problems. At a high level overview, the simplex algorithm receives an input of a set of linear inequalities, a set of variables in those inequalities, and a linear objective function, and finds the optimal feasible point using the follow strategy.

Let vv be any vertex of the feasible region. While there exists a neighbor vv' of vv with better objective value, vvv\leftarrow v'.

This is easy to visualize in 2D or 3D. But, what about in nn dimensions?

Notation

First, it's necessary to clarify some notations. Let xiRx_{i}\in \mathbb{R} represent the variables, where 1in1\leq i\leq n. Any linear equation involving the xix_{i}'s defines a hyperplane in the space Rn\mathbb{R}^{n}. A linear inequality defines a half-space, all points that are either precisely on the hyperplane or lie on one side of it.

Meanwhile, a vertex is a unique point at which some subset of hyperplanes meet. This is always specified by at least nn inequalities. In particular, note that, with more than nn equations, we are guaranteed linear dependency; in other words, each vertex is actually specified by a set of precisely nn inequalities. By extension, we can note that two vertices are neighbors if they share n1n-1 defining inequalities.

*Notably, there can exist vertices that are generated by two different sets of inequalities. In particular, this occurs whenever some we have linear dependency for vertex. (which implies there are multiple sets of nn inequalities that can generate this vertex). These are degenerate vertices, and we will discuss how to handle them later. For now, assume all vertices are nondegenerate.

Algorithm

The algorithm, as previously described, consists of two simple steps.

  1. Check the optimality of the current vertex, and halt if optimal.
  2. Determine where to move next. Repeat.

Origin

Both steps are, in fact, easy if the vertex is at the origin. Consider a generic LP

max cTxAxbx0\begin{gather*} \mathrm{max}\ \mathbf{c}^{T}x \\ \mathbf{Ax}\leq \mathbf{b} \\ \mathbf{x} \geq \mathbf{0} \end{gather*}

Where x=[x1,,xn]T\mathbf{x}=[x_{1},\dots,x_{n}]^{T}. Also, suppose the origin is within the feasible region. Then, it is certainly a vertex, as it is the unique point at which the nn inequalities {x10,,xn0}\{ x_{1}\geq0,\dots,x_{n}\geq0 \} reach their extremes, or are tight.

Task 1:
Claim: the origin is optimal     \iff ci0,ic_{i}\leq0,\forall i.
Proof: If all ci0c_{i}\leq0, then, considering that x0\mathbf{x}\geq0, it is impossible to increase the objective value, since each xix_{i} can only increase, and this will only result in cTx\mathbf{c}^{T}x decreasing.

\square

Task 2:
We can move to another constraint by increasing xix_{i} until another constraint is hit. In other words, we release the tight constraint of xi0x_{i}\geq 0 (i.e., xi=0x_{i}=0 makes it tight), and thus allow increasing xix_{i} until some other inequality is now tight. This results in precisely nn tight inequalities (or more, but recall this can be reduced due to linear dependency).

Non-Origin

But what if we are at a non-origin vertex? Then, it suffices to transform the coordinate system such that this vertex is located at the origin.

Consider the nn tight inequalities that determine a non-origin vertex v\mathbf{v}. Let the equation of one be denoted aixbi\mathbf{a}_{i}\cdot \mathbf{x}\leq b_{i}. Then, the distance from any point x\mathbf{x} to that hyperplane is yi=biaixy_{i}=b_{i}-\mathbf{a}_{i}\cdot x. These nn equations essentially define the distances yiy_{i} as linear functions in xix_{i}. The relationship can then be inverted to express the xix_{i}'s as linear functions of the yiy_{i}'s, which consequently allows us to rewrite the entire LP in terms of the yiy_{i}'s (via substitution). If you consider the hyperplanes as the hyperaxes of the nn-dimensional space, then yiy_{i} are precisely the coordinates of any solution x\mathbf{x} in this new definition of the axes!

Notably, this LP with this new coordinate system has three properties:

  1. It includes the inequalities y0y\geq0, which are just transformed versions of the inequalities defining vv's nonnegativity.
  2. vv is the origin in y\mathbf{y}-coordinate-space.
  3. The cost function is now max cu+c~Ty\mathrm{max}\ c_{\mathbf{u}}+\tilde{\mathbf{c}}^{T}\mathbf{y}, where cuc_{\mathbf{u}} is the value of the objective function at u\mathbf{u} and c~\tilde{\mathbf{c}} is the cost function transformed from in terms of xix_{i} to in terms of yiy_i.

Final Considerations

Starting Vertex

How do we find a starting vertex for the simplex algorithm? This is not entirely trivial because, in general, it's not necessarily true that the origin will be contained within the feasible region. (Otherwise, as previously stated, the origin is always a vertex).

Luckily, finding a starting vertex can itself be reduced to an LP. (And, fortunately, this LP has a known starting vertex).

Consider an LP in standard form, i.e.

min cTx satisfying (Ax=b)(x0)\mathrm{min}\ \mathbf{c}^{T}\mathbf{x}\text{ satisfying }(\mathbf{Ax}=\mathbf{b}) \land (\mathbf{x}\geq 0)

First, make the RHS of the equations all nonnegative. IF bi<0b_{i}<0, just negate both sides of the iith equation. Then, create the following, separate LP:

Note that, for this LP, the starting vertex zi=bi, iz_{i}=b_{i},\ \forall i, and all other variables (x1,,xnx_{1},\dots,x_{n}) are set to 00. Thus, we can subsequently just start the simplex algorithm.

The result is one of two possibilities. If min z1++zn=0\mathrm{min}\ z_{1}+\dots+z_{n}=0, then zi=0, iz_{i}=0,\ \forall i, and we get a starting feasible vertex of the original LP by just ignoring the ziz_{i}'s.

Why is this true? Make sure it makes sense to you before moving on!

In contrast, if min z1++zn>0\mathrm{min}\ z_{1}+\dots+z_{n}>0, then the original LP is actually infeasible anyways! In other words, the original LP's equations required some nonzero ziz_{i}'s to be feasible, implying the original LP cannot be solved.

Degeneracy

As aforementioned, we previously assumed all vertices are nondegenerate. But, what if we do have degenerate vertices? Well, it could potentially result in vertices that seem like optimal solutions, since they are locally optimal, but are in fact suboptimal.

Naively, we can attempt to fix this by modifying simplex to detect degeneracy and continue moving to a different vertex despite lack of improvements in cost; however, this could result in an infinite loop. Instead, a smarter solution is to slightly perturb each bib_{i} to bi±ϵib_{i}\pm\epsilon_{i}, where ϵi\epsilon_{i} is some tiny error value randomly chosen for each ii. This ensures that there is no degeneracy in the entire system!

Unboundedness

What if an LP is unbounded in value? That is, the objective function can be made arbitrarily small/large when being minimized/maximized, respectively. In fact, simplex can be modified to detect this. When exploring the neighbors of a vertex, we can detect if removing a tight inequality and replacing it with another results in an undetermined system of equations with infinite solutions. In this case, simplex will simply halt.

Time Complexity

Let the LP have nn variables and mm inequality constraints. Consider an arbitrary vertex u\mathbf{u}. By definition it is the unique point at which nn inequality constraints are tight. Each of its neighbors must share at least n1n-1 inequalities with it, and have mm degrees of freedom for the other inequality, so u\mathbf{u} has O(mn)O(mn) neighbors.

For each vertex u\mathbf{u}, we must determine (1) if this vertex is more optimal and (2) if it is a true vertex. The first is simply a dot product. The second task is slow, however; it involves solving a system of nn equations in nn variables (i.e. satisfying the nn chosen inequalities as equations, to ensure they're tight) and checking the feasibility of the result. Applying Gaussian elimination to solve this has a runtime of O(n3)O(n^{3}). This results in an overall runtime per iteration of O(mn4)O(mn^{4}). But, we can do better.

Recall the idea of moving u\mathbf{u} to the origin, providing a local view of the neighbors' optimality. It turns out that this transformation only has a runtime of O((m+n)n)O((m+n)n); this is a result of exploiting the fact that the local view changes by just one defining inequality between iterations. Then, after transforming u\mathbf{u} to the origin, recall that the local view of the objective function is max cu+c~y\mathrm{max}\ c_{\mathbf{u}}+\mathbf{\tilde{c}}\cdot\mathbf{y}. (Recall that cuc_{\mathbf{u}} is the value of the objective function at u\mathbf{u}, and can be calculated with just a dot product). Thus, it actually suffices to pick any c~i>0\tilde{c}_{i}>0 (if there is none, the current vertex is optimal). And, since the LP has been rewritten in terms of yiy_{i}, it's much easier to determine how much yiy_{i} can be increased before some inequality is violated. Thus, the total runtime for each iteration of simplex is just O((m+n)n+mn)=O(mn)O((m+n)n+mn)=O(mn). (Note that mnm\geq n, otherwise the LP is unbounded).

Then, how many iterations of simplex are there? There can't be more than (m+nn)\binom{m+n}{n}, since this is the maximum number of possible vertices. This is, in fact, exponential in nn. And, in fact, it's not possible to produce a non-exponential upper bound; there do exist LPs in which simplex takes an exponential number of iterations! However, this does not typically occur in practice.