Logo

5. Greedy Algorithms

Lecture: Task Scheduling

Input is nn jobs with start and end times, i.e. [(s1,t1),,(sn,tn)][(s_{1},t_{1}),\dots,(s_{n},t_{n})]. Schedule the maximum number of non-overlapping jobs, and return this number.

The solution is to greedily select the job that finishes first!

Proof.
Let (s1,t1),,(sr,tr)(s_{1},t_{1}),\dots,(s_{r},t_{r}) denote the schedule produced by the greedy algorithm. Now, let (s1,t1),,(s,t)(s_{1}',t_{1}'),\dots,(s_{\ell}',t_{\ell}') denote a known optimal solution.

Claim 1: rr\leq \ell.

This follows from the fact that \ell is optimal.

Claim 2: Let Hi=(s1,t1),,(si,ti),(si+1,ti+1),,(s,t)H_{i}=(s_{1},t_{1}),\dots,(s_{i},t_{i}),(s_{i+1}',t_{i+1}'),\dots,(s_{\ell}',t_{\ell}'). Then, HiH_{i} is optimal i{0,,r}\forall i\in \{ 0,\dots,r \}.

Base Case: i=0i=0.
H0H_{0} is equivalent to the given optimal solution, so this is, by definition, optimal.

Inductive Hypothesis: i=ki=k
Assume that HiH_{i} is optimal for i=ki=k.

Inductive Step: i=k+1i=k+1
Note that ti<si+1<ti+1ti+1<si+2t_{i}<s_{i+1}<t_{i+1}\leq t_{i+1}'< s_{i+2}'. Therefore, replacing (si+1,ti+1)(s_{i+1}',t_{i+1}') with (si+1,ti+1)(s_{i+1},t_{i+1}) works just fine, and Hi+1H_{i+1} is optimal since HiH_{i} with optimal. Thus, the greedy algorithm's solution is optimal.

\square

Runtime Analysis.
This algorithm requires first sorting the tasks by finishing time in ascending order, and then iterating over them in linear time. Therefore, the overall time complexity is O(nlogn)O(n\log n).

5.1 Minimum Spanning Trees

Given an undirected graph G=(V,E)G=(V,E) with edges weights wew_{e}, the minimum spanning tree (MST) is a tree T=(V,E)T=(V,E') with EEE'\subseteq E that minimizes

weight(T)=eEwe\text{weight}(T)=\sum_{e\in E'}w_{e}

Aside: Tree Properties

A tree is defined a an undirected graph that is connected an acyclic.

A tree with V=n\lvert V \rvert=n has E=n1\lvert E \rvert=n-1.

Consider the construction of a tree, one edge at a time, starting from nn components, one vertex each. Each connection reduces the number of components by 1 (otherwise, it produces a cycle). Thus, n1n-1 edges leads to 1 component.

Any connected, undirected graph G=(V,E)G=(V,E) with E=V1\lvert E \rvert=\lvert V \rvert-1 is a tree. (Converse of previous).

Simply show GG is acyclic. While the graph contains a cycle, remove an edge from the cycle. The process terminates with G=(V,E),EEG'=(V,E'),E'\subseteq E that is acyclic and connected. Thus, GG is a tree     \implies E=V1\lvert E' \rvert=\lvert V \rvert-1 by the previous property     E=E\implies E=E'     \implies no edges were removed     \implies GG was acyclic already     \implies GG is a tree.

An undirected graph is a tree     \iff there exists a unique path between any pair of nodes

Otherwise, there must be a cycle.

5.1.1 Greedy Approach

Just repeatedly add the next lightest edge that doesn't produce a cycle. The correctness of this algorithm relies on the cut property.

5.1.2 Cut Property

Definition. Suppose edges XX are part of a minimum spanning tree of G=(V,E)G=(V,E). Pick a subset of nodes SS such that no edges in XX cross between SS and VSV-S. Let ee be the lightest edge across this partition. Then, X{e}X\cup \{ e \} is part of some MST.

Proof. If ee is part of XX's MST, TT, then we are done. Thus, assume henceforth that e∉Te\not\in T. Add edge ee to TT. Since TT is connected, this must have generated a cycle since there was already a path between the endpoints of ee. This cycle must therefore have some edge across the cut (S,VS)(S,V-S). Remove this edge to produce T=T{e}{e}T'=T\cup \{ e \}-\{ e' \}. TT' is connected, since

Removing a cycle edge cannot disconnect a graph.

By tree properties, it is also a tree. Consider the weight of TT'.

weight(T)=weight(T)+w(e)w(e)\text{weight}(T')=\text{weight}(T)+w(e)-w(e')

Since ee is the lightest crossing edge, w(e)w(e)w(e)\leq w(e'). Thus, weight(T)weight(T)    \text{weight}(T')\leq\text{weight}(T)\implies TT' is an MST.

5.1.3 Kruskal's Algorithm

The aforementioned greedy approach is Kruskal's Algorithm. Consider any edge we add in this process. It's the lightest edge we have yet to add, by definition, and since it does not generate a cycle, it satisfies the cut property. Therefore, this edge is part of some MST, and the end result of the algorithm will be an MST.

Kruskal itself is implemented via a disjoint set union. This data structure efficiently merges disjoint sets of nodes and checks for cycles (whether or not two nodes are in the same disjoint set).

5.1.4 Disjoint Set Union

We store each set as a tree with a root node / representative. When checking if two nodes are in the same set, we simply check if they have the same root node. (find\text{find}).

When merging sets, the naive way is to just choose one of the sets' root nodes to be the new root node at random; a smarter way is to choose it based on rank or size. We'll consider rank. In essence, the root node of the set with the higher rank is made the new root node of both. That is, the shorter tree's root node points to the root of the taller tree. (union\text{union}). This gives a worst case complexity for find\text{find} and union\text{union} of O(logn)O(\log n), whereas the naive solution is O(n)O(n).

Proof.

Property 1. For any xx, rank(x)<rank(parent(x))\text{rank}(x)<\text{rank}(\text{parent}(x)).

A root node of rank kk is created by the merger of two trees with roots of rank k1k-1. It follows by induction that

Property 2. Any root node of rank kk has 2k\geq 2^{k} nodes in its tree.

This extends to internal nodes as well, actually. Importantly, different rank-kk nodes cannot have common descendants, which means that

Property 3. If there are nn elements overall, there can be at most n2k\frac{n}{2^{k}} nodes of rank kk.

That is, the maximum rank is logn\log n. This allows the implementation of find\text{find} and union\text{union} to have O(logn)O(\log n) time complexity.

Path Compression

With the above data structure, Kruskal's Algorithm has a runtime of O(ElogV)O(\lvert E \rvert \log \lvert V \rvert) for sorting the edges (since logElogV\log \lvert E \rvert \approx \log \lvert V \rvert) and a runtime of O(ElogV)O(\lvert E \rvert \log \lvert V \rvert) for DSU operations. However, what if the edges were already sorted? Then the DSU becomes the bottleneck.

We can actually develop more efficient operations for DSUs. Every time find\text{find} is called, we do path compression. Since find\text{find} is a recursive function that visits the parent until reaching the root, on its way back through the recursive stack, it can set the parent of each node directly to the root. This results in the amortized cost of the find\text{find} operation becoming O(1)O(1), since the distance between a node and its root will be reduced to 1 every time find\text{find} passes through that node. union\text{union} also becomes O(1)O(1) by extension. The formal proof of this will not be discussed here, though.

5.1.5 Prim's Algorithm

An alternative to Kruskal's Algorithm is Prim's Algorithm, which functions identically to Dijkstra's Algorithm except it tracks the minimum distance from the MST discovered so far for each node. It maintains a priority queue ordered by the current distance of each (MST-adjacent) node from the current MST, and extends the MST by one edge each time by popping a node of the priority queue, adding the corresponding cost-minimizing edge that connects this node, and relaxing all adjacent vertices to this node that are not currently in the MST.

Aside: Randomized Algorithm for Minimum Cut

Consider the last edge that Kruskal's Algorithm adds to the MST of an unweighted graph. This forms a cut (S,S)(S,\overline{S}) in the graph. Remarkably, there is a probability 1n2\geq \frac{1}{n^{2}} such that (S,S)(S,\overline{S}) is the minimum cut in the graph, where the size of the cut (S,S)(S,\overline{S}) is the number of edges cross between SS and S\overline{S}. In other words, repeating the process O(n2)O(n^{2}) times and outputting the smallest cut found is likely to identify the minimum cut in GG. This thus reveals an O(mn2logn)O(mn^{2}\log n) algorithm for unweighted minimum cuts. (And a bit of further optimization provides an O(n2logn)O(n^{2}\log n) algorithm).

Why does this work? At each stage of Kruskal's algorithm, the vertex set VV is partitioned into connected components. The only edges that can be added to the MST must be connecting two distinct components. Meanwhile, the number of edges incident to each component must be at least the size of the minimum cut in GG. Let this be CC. If there are kk components in the graph, the number of eligible edges is kC2\geq \frac{kC}{2}. (Each component has at least CC incident edges, divide by 2 for overcounting). The edges are randomly ordered since the graph is unweighted; therefore, the chance that the next eligible edge chosen is from the minimum cut is at most C/(kC2)=2kC/\left( \frac{kC}{2} \right)=\frac{2}{k}. Therefore, with probability 12k=k2k1-\frac{2}{k}=\frac{k-2}{k}, the minimum cut is left intact. Extending this to all iterations of Kruskal's, we get

n2nn3n12413=1n(n1)1n2\frac{n-2}{n}\cdot \frac{n-3}{n-1}\cdot \frac{2}{4} \cdot \frac{1}{3} = \frac{1}{n(n-1)}\approx \frac{1}{n^{2}}

5.2 Huffman Codes

Input is a text of length TT from an alphabet Γ\Gamma with nn letters and frequencies {fi:iΓ}\{ f_{i}:i\in\Gamma \}. Output is a cost-minimizing encoding of the alphabet for this text, given that

cost(T)=i(fi)(# of bits in encoding of i)\text{cost}(T)=\sum_{i}(f_{i})(\text{\# of bits in encoding of }i)

In order to ensure decodability, we require that an encoding cannot be a prefix of another encoding. We do this by creating a binary tree, in which left edges represent a 0 and right edges represent a 1, and only assigning leaves to be encodings.

Algorithm.
Sort symbols by ascending order of frequency. Associate each of these symbols with a node. Pop the first two symbols fif_{i}, fjf_{j} (lowest frequencies). Connect these two with a parent node, and push a symbol (either one, doesn't matter) with frequency fi+fjf_{i}+f_{j}. This corresponds to the parent node.

pq = priority_queue<freq, sym>;
for (int i = 1; i < n; i++) {
	pq.push(<freq[i], i>);
	// create node for i
}
for (int i = n; i < 2n - 1; i++) {
	a = pq.pop(); b = pq.pop();
	// create node for i
	pq.push(freq[a] + freq[b], i);
}

Proof.

Claim: there exists an optimal tree TT where the lowest two frequencies fi,fjf_{i},f_{j} are the deepest siblings.

Let OO be an optimal tree. Let fuf_{u} be the deepest node in OO; fuf_{u} must have a sibling fvf_{v}. Let fi,fjf_{i},f_{j} exist somewhere in the tree. Now, swap fif_{i} with fuf_{u} and fjf_{j} with fvf_{v} to produce a new tree MM. We note that fi,fjfu,fvf_{i},f_{j}\leq f_{u},f_{v}. Therefore, cost(M)cost(O)    cost(M)=cost(O)\text{cost}(M)\leq\text{cost}(O)\implies\text{cost}(M)=\text{cost}(O). Therefore, MM must be optimal, and the claim is proved true.

Lemma: Huffman finds the optimal tree

Base Case: n=2n=2
The tree encodes one symbol to 1 and the other to 0. This is clearly optimal.

Inductive Hypothesis: n=kn=k
Assume that the lemma is true for n=kn=k.

Inductive Step: n=k+1n=k+1
By the previous claim, we know there exists an optimal solution Ok+1O_{k+1} with fi,fjf_{i},f_{j} as the deepest nodes. Note that

cost(Ok+1)=cost(Ok)+fi+fj\text{cost}(O_{k+1})=\text{cost}(O_{k})+f_{i}+f_{j}

Where OnO_{n} is the tree formed after removing fi,fjf_{i},f_{j} but keeping fi+fjf_{i}+f_{j} as the new leaf node. This equality makes sense because adding back fi,fjf_{i},f_{j} will increase the cost of the subtree rooted at fi+fjf_{i}+f_{j} from (fi+fj)x(f_{i}+f_{j})x to fi(x+1)+fj(x+1)f_{i}(x+1)+f_{j}(x+1), where xx is the depth of the fi+fjf_{i}+f_{j} node.

Recall that Huffman will place fi,fjf_{i},f_{j} as siblings in the tree. Running Huffman on the smaller subproblem of size kk (removing fi,fjf_{i},f_{j} but keeping fi+fjf_{i}+f_{j} as a new symbol) will, by the induction hypothesis, generate an optimal tree. Let HkH_{k} represent the Huffman-generated tree of size kk. Then, we can write the following equalities:

cost(Hk)=cost(Ok)cost(Hk+1)=cost(Hk)+fi+fj\begin{align*} \text{cost}(H_{k}) &= \text{cost}(O_{k}) \\ \text{cost}(H_{k+1}) &= \text{cost}(H_{k})+f_{i}+f_{j} \end{align*}

The first is based on the fact that the cost of HkH_{k} is optimal, and the second is based on the aforementioned construction and extension to k+1k+1. We can then write

cost(Hk+1)=cost(Ok)+fi+fj=cost(Ok+1)\begin{align*} \text{cost}(H_{k+1}) &= \text{cost}(O_{k})+f_{i}+f_{j} \\ &= \text{cost}(O_{k+1}) \end{align*}

In other words, the solution that Huffman encoding produces is optimal.

\square

5.3 Horn Formulas

Horn formulas are a methodology expressing logical facts and deriving conclusions. Knowledge is represented by

Given these clauses, the goal is determining if there is an assignment of variables such that all clauses are true—this is a satisfying assignment. This can solved with a greedy algorithm

  1. Set all variables to false
  2. While there is an implication with an empty left side, set the variable on the right side to true and remove it from the left side of all other implications.
  3. Finally, if all pure negative clauses are satisfied, the assignment has been found; otherwise, the formulas are not satisfiable.

That is, this algorithm greedily chooses to guarantee satisfaction of all negative clauses at first. Then, it only sets variables to true if it's necessary to satisfy an implication.

5.4 Set Cover

Consider an undirected graph G=(V,E)G=(V,E). A set cover problem involves finding the set of vertices of minimum size such that, vV\forall v\in V, vv is in the set or at most one edge away from a vertex in the set.

A greedy solution that immediately comes to mind is to simply add the node that would cover the most new nodes. However, this solution does not necessarily produce the most optimal set cover! But, it is very close to optimal.

Claim. Suppose BB contains nn elements and that the optimal cover consists of kk sets. Then the greedy algorithm will use at most klnnk\ln n sets.

Note that, here, each set is one of the chosen nodes, as described above, along with the new nodes it covers.

Proof. Let ntn_{t} be the number of elements not covered after tt iterations of the greedy algorithm. Given that the optimal cover is kk sets, there must exist some set in the optimal cover that has at least nt/kn_{t}/k of the remaining elements. Thus,

nt+1ntntk=nt(11k)n_{t+1}\leq n_{t}-\frac{n_{t}}{k}=n_{t}\left( 1-\frac{1}{k} \right)

This implies that

ntn0(11k)tn_{t}\leq n_{0}\left( 1-\frac{1}{k} \right)^{t}

Since 1xex1-x\leq e^{ -x }, with equality only when x=0x=0, we can write

ntn0(11k)t<n0(e1/k)t=net/k    t=klnnn_{t}\leq n_{0}\left( 1-\frac{1}{k} \right)^{t}<n_{0}(e^{ -1/k })^{t}=ne^{ -t/k }\implies \boxed{ t=k\ln n }

As, at t=klnnt=k\ln n, nt<nelnn=1n_{t}<ne^{ -\ln n }=1, i.e. all elements are covered. Therefore, the ratio between the greedy algorithm's solution and the optimal solution is at most lnn\ln n. This is the approximation factor of the greedy algorithm. In fact, this is enough; there is, in fact, no provably polynomial time algorithm with a smaller approximation factor.