5. Greedy Algorithms

Lecture: Task Scheduling

Input is $n$ jobs with start and end times, i.e. $[(s_{1},t_{1}),\dots,(s_{n},t_{n})]$ . Schedule the maximum number of non-overlapping jobs, and return this number.

The solution is to greedily select the job that finishes first!

Proof.
Let $(s_{1},t_{1}),\dots,(s_{r},t_{r})$ denote the schedule produced by the greedy algorithm. Now, let $(s_{1}',t_{1}'),\dots,(s_{\ell}',t_{\ell}')$ denote a known optimal solution.

Claim 1: $r\leq \ell$ .

This follows from the fact that $\ell$ is optimal.

Claim 2: Let $H_{i}=(s_{1},t_{1}),\dots,(s_{i},t_{i}),(s_{i+1}',t_{i+1}'),\dots,(s_{\ell}',t_{\ell}')$ . Then, $H_{i}$ is optimal $\forall i\in \{ 0,\dots,r \}$ .

Base Case: $i=0$ .
$H_{0}$ is equivalent to the given optimal solution, so this is, by definition, optimal.

Inductive Hypothesis: $i=k$
Assume that $H_{i}$ is optimal for $i=k$ .

Inductive Step: $i=k+1$
Note that $t_{i}<s_{i+1}<t_{i+1}\leq t_{i+1}'< s_{i+2}'$ . Therefore, replacing $(s_{i+1}',t_{i+1}')$ with $(s_{i+1},t_{i+1})$ works just fine, and $H_{i+1}$ is optimal since $H_{i}$ with optimal. Thus, the greedy algorithm's solution is optimal.

$\square$

Runtime Analysis.
This algorithm requires first sorting the tasks by finishing time in ascending order, and then iterating over them in linear time. Therefore, the overall time complexity is $O(n\log n)$ .

5.1 Minimum Spanning Trees

Given an undirected graph $G=(V,E)$ with edges weights $w_{e}$ , the minimum spanning tree (MST) is a tree $T=(V,E')$ with $E'\subseteq E$ that minimizes

\text{weight}(T)=\sum_{e\in E'}w_{e}

Aside: Tree Properties

A tree is defined a an undirected graph that is connected an acyclic.

A tree with $\lvert V \rvert=n$ has $\lvert E \rvert=n-1$ .

Consider the construction of a tree, one edge at a time, starting from $n$ components, one vertex each. Each connection reduces the number of components by 1 (otherwise, it produces a cycle). Thus, $n-1$ edges leads to 1 component.

Any connected, undirected graph $G=(V,E)$ with $\lvert E \rvert=\lvert V \rvert-1$ is a tree. (Converse of previous).

Simply show $G$ is acyclic. While the graph contains a cycle, remove an edge from the cycle. The process terminates with $G'=(V,E'),E'\subseteq E$ that is acyclic and connected. Thus, $G$ is a tree $\implies$ $\lvert E' \rvert=\lvert V \rvert-1$ by the previous property $\implies E=E'$ $\implies$ no edges were removed $\implies$ $G$ was acyclic already $\implies$ $G$ is a tree.

An undirected graph is a tree $\iff$ there exists a unique path between any pair of nodes

Otherwise, there must be a cycle.

5.1.1 Greedy Approach

Just repeatedly add the next lightest edge that doesn't produce a cycle. The correctness of this algorithm relies on the cut property.

5.1.2 Cut Property

Definition. Suppose edges $X$ are part of a minimum spanning tree of $G=(V,E)$ . Pick a subset of nodes $S$ such that no edges in $X$ cross between $S$ and $V-S$ . Let $e$ be the lightest edge across this partition. Then, $X\cup \{ e \}$ is part of some MST.

Proof. If $e$ is part of $X$ 's MST, $T$ , then we are done. Thus, assume henceforth that $e\not\in T$ . Add edge $e$ to $T$ . Since $T$ is connected, this must have generated a cycle since there was already a path between the endpoints of $e$ . This cycle must therefore have some edge across the cut $(S,V-S)$ . Remove this edge to produce $T'=T\cup \{ e \}-\{ e' \}$ . $T'$ is connected, since

Removing a cycle edge cannot disconnect a graph.

By tree properties, it is also a tree. Consider the weight of $T'$ .

\text{weight}(T')=\text{weight}(T)+w(e)-w(e')

Since $e$ is the lightest crossing edge, $w(e)\leq w(e')$ . Thus, $\text{weight}(T')\leq\text{weight}(T)\implies$ $T'$ is an MST.

5.1.3 Kruskal's Algorithm

The aforementioned greedy approach is Kruskal's Algorithm. Consider any edge we add in this process. It's the lightest edge we have yet to add, by definition, and since it does not generate a cycle, it satisfies the cut property. Therefore, this edge is part of some MST, and the end result of the algorithm will be an MST.

Kruskal itself is implemented via a disjoint set union. This data structure efficiently merges disjoint sets of nodes and checks for cycles (whether or not two nodes are in the same disjoint set).

5.1.4 Disjoint Set Union

We store each set as a tree with a root node / representative. When checking if two nodes are in the same set, we simply check if they have the same root node. ( $\text{find}$ ).

When merging sets, the naive way is to just choose one of the sets' root nodes to be the new root node at random; a smarter way is to choose it based on rank or size. We'll consider rank. In essence, the root node of the set with the higher rank is made the new root node of both. That is, the shorter tree's root node points to the root of the taller tree. ( $\text{union}$ ). This gives a worst case complexity for $\text{find}$ and $\text{union}$ of $O(\log n)$ , whereas the naive solution is $O(n)$ .

Proof.

Property 1. For any $x$ , $\text{rank}(x)<\text{rank}(\text{parent}(x))$ .

A root node of rank $k$ is created by the merger of two trees with roots of rank $k-1$ . It follows by induction that

Property 2. Any root node of rank $k$ has $\geq 2^{k}$ nodes in its tree.

This extends to internal nodes as well, actually. Importantly, different rank- $k$ nodes cannot have common descendants, which means that

Property 3. If there are $n$ elements overall, there can be at most $\frac{n}{2^{k}}$ nodes of rank $k$ .

That is, the maximum rank is $\log n$ . This allows the implementation of $\text{find}$ and $\text{union}$ to have $O(\log n)$ time complexity.

Path Compression

With the above data structure, Kruskal's Algorithm has a runtime of $O(\lvert E \rvert \log \lvert V \rvert)$ for sorting the edges (since $\log \lvert E \rvert \approx \log \lvert V \rvert$ ) and a runtime of $O(\lvert E \rvert \log \lvert V \rvert)$ for DSU operations. However, what if the edges were already sorted? Then the DSU becomes the bottleneck.

We can actually develop more efficient operations for DSUs. Every time $\text{find}$ is called, we do path compression. Since $\text{find}$ is a recursive function that visits the parent until reaching the root, on its way back through the recursive stack, it can set the parent of each node directly to the root. This results in the amortized cost of the $\text{find}$ operation becoming $O(1)$ , since the distance between a node and its root will be reduced to 1 every time $\text{find}$ passes through that node. $\text{union}$ also becomes $O(1)$ by extension. The formal proof of this will not be discussed here, though.

5.1.5 Prim's Algorithm

An alternative to Kruskal's Algorithm is Prim's Algorithm, which functions identically to Dijkstra's Algorithm except it tracks the minimum distance from the MST discovered so far for each node. It maintains a priority queue ordered by the current distance of each (MST-adjacent) node from the current MST, and extends the MST by one edge each time by popping a node of the priority queue, adding the corresponding cost-minimizing edge that connects this node, and relaxing all adjacent vertices to this node that are not currently in the MST.

Aside: Randomized Algorithm for Minimum Cut

Consider the last edge that Kruskal's Algorithm adds to the MST of an unweighted graph. This forms a cut $(S,\overline{S})$ in the graph. Remarkably, there is a probability $\geq \frac{1}{n^{2}}$ such that $(S,\overline{S})$ is the minimum cut in the graph, where the size of the cut $(S,\overline{S})$ is the number of edges cross between $S$ and $\overline{S}$ . In other words, repeating the process $O(n^{2})$ times and outputting the smallest cut found is likely to identify the minimum cut in $G$ . This thus reveals an $O(mn^{2}\log n)$ algorithm for unweighted minimum cuts. (And a bit of further optimization provides an $O(n^{2}\log n)$ algorithm).

Why does this work? At each stage of Kruskal's algorithm, the vertex set $V$ is partitioned into connected components. The only edges that can be added to the MST must be connecting two distinct components. Meanwhile, the number of edges incident to each component must be at least the size of the minimum cut in $G$ . Let this be $C$ . If there are $k$ components in the graph, the number of eligible edges is $\geq \frac{kC}{2}$ . (Each component has at least $C$ incident edges, divide by 2 for overcounting). The edges are randomly ordered since the graph is unweighted; therefore, the chance that the next eligible edge chosen is from the minimum cut is at most $C/\left( \frac{kC}{2} \right)=\frac{2}{k}$ . Therefore, with probability $1-\frac{2}{k}=\frac{k-2}{k}$ , the minimum cut is left intact. Extending this to all iterations of Kruskal's, we get

\frac{n-2}{n}\cdot \frac{n-3}{n-1}\cdot \frac{2}{4} \cdot \frac{1}{3} = \frac{1}{n(n-1)}\approx \frac{1}{n^{2}}

5.2 Huffman Codes

Input is a text of length $T$ from an alphabet $\Gamma$ with $n$ letters and frequencies $\{ f_{i}:i\in\Gamma \}$ . Output is a cost-minimizing encoding of the alphabet for this text, given that

\text{cost}(T)=\sum_{i}(f_{i})(\text{\# of bits in encoding of }i)

In order to ensure decodability, we require that an encoding cannot be a prefix of another encoding. We do this by creating a binary tree, in which left edges represent a 0 and right edges represent a 1, and only assigning leaves to be encodings.

Algorithm.
Sort symbols by ascending order of frequency. Associate each of these symbols with a node. Pop the first two symbols $f_{i}$ , $f_{j}$ (lowest frequencies). Connect these two with a parent node, and push a symbol (either one, doesn't matter) with frequency $f_{i}+f_{j}$ . This corresponds to the parent node.

pq = priority_queue<freq, sym>;
for (int i = 1; i < n; i++) {
	pq.push(<freq[i], i>);
	// create node for i
}
for (int i = n; i < 2n - 1; i++) {
	a = pq.pop(); b = pq.pop();
	// create node for i
	pq.push(freq[a] + freq[b], i);
}

Proof.

Claim: there exists an optimal tree $T$ where the lowest two frequencies $f_{i},f_{j}$ are the deepest siblings.

Let $O$ be an optimal tree. Let $f_{u}$ be the deepest node in $O$ ; $f_{u}$ must have a sibling $f_{v}$ . Let $f_{i},f_{j}$ exist somewhere in the tree. Now, swap $f_{i}$ with $f_{u}$ and $f_{j}$ with $f_{v}$ to produce a new tree $M$ . We note that $f_{i},f_{j}\leq f_{u},f_{v}$ . Therefore, $\text{cost}(M)\leq\text{cost}(O)\implies\text{cost}(M)=\text{cost}(O)$ . Therefore, $M$ must be optimal, and the claim is proved true.

Lemma: Huffman finds the optimal tree

Base Case: $n=2$
The tree encodes one symbol to 1 and the other to 0. This is clearly optimal.

Inductive Hypothesis: $n=k$
Assume that the lemma is true for $n=k$ .

Inductive Step: $n=k+1$
By the previous claim, we know there exists an optimal solution $O_{k+1}$ with $f_{i},f_{j}$ as the deepest nodes. Note that

\text{cost}(O_{k+1})=\text{cost}(O_{k})+f_{i}+f_{j}

Where $O_{n}$ is the tree formed after removing $f_{i},f_{j}$ but keeping $f_{i}+f_{j}$ as the new leaf node. This equality makes sense because adding back $f_{i},f_{j}$ will increase the cost of the subtree rooted at $f_{i}+f_{j}$ from $(f_{i}+f_{j})x$ to $f_{i}(x+1)+f_{j}(x+1)$ , where $x$ is the depth of the $f_{i}+f_{j}$ node.

Recall that Huffman will place $f_{i},f_{j}$ as siblings in the tree. Running Huffman on the smaller subproblem of size $k$ (removing $f_{i},f_{j}$ but keeping $f_{i}+f_{j}$ as a new symbol) will, by the induction hypothesis, generate an optimal tree. Let $H_{k}$ represent the Huffman-generated tree of size $k$ . Then, we can write the following equalities:

\begin{align*} \text{cost}(H_{k}) &= \text{cost}(O_{k}) \\ \text{cost}(H_{k+1}) &= \text{cost}(H_{k})+f_{i}+f_{j} \end{align*}

The first is based on the fact that the cost of $H_{k}$ is optimal, and the second is based on the aforementioned construction and extension to $k+1$ . We can then write

\begin{align*} \text{cost}(H_{k+1}) &= \text{cost}(O_{k})+f_{i}+f_{j} \\ &= \text{cost}(O_{k+1}) \end{align*}

In other words, the solution that Huffman encoding produces is optimal.

$\square$

5.3 Horn Formulas

Horn formulas are a methodology expressing logical facts and deriving conclusions. Knowledge is represented by

Implications, of the form $(x_{1}\land x_{2}\land \dots \land x_{n})\implies y$ .
Pure negative clauses, of the form $(\overline{x_{1}}\lor \overline{x_{2}}\lor\dots \lor \overline{x_{n}})$ .

Given these clauses, the goal is determining if there is an assignment of variables such that all clauses are true—this is a satisfying assignment. This can solved with a greedy algorithm

Set all variables to false
While there is an implication with an empty left side, set the variable on the right side to true and remove it from the left side of all other implications.
Finally, if all pure negative clauses are satisfied, the assignment has been found; otherwise, the formulas are not satisfiable.

That is, this algorithm greedily chooses to guarantee satisfaction of all negative clauses at first. Then, it only sets variables to true if it's necessary to satisfy an implication.

5.4 Set Cover

Consider an undirected graph $G=(V,E)$ . A set cover problem involves finding the set of vertices of minimum size such that, $\forall v\in V$ , $v$ is in the set or at most one edge away from a vertex in the set.

A greedy solution that immediately comes to mind is to simply add the node that would cover the most new nodes. However, this solution does not necessarily produce the most optimal set cover! But, it is very close to optimal.

Claim. Suppose $B$ contains $n$ elements and that the optimal cover consists of $k$ sets. Then the greedy algorithm will use at most $k\ln n$ sets.

Note that, here, each set is one of the chosen nodes, as described above, along with the new nodes it covers.

Proof. Let $n_{t}$ be the number of elements not covered after $t$ iterations of the greedy algorithm. Given that the optimal cover is $k$ sets, there must exist some set in the optimal cover that has at least $n_{t}/k$ of the remaining elements. Thus,

n_{t+1}\leq n_{t}-\frac{n_{t}}{k}=n_{t}\left( 1-\frac{1}{k} \right)

This implies that

n_{t}\leq n_{0}\left( 1-\frac{1}{k} \right)^{t}

Since $1-x\leq e^{ -x }$ , with equality only when $x=0$ , we can write

n_{t}\leq n_{0}\left( 1-\frac{1}{k} \right)^{t}<n_{0}(e^{ -1/k })^{t}=ne^{ -t/k }\implies \boxed{ t=k\ln n }

As, at $t=k\ln n$ , $n_{t}<ne^{ -\ln n }=1$ , i.e. all elements are covered. Therefore, the ratio between the greedy algorithm's solution and the optimal solution is at most $\ln n$ . This is the approximation factor of the greedy algorithm. In fact, this is enough; there is, in fact, no provably polynomial time algorithm with a smaller approximation factor.