5. Greedy Algorithms
Lecture: Task Scheduling
Input is jobs with start and end times, i.e. . Schedule the maximum number of non-overlapping jobs, and return this number.
The solution is to greedily select the job that finishes first!
Proof.
Let denote the schedule produced by the greedy algorithm. Now, let denote a known optimal solution.
Claim 1: .
This follows from the fact that is optimal.
Claim 2: Let . Then, is optimal .
Base Case: .
is equivalent to the given optimal solution, so this is, by definition, optimal.
Inductive Hypothesis:
Assume that is optimal for .
Inductive Step:
Note that . Therefore, replacing with works just fine, and is optimal since with optimal. Thus, the greedy algorithm's solution is optimal.
Runtime Analysis.
This algorithm requires first sorting the tasks by finishing time in ascending order, and then iterating over them in linear time. Therefore, the overall time complexity is .
5.1 Minimum Spanning Trees
Given an undirected graph with edges weights , the minimum spanning tree (MST) is a tree with that minimizes
Aside: Tree Properties
A tree is defined a an undirected graph that is connected an acyclic.
A tree with has .
Consider the construction of a tree, one edge at a time, starting from components, one vertex each. Each connection reduces the number of components by 1 (otherwise, it produces a cycle). Thus, edges leads to 1 component.
Any connected, undirected graph with is a tree. (Converse of previous).
Simply show is acyclic. While the graph contains a cycle, remove an edge from the cycle. The process terminates with that is acyclic and connected. Thus, is a tree by the previous property no edges were removed was acyclic already is a tree.
An undirected graph is a tree there exists a unique path between any pair of nodes
Otherwise, there must be a cycle.
5.1.1 Greedy Approach
Just repeatedly add the next lightest edge that doesn't produce a cycle. The correctness of this algorithm relies on the cut property.
5.1.2 Cut Property
Definition. Suppose edges are part of a minimum spanning tree of . Pick a subset of nodes such that no edges in cross between and . Let be the lightest edge across this partition. Then, is part of some MST.
Proof. If is part of 's MST, , then we are done. Thus, assume henceforth that . Add edge to . Since is connected, this must have generated a cycle since there was already a path between the endpoints of . This cycle must therefore have some edge across the cut . Remove this edge to produce . is connected, since
Removing a cycle edge cannot disconnect a graph.
By tree properties, it is also a tree. Consider the weight of .
Since is the lightest crossing edge, . Thus, is an MST.
5.1.3 Kruskal's Algorithm
The aforementioned greedy approach is Kruskal's Algorithm. Consider any edge we add in this process. It's the lightest edge we have yet to add, by definition, and since it does not generate a cycle, it satisfies the cut property. Therefore, this edge is part of some MST, and the end result of the algorithm will be an MST.
Kruskal itself is implemented via a disjoint set union. This data structure efficiently merges disjoint sets of nodes and checks for cycles (whether or not two nodes are in the same disjoint set).
5.1.4 Disjoint Set Union
We store each set as a tree with a root node / representative. When checking if two nodes are in the same set, we simply check if they have the same root node. ().
When merging sets, the naive way is to just choose one of the sets' root nodes to be the new root node at random; a smarter way is to choose it based on rank or size. We'll consider rank. In essence, the root node of the set with the higher rank is made the new root node of both. That is, the shorter tree's root node points to the root of the taller tree. (). This gives a worst case complexity for and of , whereas the naive solution is .
Proof.
Property 1. For any , .
A root node of rank is created by the merger of two trees with roots of rank . It follows by induction that
Property 2. Any root node of rank has nodes in its tree.
This extends to internal nodes as well, actually. Importantly, different rank- nodes cannot have common descendants, which means that
Property 3. If there are elements overall, there can be at most nodes of rank .
That is, the maximum rank is . This allows the implementation of and to have time complexity.
Path Compression
With the above data structure, Kruskal's Algorithm has a runtime of for sorting the edges (since ) and a runtime of for DSU operations. However, what if the edges were already sorted? Then the DSU becomes the bottleneck.
We can actually develop more efficient operations for DSUs. Every time is called, we do path compression. Since is a recursive function that visits the parent until reaching the root, on its way back through the recursive stack, it can set the parent of each node directly to the root. This results in the amortized cost of the operation becoming , since the distance between a node and its root will be reduced to 1 every time passes through that node. also becomes by extension. The formal proof of this will not be discussed here, though.
5.1.5 Prim's Algorithm
An alternative to Kruskal's Algorithm is Prim's Algorithm, which functions identically to Dijkstra's Algorithm except it tracks the minimum distance from the MST discovered so far for each node. It maintains a priority queue ordered by the current distance of each (MST-adjacent) node from the current MST, and extends the MST by one edge each time by popping a node of the priority queue, adding the corresponding cost-minimizing edge that connects this node, and relaxing all adjacent vertices to this node that are not currently in the MST.
Aside: Randomized Algorithm for Minimum Cut
Consider the last edge that Kruskal's Algorithm adds to the MST of an unweighted graph. This forms a cut in the graph. Remarkably, there is a probability such that is the minimum cut in the graph, where the size of the cut is the number of edges cross between and . In other words, repeating the process times and outputting the smallest cut found is likely to identify the minimum cut in . This thus reveals an algorithm for unweighted minimum cuts. (And a bit of further optimization provides an algorithm).
Why does this work? At each stage of Kruskal's algorithm, the vertex set is partitioned into connected components. The only edges that can be added to the MST must be connecting two distinct components. Meanwhile, the number of edges incident to each component must be at least the size of the minimum cut in . Let this be . If there are components in the graph, the number of eligible edges is . (Each component has at least incident edges, divide by 2 for overcounting). The edges are randomly ordered since the graph is unweighted; therefore, the chance that the next eligible edge chosen is from the minimum cut is at most . Therefore, with probability , the minimum cut is left intact. Extending this to all iterations of Kruskal's, we get
5.2 Huffman Codes
Input is a text of length from an alphabet with letters and frequencies . Output is a cost-minimizing encoding of the alphabet for this text, given that
In order to ensure decodability, we require that an encoding cannot be a prefix of another encoding. We do this by creating a binary tree, in which left edges represent a 0 and right edges represent a 1, and only assigning leaves to be encodings.
Algorithm.
Sort symbols by ascending order of frequency. Associate each of these symbols with a node. Pop the first two symbols , (lowest frequencies). Connect these two with a parent node, and push a symbol (either one, doesn't matter) with frequency . This corresponds to the parent node.
pq = priority_queue<freq, sym>;
for (int i = 1; i < n; i++) {
pq.push(<freq[i], i>);
// create node for i
}
for (int i = n; i < 2n - 1; i++) {
a = pq.pop(); b = pq.pop();
// create node for i
pq.push(freq[a] + freq[b], i);
}
Proof.
Claim: there exists an optimal tree where the lowest two frequencies are the deepest siblings.
Let be an optimal tree. Let be the deepest node in ; must have a sibling . Let exist somewhere in the tree. Now, swap with and with to produce a new tree . We note that . Therefore, . Therefore, must be optimal, and the claim is proved true.
Lemma: Huffman finds the optimal tree
Base Case:
The tree encodes one symbol to 1 and the other to 0. This is clearly optimal.
Inductive Hypothesis:
Assume that the lemma is true for .
Inductive Step:
By the previous claim, we know there exists an optimal solution with as the deepest nodes. Note that
Where is the tree formed after removing but keeping as the new leaf node. This equality makes sense because adding back will increase the cost of the subtree rooted at from to , where is the depth of the node.
Recall that Huffman will place as siblings in the tree. Running Huffman on the smaller subproblem of size (removing but keeping as a new symbol) will, by the induction hypothesis, generate an optimal tree. Let represent the Huffman-generated tree of size . Then, we can write the following equalities:
The first is based on the fact that the cost of is optimal, and the second is based on the aforementioned construction and extension to . We can then write
In other words, the solution that Huffman encoding produces is optimal.
5.3 Horn Formulas
Horn formulas are a methodology expressing logical facts and deriving conclusions. Knowledge is represented by
- Implications, of the form .
- Pure negative clauses, of the form .
Given these clauses, the goal is determining if there is an assignment of variables such that all clauses are true—this is a satisfying assignment. This can solved with a greedy algorithm
- Set all variables to false
- While there is an implication with an empty left side, set the variable on the right side to true and remove it from the left side of all other implications.
- Finally, if all pure negative clauses are satisfied, the assignment has been found; otherwise, the formulas are not satisfiable.
That is, this algorithm greedily chooses to guarantee satisfaction of all negative clauses at first. Then, it only sets variables to true if it's necessary to satisfy an implication.
5.4 Set Cover
Consider an undirected graph . A set cover problem involves finding the set of vertices of minimum size such that, , is in the set or at most one edge away from a vertex in the set.
A greedy solution that immediately comes to mind is to simply add the node that would cover the most new nodes. However, this solution does not necessarily produce the most optimal set cover! But, it is very close to optimal.
Claim. Suppose contains elements and that the optimal cover consists of sets. Then the greedy algorithm will use at most sets.
Note that, here, each set is one of the chosen nodes, as described above, along with the new nodes it covers.
Proof. Let be the number of elements not covered after iterations of the greedy algorithm. Given that the optimal cover is sets, there must exist some set in the optimal cover that has at least of the remaining elements. Thus,
This implies that
Since , with equality only when , we can write
As, at , , i.e. all elements are covered. Therefore, the ratio between the greedy algorithm's solution and the optimal solution is at most . This is the approximation factor of the greedy algorithm. In fact, this is enough; there is, in fact, no provably polynomial time algorithm with a smaller approximation factor.