6. Dynamic Programming

6.1 Shortest paths in DAGs

Dynamic programming solutions are, implicitly, taking advantage of the linearization of a hidden DAG.

6.2 Longest Increasing Subsequences (LIS)

Naive DP

Memoize the length of the LIS ending at each index. Array is already a linearized DAG as is! Iterate forward through array, and find the longest LIS among previous indices such that this element could extend the LIS.

int n;
vector<int> v(n);
vector<int> dp(n);

for (int i = 0; i < n; i++) {
	dp[i] = 1;
	for (int j = 0; j < i; j++) {
		if (v[i] > v[j])
			dp[i] = max(dp[i], dp[j] + 1);
	}
}

It's also possible to maintain an array to track ancestors, to recover the LIS. This has a time complexity of $O(n^{2})$ .

Optimized DP

We can improve upon the previous algorithm. The idea behind this algorithm is that we only really need to maintain one "candidate LIS." In essence, we maintain a DP array that tracks the LIS, except, for each iteration (iterating forward), we either (1) append to the end of the candidate LIS or (2) replace an element in the LIS by binary searching for the smallest element in the LIS that is $\geq$ the current element. This candidate LIS does not really track the actual elements of the LIS, as one might notice. However, it does efficiently calculate the length of the LIS.

It's difficult to reason about why this works. The main intuition is that, for each iteration, all elements in the candidate LIS are elements before the current element in the array. And, critically, each element essentially records the LIS ending at that element. Thus, replacing an element in the candidate LIS with the current element promises to potentially lead to an LIS that is at least as good as the current one. Let the element $x$ at index $i$ in the candidate LIS, and let the current element under consideration be $y$ . Let $y$ be replacing $x$ in the candidate LIS. By definition, $y\leq x$ and $y$ is greater than the element before $x$ in the candidate LIS. Consider an LIS of which $x$ is a member of, but not the end of. Then, the information of this LIS is already tracked by a further element in the candidate LIS. Now, consider an LIS of which $x$ is the end of. note that any future LIS constructed by extending this LIS must use elements later than $y$ in the array; thus, it is clearly optimal to greedily choose to replace $x$ with $y$ in the candidate LIS, as $y$ can only ever lead to a better LIS than $x$ .

C++ code:

int n = arr.size();
vector<int> dp;
vector<int> arr(n);

dp.push_back(arr[0]);
for (int i = 1; i < n; i++) {
	if (arr[i] > dp.back()) {
		// new longest sequence
		dp.push_back(arr[i]);
	} else {
		int x = lower_bound(dp.begin(), dp.end(), arr[i]) - dp.begin();
		dp[x] = arr[i];
	}
}
return dp.size();

6.3 Edit Distance

Given two strings, the algorithm can perform 3 operations:

Insert a character into one of the strings
Delete a character from a string
Substitute a character in a string with another character

The goal is to find the minimum number of operations to make the two strings equal; i.e., the edit distance.

We can solve this problem with a 2D dynamic programming approach. Let $i$ denote the current index in $s_{1}$ and $j$ denote the current index in $s_{2}$ , where $s_{1},s_{2}$ are the input strings. Let $dp[i][j]$ denote the edit distance of the two strings $s_{1}[0:i]$ and $s_{2}[0:j]$ . Then we can note that $dp[i][j]$ is equivalent to the minimum of the following subproblems:

$dp[i][j-1]+1$ , i.e. the equivalent of adding a character to $s_{2}[0\dots (j-1)]$ or deleting a character from $s_{1}[0\dots i]$ after equalizing the two strings.
$dp[i-1][j]+1$ , i.e. the same as (1) but the two strings are switched.
$dp[i-1][j-1]+1$ if $s_{1}[i]\neq s_{2}[j]$ , i.e. substituting $s_{1}[i]$ with $s_{2}[j]$ , or vice versa.
$dp[i-1][j-1]$ if $s[i]=s_{2}[j]$ , i.e. the cost of equalizing $s_{1}[0\dots(i-1)]$ and $s_{2}[0\dots(j-1)]$ , since $s_{1}[i]$ is already equal to $s_{2}[j]$ .

Therefore, it suffices to construct a DP table for this problem. This clearly is $O(mn)$ in time complexity.

6.4 Knapsack

Problem: Given $n$ items with a weight and value $(w,v)$ and a total weight capacity $W$ , determine the maximum value $V$ that can be achieved without exceeding $W$ .

With Repetition

In this version of the problem, it assumes there is an infinite number of each item, i.e. the same item can be chosen multiple times.

Let $V(x)$ denote the maximum value $V$ that can be achieved without exceeding the capacity $x$ . Let $(w_{i},v_{i})$ denote the weight and value of item $i$ . Let $I$ denote the set of items. Then, it follows that

V(x)=\mathrm{min}(\{ V(x-w_{i})+v_{i},\ \forall i\in I:w_{i}\leq x\})

In other words, at each capacity, simply consider the addition of each item to a subproblem whose capacity is reduced by that item's weight. Hopefully this makes sense intuitively!

Of course, recursively finding this is computationally expensive. Instead, we memoize our results while iterating forwards through the possible capacities $\{ 1,\dots,W \}$ . Additionally, at each step, it's necessary to check $n$ subproblems. Thus, the total time complexity is $O(nW)$ .

Without Repetition

In this version, it assumes there is exactly one instance of each item, i.e. an item can only be chosen once.

With this problem, the strategy is somewhat similar. However, we reverse the order. Instead of iterating through the set of capacities $\{ 1,\dots,W \}$ and iterating through all items for each capacity, we iterate through all items and iterate through the set of capacities for each item. Additionally, we iterate the set of capacities in reverse, starting from $W$ . This change helps maintain the requirement that, for every member of the memoized DP array, it will only have considered each element once. (Iterating forward the capacities result in $V(x)$ being reused later for $V(x+w_{i})$ ).

Thus, since the iterations are essentially the same, the time complexity of this algorithm is $O(nW)$ too.

6.5 Chain Matrix Multiplication

There are, in fact, different costs to the order in which one multiplies a sequence of matrices $A_{1}\times\dots \times A_{n}$ . (Where one chooses to place parentheses, since matrix multiplication is associative). (Also, recall that the sizes of the matrices must be $a_{1}\times a_{2}$ , $a_{2}\times a_{3}$ , $\dots$ , $a_{n}\times a_{n+1}$ , resulting in the different costs of each matrix multiplication).

The greedy approach, to always perform the cheapest matrix, fails. We consider a dynamic programming approach instead. Note first that the cost of multiplying an $m\times n$ matrix by a $n\times p$ matrix is approximately $O(mnp)$ .

Now, think about the problem like divide and conquer first. Our problem is multiplying $A_{1}\times\dots \times A_{n}$ . In divide and conquer, we might choose a matrix $A_{k}$ , $1\leq k<n$ , such that we then compute the problems $A_{1}\times\dots \times A_{k}$ and $A_{k+1}\times\dots \times A_{n}$ separately, and then combining afterwards. However, we must also consider all possible $k$ , as it is not guaranteed that our split is optimal.

To write out the subproblems, we first generalize this conclusion by replacing $A_{1}$ with $A_{i}$ and $A_{n}$ with $A_{j}$ , just to make notation easier. Then, let $C(i,j)$ denote the cost of multiplying $A_{i}\times\dots \times A_{j}$ , where $1\leq i\leq j\leq n$ , with an obvious base case of $C(x,x)=0$ . Then,

C(i,j)=\mathrm{min}_{i\leq k<j}\{ C(i,k)+C(k+1,j)+a_{i-1}\cdot a_{k}\cdot a_{j} \}

This is sufficient to solve the problem, and we simply compute this for all $i,j$ , memoizing to avoid recomputation. This is a 2D table of size $n^{2}$ , and calculating each entry requires looping through, at most, $O(n)$ other entries. Thus, the total runtime is $O(n^{3})$ .

6.6 Shortest Paths

Shortest Reliable Paths

The shortest reliable path is defined as the shortest path from $s$ to $t$ that uses at most $k$ edges.

Let $dist(v,i)$ be the length of the shortest path from $s$ to $v$ that uses $i$ edges, where $i\leq k$ . Let $\ell(u,v)$ denote the weight of the edge from $u$ to $v$ . We can then define

dist(v,i)=\underset{ (u,v)\in E }{ \mathrm{min} }(dist(u,i-1)+\ell(u,v))

Thus, we have a DP relation. We can then essentially run a modified version of Bellman-Ford's, where we iterate $k$ times and maintain a 2D DP array $dist$ of size $k\times n$ . Since we need to check each edge at most twice for each iteration, the total runtime is $O(km)$ .

All-Pairs Shortest Paths

The Floyd-Warshall algorithm finds the shortest path between all pairs of vertices in a graph, and works even in the presence of negative edges. The pseudocode is as follows:

for (int k = 0; k < n; ++k) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (d[i][k] < INF && d[k][j] < INF)
                d[i][j] = min(d[i][j], d[i][k] + d[k][j]); 
        }
    }
}

Source

In essence, Floyd-Warshall's maintains a 2-d matrix $d$ that keeps track of the distance from node $i$ and node $j$ in $d[i][j]$ . Before the $k$ th step, $d[i][j]$ stores the minimum distance path whose intermediate nodes are only within the set $\{ 1,\dots,k-1 \}$ . At each step, either $d[i][j]$ is still the minimum or, if $d[i][k]$ and $d[k][j]$ has been found, then there may be a new minimum path traversing through node $k$ .

The runtime of Floyd-Warshall's is clearly $O(n^{3})$ .

Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a known NP-hard problem. Essentially, given a graph of $n$ nodes and pairwise distances between nodes, it asks for the best order of traversing the nodes starting from some node $v$ such that each node is visited exactly once and the path begins and ends at $v$ . In essence, the minimum cost Hamiltonian tour.

The brute force approach evaluates every single tour, which gives a time complexity of $O(n!)$ . With dynamic programming, it's possible to reduce this to exponential time complexity. Consider how we might construct a subproblem. An intuitive idea is to consider the subproblem of a prefix of the tour, i.e. a path $v,\dots,u$ that does not fully complete the tour. More formally, for a subset of nodes $S\subseteq \{ 1,2,\dots,n \}$ that includes $1$ and the node $j\in S$ , let $C(S,j)$ be the length of the shortest path visiting each node in $S$ exactly once, starting at $1$ and ending at $j$ . In essence, a Hamiltonian path for a subset of the graph. Note that $C(S,1)=\infty$ since the path should not start and end at $1$ unless $S$ is precisely the entire set of nodes.

Now, we calculate $C(S,j)$ based on its subproblems. This involves essentially just iterating through all possible choices for the second-to-last node. In other words,

C(S,j)=\underset{ i\in S,i\neq j }{ \mathrm{min} }C(S-\{ j \},i)+dist(i,j)

There are at most $2^{n}\cdot n$ subproblems, and each takes linear time to combine. Thus, the total runtime is $O(n^{2}2^{n})$ .

6.7 Independent Sets in Trees

We denote a subset of nodes $S\subset V$ as an independent set of a graph $G=(V,E)$ if there are no edges between any nodes in $S$ . One problem is to find the largest independent set of a graph. This problem is, in general, intractable. However, in a tree, the problem can be solved in linear time using dynamic programming.

Let $dp(v)$ denote the size of the largest independent set of the subtree rooted at $v$ . We consider the possible subproblems. Assume first that we include the root node in the set. Then, we cannot include any of the children of the root node. However, we can include the union of the largest independent sets of each of its grandchildren. Assume next that we don't include the root node. Then, we can include the union of the largest independent sets of each of its children. Therefore, we have the following dp relation

dp(v)=\mathrm{max}\left( 1+\underset{ \text{grandchildren }w\text{ of }u }{ \sum } dp(w), \underset{ \text{children }w\text{ of }u}{ \sum } dp(w) \right)

Thus, we can root the node at an arbitrary node $r$ and simply perform DFS with memoization (calculating in the postorder routine). Thus, the overall runtime is $O(n+m)$ .