Logo

6. Dynamic Programming

6.1 Shortest paths in DAGs

Dynamic programming solutions are, implicitly, taking advantage of the linearization of a hidden DAG.

6.2 Longest Increasing Subsequences (LIS)

Naive DP

Memoize the length of the LIS ending at each index. Array is already a linearized DAG as is! Iterate forward through array, and find the longest LIS among previous indices such that this element could extend the LIS.

int n;
vector<int> v(n);
vector<int> dp(n);

for (int i = 0; i < n; i++) {
	dp[i] = 1;
	for (int j = 0; j < i; j++) {
		if (v[i] > v[j])
			dp[i] = max(dp[i], dp[j] + 1);
	}
}

It's also possible to maintain an array to track ancestors, to recover the LIS. This has a time complexity of O(n2)O(n^{2}).

Optimized DP

We can improve upon the previous algorithm. The idea behind this algorithm is that we only really need to maintain one "candidate LIS." In essence, we maintain a DP array that tracks the LIS, except, for each iteration (iterating forward), we either (1) append to the end of the candidate LIS or (2) replace an element in the LIS by binary searching for the smallest element in the LIS that is \geq the current element. This candidate LIS does not really track the actual elements of the LIS, as one might notice. However, it does efficiently calculate the length of the LIS.

It's difficult to reason about why this works. The main intuition is that, for each iteration, all elements in the candidate LIS are elements before the current element in the array. And, critically, each element essentially records the LIS ending at that element. Thus, replacing an element in the candidate LIS with the current element promises to potentially lead to an LIS that is at least as good as the current one. Let the element xx at index ii in the candidate LIS, and let the current element under consideration be yy. Let yy be replacing xx in the candidate LIS. By definition, yxy\leq x and yy is greater than the element before xx in the candidate LIS. Consider an LIS of which xx is a member of, but not the end of. Then, the information of this LIS is already tracked by a further element in the candidate LIS. Now, consider an LIS of which xx is the end of. note that any future LIS constructed by extending this LIS must use elements later than yy in the array; thus, it is clearly optimal to greedily choose to replace xx with yy in the candidate LIS, as yy can only ever lead to a better LIS than xx.

C++ code:

int n = arr.size();
vector<int> dp;
vector<int> arr(n);

dp.push_back(arr[0]);
for (int i = 1; i < n; i++) {
	if (arr[i] > dp.back()) {
		// new longest sequence
		dp.push_back(arr[i]);
	} else {
		int x = lower_bound(dp.begin(), dp.end(), arr[i]) - dp.begin();
		dp[x] = arr[i];
	}
}
return dp.size();

6.3 Edit Distance

Given two strings, the algorithm can perform 3 operations:

  1. Insert a character into one of the strings
  2. Delete a character from a string
  3. Substitute a character in a string with another character

The goal is to find the minimum number of operations to make the two strings equal; i.e., the edit distance.

We can solve this problem with a 2D dynamic programming approach. Let ii denote the current index in s1s_{1} and jj denote the current index in s2s_{2}, where s1,s2s_{1},s_{2} are the input strings. Let dp[i][j]dp[i][j] denote the edit distance of the two strings s1[0:i]s_{1}[0:i] and s2[0:j]s_{2}[0:j]. Then we can note that dp[i][j]dp[i][j] is equivalent to the minimum of the following subproblems:

  1. dp[i][j1]+1dp[i][j-1]+1, i.e. the equivalent of adding a character to s2[0(j1)]s_{2}[0\dots (j-1)] or deleting a character from s1[0i]s_{1}[0\dots i] after equalizing the two strings.
  2. dp[i1][j]+1dp[i-1][j]+1, i.e. the same as (1) but the two strings are switched.
  3. dp[i1][j1]+1dp[i-1][j-1]+1 if s1[i]s2[j]s_{1}[i]\neq s_{2}[j], i.e. substituting s1[i]s_{1}[i] with s2[j]s_{2}[j], or vice versa.
  4. dp[i1][j1]dp[i-1][j-1] if s[i]=s2[j]s[i]=s_{2}[j], i.e. the cost of equalizing s1[0(i1)]s_{1}[0\dots(i-1)] and s2[0(j1)]s_{2}[0\dots(j-1)], since s1[i]s_{1}[i] is already equal to s2[j]s_{2}[j].

Therefore, it suffices to construct a DP table for this problem. This clearly is O(mn)O(mn) in time complexity.

6.4 Knapsack

Problem: Given nn items with a weight and value (w,v)(w,v) and a total weight capacity WW, determine the maximum value VV that can be achieved without exceeding WW.

With Repetition

In this version of the problem, it assumes there is an infinite number of each item, i.e. the same item can be chosen multiple times.

Let V(x)V(x) denote the maximum value VV that can be achieved without exceeding the capacity xx. Let (wi,vi)(w_{i},v_{i}) denote the weight and value of item ii. Let II denote the set of items. Then, it follows that

V(x)=min({V(xwi)+vi, iI:wix})V(x)=\mathrm{min}(\{ V(x-w_{i})+v_{i},\ \forall i\in I:w_{i}\leq x\})

In other words, at each capacity, simply consider the addition of each item to a subproblem whose capacity is reduced by that item's weight. Hopefully this makes sense intuitively!

Of course, recursively finding this is computationally expensive. Instead, we memoize our results while iterating forwards through the possible capacities {1,,W}\{ 1,\dots,W \}. Additionally, at each step, it's necessary to check nn subproblems. Thus, the total time complexity is O(nW)O(nW).

Without Repetition

In this version, it assumes there is exactly one instance of each item, i.e. an item can only be chosen once.

With this problem, the strategy is somewhat similar. However, we reverse the order. Instead of iterating through the set of capacities {1,,W}\{ 1,\dots,W \} and iterating through all items for each capacity, we iterate through all items and iterate through the set of capacities for each item. Additionally, we iterate the set of capacities in reverse, starting from WW. This change helps maintain the requirement that, for every member of the memoized DP array, it will only have considered each element once. (Iterating forward the capacities result in V(x)V(x) being reused later for V(x+wi)V(x+w_{i})).

Thus, since the iterations are essentially the same, the time complexity of this algorithm is O(nW)O(nW) too.

6.5 Chain Matrix Multiplication

There are, in fact, different costs to the order in which one multiplies a sequence of matrices A1××AnA_{1}\times\dots \times A_{n}. (Where one chooses to place parentheses, since matrix multiplication is associative). (Also, recall that the sizes of the matrices must be a1×a2a_{1}\times a_{2}, a2×a3a_{2}\times a_{3}, \dots, an×an+1a_{n}\times a_{n+1}, resulting in the different costs of each matrix multiplication).

The greedy approach, to always perform the cheapest matrix, fails. We consider a dynamic programming approach instead. Note first that the cost of multiplying an m×nm\times n matrix by a n×pn\times p matrix is approximately O(mnp)O(mnp).

Now, think about the problem like divide and conquer first. Our problem is multiplying A1××AnA_{1}\times\dots \times A_{n}. In divide and conquer, we might choose a matrix AkA_{k}, 1k<n1\leq k<n, such that we then compute the problems A1××AkA_{1}\times\dots \times A_{k} and Ak+1××AnA_{k+1}\times\dots \times A_{n} separately, and then combining afterwards. However, we must also consider all possible kk, as it is not guaranteed that our split is optimal.

To write out the subproblems, we first generalize this conclusion by replacing A1A_{1} with AiA_{i} and AnA_{n} with AjA_{j}, just to make notation easier. Then, let C(i,j)C(i,j) denote the cost of multiplying Ai××AjA_{i}\times\dots \times A_{j}, where 1ijn1\leq i\leq j\leq n, with an obvious base case of C(x,x)=0C(x,x)=0. Then,

C(i,j)=minik<j{C(i,k)+C(k+1,j)+ai1akaj}C(i,j)=\mathrm{min}_{i\leq k<j}\{ C(i,k)+C(k+1,j)+a_{i-1}\cdot a_{k}\cdot a_{j} \}

This is sufficient to solve the problem, and we simply compute this for all i,ji,j, memoizing to avoid recomputation. This is a 2D table of size n2n^{2}, and calculating each entry requires looping through, at most, O(n)O(n) other entries. Thus, the total runtime is O(n3)O(n^{3}).

6.6 Shortest Paths

Shortest Reliable Paths

The shortest reliable path is defined as the shortest path from ss to tt that uses at most kk edges.

Let dist(v,i)dist(v,i) be the length of the shortest path from ss to vv that uses ii edges, where iki\leq k. Let (u,v)\ell(u,v) denote the weight of the edge from uu to vv. We can then define

dist(v,i)=min(u,v)E(dist(u,i1)+(u,v))dist(v,i)=\underset{ (u,v)\in E }{ \mathrm{min} }(dist(u,i-1)+\ell(u,v))

Thus, we have a DP relation. We can then essentially run a modified version of Bellman-Ford's, where we iterate kk times and maintain a 2D DP array distdist of size k×nk\times n. Since we need to check each edge at most twice for each iteration, the total runtime is O(km)O(km).

All-Pairs Shortest Paths

The Floyd-Warshall algorithm finds the shortest path between all pairs of vertices in a graph, and works even in the presence of negative edges. The pseudocode is as follows:

for (int k = 0; k < n; ++k) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (d[i][k] < INF && d[k][j] < INF)
                d[i][j] = min(d[i][j], d[i][k] + d[k][j]); 
        }
    }
}

Source

In essence, Floyd-Warshall's maintains a 2-d matrix dd that keeps track of the distance from node ii and node jj in d[i][j]d[i][j]. Before the kkth step, d[i][j]d[i][j] stores the minimum distance path whose intermediate nodes are only within the set {1,,k1}\{ 1,\dots,k-1 \}. At each step, either d[i][j]d[i][j] is still the minimum or, if d[i][k]d[i][k] and d[k][j]d[k][j] has been found, then there may be a new minimum path traversing through node kk.

The runtime of Floyd-Warshall's is clearly O(n3)O(n^{3}).

Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a known NP-hard problem. Essentially, given a graph of nn nodes and pairwise distances between nodes, it asks for the best order of traversing the nodes starting from some node vv such that each node is visited exactly once and the path begins and ends at vv. In essence, the minimum cost Hamiltonian tour.

The brute force approach evaluates every single tour, which gives a time complexity of O(n!)O(n!). With dynamic programming, it's possible to reduce this to exponential time complexity. Consider how we might construct a subproblem. An intuitive idea is to consider the subproblem of a prefix of the tour, i.e. a path v,,uv,\dots,u that does not fully complete the tour. More formally, for a subset of nodes S{1,2,,n}S\subseteq \{ 1,2,\dots,n \} that includes 11 and the node jSj\in S, let C(S,j)C(S,j) be the length of the shortest path visiting each node in SS exactly once, starting at 11 and ending at jj. In essence, a Hamiltonian path for a subset of the graph. Note that C(S,1)=C(S,1)=\infty since the path should not start and end at 11 unless SS is precisely the entire set of nodes.

Now, we calculate C(S,j)C(S,j) based on its subproblems. This involves essentially just iterating through all possible choices for the second-to-last node. In other words,

C(S,j)=miniS,ijC(S{j},i)+dist(i,j)C(S,j)=\underset{ i\in S,i\neq j }{ \mathrm{min} }C(S-\{ j \},i)+dist(i,j)

There are at most 2nn2^{n}\cdot n subproblems, and each takes linear time to combine. Thus, the total runtime is O(n22n)O(n^{2}2^{n}).

6.7 Independent Sets in Trees

We denote a subset of nodes SVS\subset V as an independent set of a graph G=(V,E)G=(V,E) if there are no edges between any nodes in SS. One problem is to find the largest independent set of a graph. This problem is, in general, intractable. However, in a tree, the problem can be solved in linear time using dynamic programming.

Let dp(v)dp(v) denote the size of the largest independent set of the subtree rooted at vv. We consider the possible subproblems. Assume first that we include the root node in the set. Then, we cannot include any of the children of the root node. However, we can include the union of the largest independent sets of each of its grandchildren. Assume next that we don't include the root node. Then, we can include the union of the largest independent sets of each of its children. Therefore, we have the following dp relation

dp(v)=max(1+grandchildren w of udp(w),children w of udp(w))dp(v)=\mathrm{max}\left( 1+\underset{ \text{grandchildren }w\text{ of }u }{ \sum } dp(w), \underset{ \text{children }w\text{ of }u}{ \sum } dp(w) \right)

Thus, we can root the node at an arbitrary node rr and simply perform DFS with memoization (calculating in the postorder routine). Thus, the overall runtime is O(n+m)O(n+m).