Logo

9. Coping with NP-Completeness

In this section, we discuss how to "cope" with NP-Complete problems. In other words, how to tackle NP-Complete problems you encounter in real life.

9.1 Intelligent Exhaustive Search

Backtracking

Backtracking is a technique that involves recursively exploring a search space, and then backpedaling steps that were taken. More importantly, though, backtracking actually takes advantage of the observation that it is often possible to reject a solution by looking at just a small portion of it.

Consider, for instance, a SAT problem with the clause (x1x2)(x_{1}\lor x_{2}). Then all assignments with x1=x2=falsex_{1}=x_{2}=\text{false} can be eliminated without further exploration. Thus, here, we are able to quickly check and reject this partial assignment. So, how can we systematize this idea to effectively "backtrack"?

The critical idea is to incrementally grow a tree of partial solutions. Whenever we encounter a partial solution that doesn't work, we can immediately prune this subtree, and explore it no further. In other words, we backtrack out of this subtree where we know there are no valid solutions.

Formally, a backtracking algorithm requires a test function that receives a subproblem and quickly determines if

  1. Failure: the subproblem has no solution
  2. Success: a solution to the subproblem is found
  3. Uncertainty: needs further exploration

For instance, for a SAT problem, if we recurse through the subproblem tree by, for each step/variable set, (1)(1) removing all clauses that are already satisfied by setting this variable and (2)(2) removing the variable from the remaining clauses that are not satisfied by setting this variable. Consider setting x=1x=1 in the SAT problem

(xy)(xyz)(yz)(x\lor y)(\overline{x}\lor \overline{y}\lor z)(y\lor z)

Then, it would produce the subproblem

(yz)(yz)(\overline{y}\lor z)(y\lor z)

Therefore, for the SAT problem, the "test" function would declare failure if the subproblem has an empty clause, success if the subproblem has no clauses remaining, and uncertainty otherwise.

Branch-and-Bound

We now generalize this principle of pruning the search space from search problems to optimization problems. WLOG, we will consider a generic minimization problem (a maximization problem will naturally follow a similar pattern).

The critical idea of branch-and-bound (for a minimization problem) is to quickly compute a lower bound on the cost function. This allows us to reject any subproblems whose lower bound is greater than the cost of an already-encountered solution (must be a full solution, not a subproblem!)—as this implies these subproblems must certainly not be optimal.

Consider an instance of TSP. We let a partial solution be a simple path aba\to b traveling through the set of vertices SVS\subseteq V, where a,bSa,b\in S. We denote the partial solution by [a,S,b][a,S,b]. For TSP, aa will actually always be the start vertex, and the initial problem is [a,{a},a][a,\{ a \},a]. Then, the corresponding subproblem is the best completion of the tour, i.e. the minimum cost complementary path bab\to a whose intermediates nodes are precisely VSV-S.

For each iteration of branch-and-bound, we extend a partial solution [a,S,b][a,S,b] by a single edge (b,x)(b,x) where xVSx \in V-S. There are at most VS\lvert V-S \rvert ways to do this, and each branch corresponds to a new subproblem [a,S{x},x][a,S\cup \{ x \},x].

Now, the important part—how can we quickly compute a lower-bound on the completion, i.e. path bab\to a, of the partial tour [a,S,b][a,S,b]. There are some very complicated solutions, but we will consider a very simple heuristic. The completion of the tour must consist of a path through VSV-S plus edges bβVSb\to \beta\in V-S and αa, αVS\alpha\to a,\ \alpha\in V-S. Thus, its cost is at least the sum of the

Make sure you understand why before moving on! But, in essence, since computing the minimum spanning tree of a graph has an efficient algorithm, this gives us our desired "quick" lower bound, which we can use to prune branches.

9.2 Approximation Algorithms

We now consider efficient algorithms that provide good approximations of the optimal solution to an NP-Complete problem. WLOG, we will consider minimization problems. First, a little bit of notation

Then, we say the approximation ratio of algorithm A\mathcal{A} is defined as

αA=maxIA(I)OPT(I)\alpha_{\mathcal{A}}=\underset{ I }{ \mathrm{max} } \frac{\mathcal{A}(I)}{\mathrm{OPT}(I)}

In other words, 1aA<1\leq a_{\mathcal{A}}<\infty and it measures, multiplicatively, how much worse algorithm A\mathcal{A}'s solution is for the worst-case input. That is, the worst possible ratio between A\mathcal{A}'s solution and the actual optimal solution observed across all instances II.

Vertex Cover

In Section 5.4, we already observed that the Set Cover problem can be approximated within a factor of O(logn)O(\log n) by a greedy algorithm. Since the Vertex Cover problem is just a special case of the Set Cover, we already have an algorithm with approximation ratio α=O(logn)\alpha=O(\log n).

However, there exists a better approximation algorithm that is based on the idea of matching, i.e. a subset of edges that have no vertices in common. We call a matching maximal if no more edges can be added to it.

A maximal matching is not the same as a maximum matching. A maximum matching is a matching with the most number of edges amongst all matchings. A maximal matching is simply a matching that cannot be expanded further without removing edges from it.

Critically, any matching in a graph GG provides a lower bound on OPT\mathrm{OPT} for the vertex cover problem. This is because each edge of the matching must have one of its endpoints in the set of any vertex cover, by definition!

Additionally, consider a set SS that contains both endpoints of each edge in a maximal matching MM. Then SS must be a vertex cover; otherwise, it must not touch some edge eEe\in E, which would imply that MM is not maximal since we could add ee to it—a contradiction! (Since, if ee is not covered by SS, ee has no vertices in common with any edge in MM).

Note that this cover has precisely 2M2\lvert M \rvert vertices, i.e. two vertices for each edge. (And no less because no edges can share a common vertex). From before, we know that each edge of a matching must have, at minimum, one of its endpoints in the set of any vertex cover. Therefore, any vertex cover must have size at least M\lvert M \rvert. Thus, the cover we found is at most twice the size of the optimal vertex cover. Therefore, an algorithm that efficiently finds the maximal matching would have an approximation ratio of α2\alpha\leq2.

It turns out that maximal matchings are exceedingly easy to generate—just repeatedly choose edges that are disjoint from those chosen already until no more can be added. Using a disjoint set union, this can be efficiently completed in O(nlogn)O(n\log n)!

Clustering

The clustering problem asks us to divide a set of data points into groups of "close" points.

First, let's formalize the notion of distance. Let dd denote the distance function, which may vary depending on the scenario. It must satisfy the usual metric properties

Then, the kk-Cluster problem may be described as follows.

Perhaps the easiest way to visualize this is covering a set of nn points in 2D space with kk circles of equal size, and trying to determine the smallest possible diameter of the circles.

This problem is NP\mathrm{NP}-Hard, as you might've guessed. However, it has an exceedingly simple approximation algorithm!

The key idea is to pick kk data points as cluster centers and then assign the remaining points to the center closest to it, where the cluster centers are chosen by iteratively choosing the next center to be as far as possible from the centers chosen so far (until kk centers are chosen).

The algorithm clearly always returns a valid partition, and, in fact, the resulting diameter is guaranteed to be at most twice OPT\mathrm{OPT}.

Let xXx \in X be the point farthest from μ1,,μk\mu_{1},\dots,\mu_{k} (i.e. the next center we would have chosen, if this was a k+1k+1-Cluster problem). Let r=mini d(x,μi)r=\underset{ i }{ \mathrm{min} }\ d(x,\mu_{i}). Then every point in XX must be within distance rr of its cluster center. This is because the definition of xx implies that xX\nexists x'\in X such that r=min id(x,μi)>rr'=\underset{ i }{ \mathrm{min}\ }d(x',\mu_{i})>r, i.e. there existed a point that was more than rr units away from all cluster centers, and thus xx was not, in fact, the farthest from μ1,,μk\mu_{1},\dots,\mu_{k}, a clear contradiction. Thus, every point in XX must be within distance rr of its cluster center, and, by the triangle inequality, every cluster has diameter at most 2r2r. Thus, this algorithm produces a solution with diameter at most 2r2r.

But how does this solution's diameter relate to the optimal solution? Well, consider that we have identified a set of k+1k+1 points {μ1,μ2,,μk,x}\{ \mu_{1},\mu_{2},\dots ,\mu_{k},x \} that have a pairwise distance of at least rr. Then, any partition into kk clusters must place two of these points within the same cluster—thus, the optimal partition into kk clusters must have a diameter of at least rr. In other words, the algorithm A\mathcal{A} presented above has an approximation ratio αA2\alpha_{\mathcal{A}}\leq2.

TSP

We can exploit the triangle inequality to produce an efficient approximation of variants of the Traveling Salesman Problem where the triangle inequality is satisfied! As with the previous algorithms, we consider an efficiently computable structure that is somewhat related to the best TSP tour—a minimum spanning tree.

Consider that removing any edge from a traveling salesman tour produces a path through all vertices, i.e. a spanning tree (albeit one without any branches). However, this does tell us that

TSP costthis path’s costMST cost\text{TSP cost}\geq \text{this path's cost}\geq \text{MST cost}

So, an MST already provides us with a nice lower bound on the TSP cost. However, remember that the goal of an approximation algorithm is to produce a solution that approximates the optimal TSP solution, not just bound the cost. How can we do this?

One thought is to use each edge of the MST twice, which can create a tour that visits all nodes at least once, but some nodes more than once. This tour has a length at most twice the MST cost, which therefore is at most twice the TSP cost, as evinced by the aforementioned inequality; however, this tour isn't quite a legal solution to the TSP, since it visits some nodes multiple times!

We can fix this by forcing the tour to simply skip any node it is about to revisit, and directly visit the next unvisited node in the list. By the triangle inequality, these bypasses can only make the overall tour shorter; thus, this algorithm A\mathcal{A} has an approximation ratio αA2\alpha_{\mathcal{A}}\leq2.

General TSP

But what TSP variants which the triangle inequality does not apply to? Well, as it turns out, these variants provably* cannot be approximated.

*"Provably" as in there does not exist an efficient approximation provided PNP\mathrm{P}\neq \mathrm{NP}.

Recall in [[8. NP-Complete Problems#8.3 Reductions, Reductions, Reductions#Rudrata Cycle to to TSP|Section 8.3]] that we discussed a reduction from the Rudrata Cycle/Path problem to the TSP problem. In short, we can reduce any Rudrata Path problem on a graph GG to an instance I(G,α)I(G,\alpha), where αZ+\alpha \in \mathbb{Z}^{+} and can be arbitrary, of TSP such that (1)(1) if GG has a Rudrata path, then OPT(I(G,α))=n\mathrm{OPT}(I(G,\alpha))=n and (2)(2) if GG has no Rudrata path, then OPT(I(G,α))n+α\mathrm{OPT}(I(G,\alpha))\geq n+\alpha. Interestingly enough, if any efficient, approximate solution for the general TSP existed, the Rudrata Path problem becomes efficiently solvable!

Let A\mathcal{A} be an efficient approximation algorithm for TSP with approximation ratio αA\alpha_{\mathcal{A}}. Consider an instance GG of Rudrata Path. Create an instance I(G,C)I(G,C) of TSP using the constant C=nαAC=n\alpha_{\mathcal{A}}. Then, A\mathcal{A} will output one of two things on II. (1)(1) It will output a tour of length at most αAOPT(I(G,C))=nαA\alpha_{\mathcal{A}}\mathrm{OPT}(I(G,C))=n\alpha _\mathcal{A}. Or (2)(2) it will output a tour of length at least OPT(I(G,C))>nαA\mathrm{OPT}(I(G,C))>n\alpha_{\mathcal{A}}. If (1)(1) occurs, then we can conclude that GG has a Rudrata Path, and return the produced TSP as the Rudrata Path!

Therefore, if PNP\mathrm{P}\neq \mathrm{NP}, there cannot exist a polynomial-time approximation algorithm for TSP since we know the Rudrata Path problem is NP\mathrm{NP}-Hard. Note that this conclusion is drawn from the gap property hinted towards in Section 8.3.

Knapsack

Finally, we come to an approximation algorithm for a maximization problem—knapsack! Recall that the dynamic programming solution we saw to the knapsack problem in Section 6.4 had a runtime of O(nW)O(nW) or O(nV)O(nV), which slight modification, which both are, unfortunately, not polynomial in just nn, since WW and VV can be exponential in nn.

Consider the O(nV)O(nV) algorithm. What if, if VV is extremely large, we just scale down each value? For instance, if

v1=117,586,003v2=738,493,291v3=238,827,453v_{1}=117,586,003\qquad v_{2}=738,493,291\qquad v_{3}=238,827,453

we could lose some precision but make the algorithm several orders of magnitude more efficient by using

v1=117v2=738v3=238v_{1}=117\qquad v_{2}=738\qquad v_{3}=238

.

More precisely, let vmax=maxi viv_{\mathrm{max}}=\underset{ i }{ \mathrm{max} }\ v_{i}. Then we may rescale each value viv_{i} to v^i=vinϵvmax\hat{v}_{i}=\left\lfloor v_{i}\cdot \frac{n}{\epsilon v_{\mathrm{max}}} \right\rfloor, and run the dynamic programming algorithm on this rescaled version of the problem. Note that ϵ>0\epsilon>0 is some approximation factor specified in the input to the algorithm.

Since the rescaled values v^i\hat{v}_{i} are all at most nϵ\frac{n}{\epsilon}, the algorithm runs in O(nV)=O(n(nnϵ))=O(n3/ϵ)O(nV)=O\left( n\left( n\cdot \frac{n}{\epsilon} \right) \right)=O(n^{3}/\epsilon). So the algorithm itself is efficient. What about the approximation factor, though?

As you may have guessed, ϵ\epsilon is related to the approximation factor of this algorithm. I'll spare the details, but it can be shown that, the approximation factor is αA1ϵ\alpha_{\mathcal{A}}\geq1-\epsilon, i.e. the value KK output by this algorithm is related to OPT\mathrm{OPT} by KOPT(1ϵ)K\geq \mathrm{OPT}(1-\epsilon).

Approximability Hierarchy

All NP\mathrm{NP}-Complete optimization problems can be classified into one of the following categories.

Remember that this all relies on PNP\mathrm{P}\neq \mathrm{NP}.
Also, note that these algorithms often perform much better than their worst-case approximation ratio.

9.3 Local Search Heuristics

Local search is a technique inspired by evolution—incrementally introducing random mutations and keeping those that improve performance. It can be applied to nay optimization problem!

The most naive version is iteratively replacing the current solution with a better one nearby, i.e. within its neighborhood. What exactly defines the "neighborhood" of a solution is the central design decision in local search.

TSP

We'll first consider the application of local search heuristics to TSP.

The most obvious choice of definition for "closeness" in TSP is if two tours differ in only a few edges. It's impossible for two tours to differ in only one edge, so we'll first consider two tours as "close" if they differ in exactly two edges. Then, let the 2-change neighborhood of tour ss be the set of tours that can be obtained by removing two edges of ss and adding in two distinct edges.

So... how well does this do? That is, what's the runtime, and does it always return the most optimal solution?

Unfortunately, it's not too clear what the answer to this question is. Each iteration is fast, since a tour only has O(n2)O(n^{2}) neighbors, but it's unclear how many iterations will be needed, and whether or not the final tour is globally optimal or just locally optimal.

We can somewhat reduce the local optimality limitation by, say, extending the neighborhood to 3-change; however, the size of a neighborhood increases to O(n3)O(n^{3}), thus increasing runtime. Moreover, extending to 3-change does not eliminate the problem entirely; globally suboptimal but locally optimal minima still exist, though fewer. Thus, there's a balance between the algorithm's runtime and its closeness to the global optimum; the right balance is typically found via iterative experimentation.

Graph Partitioning

Graph partitioning may be described as the following problem.

Note that we actually saw a special case of this in Section 8, Balanced Cut. In fact, we will consider an even stricter variant of Balanced Cut here for designing a local search heuristic: we will focus on the special case α=12\alpha=\frac{1}{2}. (Interestingly enough, the general problem of Graph Partitioning reduces to this special case).

Now, we must define what a neighborhood is / our notion of closeness. Let (A,B)(A,B) with A=B\lvert A \rvert=\lvert B \rvert be a candidate solution. Then, we will define its neighborhood to be the set of all solutions that can be obtained by swapping one pair of vertices across the cut, i.e.

{(A{a}+{b},B{b}+{a})(a,b)A×B}\{ (A-\{ a \}+\{ b \},B-\{ b \}+\{ a \})\mid (a,b)\in A\times B \}

As with TSP, though, many globally suboptimal but locally optimal solutions exist. How can we fix this?

Dealing with Local Optima

Randomization and Restarts

Randomization adds a bit of variance to our local search! It's typically applied in two ways: (1)(1) picking a random initial solution and (2)(2) randomly choosing a local move when several are optimal.

Randomization is best used by running local search repeatedly with a different random seed on each invocation. If the probability of reaching a good local optimum on any given run is pp, then it's expected that within O(1/p)O(1/p) runs, such a solution will be found.

Simulated Annealing

Randomization is great, but it's not always effective. For some problems, as the size grows, the ratio of bad to good local optima increases, often exponentially. Simulated annealing is a different approach that allows moves within the neighborhood that actually increase the cost—the idea is that locally suboptimal moves might actually bring the search out of a bad local optima and closer to a good local optima/the global optimum. Simulated annealing introduces a new parameter: a temperature TT, where a higher temperature increases the likelihood that locally suboptimal solutions are chosen for each move.

Typically, simulated annealing will start off with a large TT and slowly reduce it to zero (at which point it is effectively the same as normal local search). This allows the local search to initially wander around freely, and gradually begin to prefer the more optimal solutions, allowing it to converge on a solution.

Simulated annealing is a great strategy; however, it's also a much more expensive technique, as higher temperatures inevitably result in more local moves until convergence. Thus, it frequently requires significant experimentation to choose the annealing schedule, i.e. the schedule for slowly reducing the temperature. Ultimately, though, the solution quality improvements that simulated annealing provides are often worth it!

Fun Fact

Simulated annealing was inspired by the physics of crystallization. Search up "annealing" if you want to know more :)