8. NP-Complete Problems

8.1 Search Problems

Throughout this book (a.k.a. throughout these notes), we have discussed efficient algorithms that search for a desired solution among exponentially many candidates, yet run in polynomial time. However, not every such problem is known to have an efficient algorithm.

SAT

Consider the SAT, or Satisfiability, problem. We are given a Boolean formula in conjunctive normal form (CNF): the logical and (conjunction) of several clauses, each of which is the logical or (disjunction) of several literals, where a literal is either a Boolean variable or its negation. Here is an example of a CNF formula.

$$(x\lor y\lor z)(x\lor \overline{y})(y\lor \overline{z})(z\lor \overline{x})(\overline{x}\lor \overline{y}\lor \overline{z})$$

The SAT problem asks us to find an assignment of the Boolean variables that makes the formula true, or else report that none exists. This particular formula has no satisfying assignment: the middle three clauses force $x=y=z$, after which the first clause requires all three to be true while the last requires at least one to be false. In general, it is difficult to find a solution efficiently: the search space has size $2^{n}$, where $n$ is the number of variables, and there is no known polynomial-time algorithm for the general SAT problem.

Note that SAT is a typical search problem: given an instance $I$ (a CNF formula), we are asked to find a solution $S$ or report that none exists. Crucially, a search problem must also have the property that the validity of $S$ as a solution for $I$ can be checked in polynomial time. Formally,

A search problem is specified by an algorithm $\mathcal{C}$ that takes two inputs, an instance $I$ and a proposed solution $S$, and runs in time polynomial in $\lvert I \rvert$. We say $S$ is a solution to $I$ $\iff$ $\mathcal{C}(I,S)=\text{true}$.
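To make the definition concrete, here is a minimal Python sketch of a checker $\mathcal{C}$ for SAT, together with the exhaustive $2^{n}$ search that calls it. The clause encoding (DIMACS-style signed integers) and the names `check` and `brute_force_sat` are illustrative assumptions, not anything defined in these notes.

```python
from itertools import product

# Assumed encoding: a formula is a list of clauses, each clause a list of nonzero
# ints; +i stands for the variable x_i and -i for its negation (DIMACS-style).

def check(formula, assignment):
    """The checker C(I, S): runs in time linear in the size of the formula."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def brute_force_sat(formula):
    """Try all 2^n assignments, calling the checker on each candidate.
    Returns a satisfying assignment (dict var -> bool) or None."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if check(formula, assignment):
            return assignment
    return None  # "report that none exists"

# The example formula above, with x, y, z encoded as 1, 2, 3.
example = [[1, 2, 3], [1, -2], [2, -3], [3, -1], [-1, -2, -3]]
print(brute_force_sat(example))  # None: the formula is unsatisfiable
```

Note that `check` runs in time linear in the size of the formula, while the search loop may call it $2^{n}$ times; that gap is exactly what makes SAT easy to verify but, as far as we know, hard to solve.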

There are, however, two notable variants of SAT that can be solved efficiently. If each clause contains at most one positive literal (e.g., $x$ is a positive literal and $\overline{x}$ is a negative one), then the formula is a Horn formula; we discussed an efficient greedy algorithm for this case in Section 5.3. If instead every clause has at most two literals, the problem is called 2SAT, and it can be solved in linear time by finding the strongly connected components (SCCs) of a graph constructed from the instance (recall the algorithm from Section 3.4).

Conversely, some problems that seem easier than general SAT are still hard to solve! If every clause has at most three literals, we have what is known as 3SAT, and this problem is just as hard as the general SAT problem!

TSP

In the Traveling Salesman Problem, we are provided an instance with $n$ vertices $1,\dots,n$ , all pairwise distances between them, and a budget $b$ . The desired solution is a tour, a cycle that passes through each vertex precisely once, of total cost at most $b$ (or else report none).

TSP is usually stated as an optimization problem, in which the shortest possible tour is desired. Here, however, we frame it as a search problem so that we can compare it with other search problems; search problems are general enough that many other kinds of problems can be recast as search problems. More precisely, the optimization version of TSP (which we will call Min-TSP henceforth) reduces to the search version, TSP. This means that TSP is at least as hard as Min-TSP: any efficient algorithm for TSP yields an efficient algorithm for Min-TSP, because (assuming integer distances) we can find the optimal cost by binary searching over the budget $b$, calling the TSP algorithm at each step. (Conversely, solving Min-TSP immediately solves TSP, so the two are equally hard.) We formalize this notion later.
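A minimal sketch of the binary-search argument, assuming nonnegative integer distances given as a matrix. The hypothetical `tsp_search` stands in for any TSP (search) solver and is brute force here only so the example runs; the hypothetical `min_tsp` makes $O(\log D)$ calls to it, where $D$ is the sum of all distances, which is polynomial in the number of bits of the input.

```python
from itertools import permutations

def tsp_search(dist, b):
    """Hypothetical TSP search oracle: return a tour (tuple of cities) of total
    cost <= b, or None. Brute force here; the reduction works with any solver."""
    n = len(dist)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost <= b:
            return tour
    return None

def min_tsp(dist):
    """Solve Min-TSP with O(log(total distance)) calls to tsp_search,
    assuming nonnegative integer distances (so the optimum is an integer)."""
    lo, hi = 0, sum(map(sum, dist))  # every tour costs at most the sum of all entries
    best = tsp_search(dist, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        tour = tsp_search(dist, mid)
        if tour is not None:   # a tour of cost <= mid exists: optimum <= mid
            best, hi = tour, mid
        else:                  # no such tour: optimum > mid
            lo = mid + 1
    return best, lo

dist = [[0, 1, 4, 2],
        [1, 0, 3, 5],
        [4, 3, 0, 1],
        [2, 5, 1, 0]]
print(min_tsp(dist))  # ((0, 1, 2, 3), 7)
```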

Interestingly, Min-TSP looks like a hard analogue of the minimum spanning tree problem, for which we developed efficient greedy algorithms in Section 5.1. Min-TSP simply replaces the spanning tree with a more restrictive structure: a tour. This small change turns a problem with fast greedy algorithms into one for which every known algorithm takes exponential time in the worst case.

Eulerian and Rudrata/Hamiltonian Paths

Now, we turn to another sibling pair of graph problems.

Several hundred years ago, Euler posed the following problem: given a graph $G$, is there a path in the graph that traverses each edge exactly once? It has an exceedingly elegant answer: an Eulerian path exists $\iff$ the graph is connected (ignoring isolated vertices) and every vertex, except possibly two (the start and end of the path), has even degree.

As before, let's now define this problem as a search problem. The Eulerian path problem provides an instance with a graph $G$, and the desired solution is a valid Eulerian path. We won't discuss the algorithm in detail, but it follows from Euler's observation that the search problem can also be solved in polynomial time.
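For completeness, one standard way to construct an Eulerian path is Hierholzer's algorithm, which these notes do not cover. The sketch below assumes an undirected graph given as a list of edges, and `eulerian_path` is a hypothetical name. It checks Euler's degree condition, walks unused edges until it gets stuck, and splices the resulting sub-tours together; connectivity is checked at the end by confirming that every edge was used.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Return an Eulerian path (list of vertices) of an undirected multigraph given
    as a list of (u, v) edges, or None if none exists. Runs in O(|V| + |E|) time."""
    if not edges:
        return []
    adj = defaultdict(list)              # vertex -> list of (neighbor, edge id)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    odd = [v for v in adj if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):           # Euler's degree condition
        return None
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:                         # Hierholzer's algorithm
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()                 # skip edges already traversed from the other end
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)
        else:
            path.append(stack.pop())
    # If some edge was never reached, the edges do not form one connected piece.
    return path[::-1] if len(path) == len(edges) + 1 else None

print(eulerian_path([(0, 1), (1, 2), (2, 0), (2, 3)]))  # [2, 0, 1, 2, 3]
```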

Before Euler solved his namesake graph problem, though, Rudrata considered another, very similar problem: given a graph $G$, is there a path in the graph that visits each vertex exactly once? We frame this as a search problem, too. The Rudrata path (also known as Hamiltonian path) problem provides an instance with a graph $G$, and the desired solution is a valid Rudrata path.

This seems very similar to Euler's problem! Despite this, there is no known efficient algorithm to solve the Rudrata path problem.

You may have heard the more familiar namesakes Eulerian cycle or Hamiltonian cycle, rather than the path variants described above. Well, as we will see later, these problems are effectively equivalent to their path variants!

Cuts and Bisections

Recall our minimum cut problem from Section 7.2. As mentioned there, a cut is a set of edges whose removal leaves the graph disconnected. We can phrase minimum cut as a search problem: given an instance with a graph $G$ and a budget $b$, the desired solution is a cut with at most $b$ edges.

Of course, the minimum cut problem has a polynomial time algorithm. However, there exist other cut problems that, as far as we know, cannot be efficiently solved! In particular, the Balanced Cut problem is a variant of the general cut search problem: given an instance with a graph $G$ and budget $b$ , the desired solution is a partition of vertices into sets $S,T$ such that $\lvert S \rvert,\lvert T \rvert\geq \frac{n}{3}$ and there are at most $b$ edges between $S$ and $T$ .

Balanced cuts have many important applications, such as clustering! However, because no efficient algorithm for them is known, such applications typically rely on efficient approximation algorithms or heuristics.

Integer LP

We discussed linear programming extensively in Section 7, and noted that, although the simplex algorithm does not always run in polynomial time, there does exist a polynomial-time algorithm for linear programming, at least when the variables may take any values in $\mathbb{R}$. When we constrain them to lie in $\mathbb{Z}$, giving Integer Linear Programming (ILP), no polynomial-time algorithm is known!

More formally, let us frame ILP as the following search problem: given an instance of an $m\times n$ matrix $\mathbf{A}$ and an $m$ -vector $\mathbf{b}$ , the desired solution is a nonnegative integer vector $\mathbf{x}$ satisfying $\mathbf{Ax}\leq \mathbf{b}$ .

There is also a special case of ILP that is just as hard, known as Zero-One Equations (ZOE). Here, $\mathbf{A}$ has only $0$ and $1$ entries, $\mathbf{x}$ must be a vector of $0$'s and $1$'s, and the constraints are equalities, $\mathbf{Ax}=\mathbf{1}$, where $\mathbf{1}$ is the $m$-vector of all $1$'s.

Three-Dimensional Matching

Recall the bipartite matching problem from Section 7.3, which can be solved efficiently by reducing it to maximum flow (e.g., with the Edmonds-Karp algorithm). It has a generalization, 3D Matching, for which no efficient algorithm is known: given an instance with $n$ boys, $n$ girls, and $n$ pets, along with a set of compatible triples of the form $(b,g,p)$, the desired solution is a set of $n$ disjoint compatible triples, so that every boy, girl, and pet appears in exactly one chosen triple.

Independent Set, Vertex Cover, Clique

The Independent Set problem: given an instance with a graph $G$ and an integer $k$, the desired solution is a set of $k$ independent vertices, i.e. no two of them are joined by an edge. Recall from Section 6.7 that dynamic programming solves this problem efficiently on trees. For general graphs, however, no polynomial-time algorithm is known!

The Vertex Cover problem: given an instance with a graph $G$ and a budget $b$, the desired solution is a set of $b$ vertices such that every edge in the graph is incident to at least one vertex in the set. This is a special case of the Set Cover problem: given an instance with a set $E$, subsets $S_{1},\dots,S_{m}\subseteq E$, and a budget $b$, the desired solution is a choice of $b$ of the subsets whose union is $E$. (To see the special case, let $E$ be the edge set and, for each vertex $v$, let the corresponding subset be the set of edges incident to $v$.) We briefly discussed a greedy approximation algorithm for minimum Set Cover in Section 5.4. Regardless, neither problem has a known polynomial-time algorithm.

There's an interesting parallel between the Independent Set and Vertex Cover problems! Try to figure it out :)

We'll discuss why this is the case soon... sorry for the suspense :P

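3D Matching is also a special case of the Set Cover problem! Try to figure this one out too :>

Let $E$ be the set of all $n$ boys, $n$ girls, and $n$ pets ($3n$ elements in total), let the subsets $S_{1},\dots,S_{m}$ be the compatible triples $(b,g,p)$, and set the budget to $n$. Since $n$ triples of size $3$ can cover all $3n$ elements only if they are pairwise disjoint, a valid cover is exactly a valid 3D matching.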

Finally, the Clique problem: given an instance with a graph $G$ and goal $k$ , the desired solution is a set of $k$ vertices such that all possible edges between them are present (i.e. these vertices form a fully connected subgraph in $G$ ).

Longest Path

Recall from Section 4 that the shortest path problem can be solved efficiently using Dijkstra's algorithm or the Bellman-Ford algorithm. What about its analogue, longest path? Formally, given an instance with a graph $G$ whose edges have nonnegative weights, two vertices $s,t$, and a goal $g$, the desired solution is a simple path from $s$ to $t$ with total weight at least $g$.

It might seem that we could simply negate all the edge weights and run a shortest path algorithm that handles negative weights. The problem is that these algorithms do not find shortest simple paths in graphs with negative cycles, and negating the weights creates a negative cycle whenever $G$ contains a cycle of positive total weight (since all weights become nonpositive). In fact, there is no known polynomial-time algorithm for the longest path problem.

Knapsack and Subset Sum

We now frame knapsack as a search problem: given an instance with a weight capacity $W$, a goal $g$, and a set of $n$ items with weights $w_{i}$ and values $v_{i}$, the desired solution is a set of items whose total weight is at most $W$ and whose total value is at least $g$.

In Section 6.4, we discussed dynamic programming algorithms for knapsack that run in $O(nW)$ time. Is this polynomial time, though? No: $W$ is written down with only about $\log_{2} W$ bits, so a running time of $O(nW)$ can be exponential in the size of the input. If we want a running time that does not depend on $W$, the straightforward approach is to check all $2^{n}$ subsets of items, and no polynomial-time algorithm for knapsack is known!
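A minimal sketch of the $O(nW)$ dynamic program, phrased as the search version above (`knapsack_search` is a hypothetical name, and this is the variant where each item is used at most once). The table has $W+1$ columns, so the work grows with the value of $W$ rather than with the roughly $\log_{2} W$ bits used to write it: each extra bit in $W$ can double the running time.

```python
def knapsack_search(weights, values, W, g):
    """Knapsack search via the O(nW) dynamic program (items used at most once).
    Returns sorted indices of items with total weight <= W and value >= g, or None."""
    n = len(weights)
    # best[i][c] = maximum value achievable with items 0..i-1 and capacity c
    best = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(W + 1):
            best[i][c] = best[i - 1][c]                               # skip item i-1
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)  # take it
    if best[n][W] < g:
        return None
    chosen, c = [], W                     # walk back through the table
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:  # value changed, so item i-1 was taken
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)

print(knapsack_search([6, 3, 4, 2], [30, 14, 16, 9], W=10, g=46))  # [0, 2]
```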

We can, however, get a polynomial-time algorithm for a somewhat contrived variant, Unary Knapsack, in which numbers are written in unary (e.g., $3$ as $III$). This isn't very useful in practice, but the $O(nW)$ algorithm now runs in polynomial time: writing $W$ in unary takes $W$ symbols, so the input itself has size at least $W$.

Meanwhile, an equally hard special case of knapsack is of particular interest. The Subset Sum problem: given an instance with $n$ integers $x_{i}$ and a target $W$, the desired solution is a subset of the provided integers that sums to exactly $W$. This is the special case of knapsack with $v_{i}=w_{i}=x_{i}$ for each item $i$ and $g=W$: the two knapsack constraints $\sum w_{i}\leq W$ and $\sum v_{i}\geq g$ then together say exactly $\sum w_{i}=W$.

Why is this special case of knapsack interesting, if it is just as hard as knapsack? The point is simplicity: as we'll soon see, reductions are much easier to construct with simple problems like Subset Sum.

8.2 NP-Complete Problems

Tractability

Consider the following table of sibling hard vs. easy search problems (we'll discuss what NP-Complete and P are soon).

Hard (NP-Complete)	Easy (P)
3SAT	2SAT, Horn SAT
TSP	MST
Longest Path	Shortest Path
3D Matching	Bipartite Matching
Knapsack	Unary Knapsack
Independent Set	Independent Set (Trees)
ILP	LP
Rudrata Path	Eulerian Path
Balanced Cut	Minimum Cut

On the right are easy problems, which are easy for a variety of different reasons (dynamic programming, greedy algorithms, flows, etc.). On the left are hard problems, which researchers have been unable to solve efficiently despite many, many years of effort. The fascinating thing about these hard problems is that they are all hard for the same reason! This may seem counterintuitive: they are all different problems from widely varying domains, so how could they possibly all have the same difficulty?! As we'll soon see, they are in a precise sense all the same problem: each of them can be reduced to each of the others.

P, NP

No, P/NP does not mean Pass/No Pass here. $\mathrm{P}$ and $\mathrm{NP}$ are what are known as complexity classes. You've probably heard of them before, perhaps in the context of the Millennium Prize Problems: each problem in this set carries a one-million-dollar prize, and one of them is to prove (or disprove!) that $\mathrm{P}\neq \mathrm{NP}$. But what do these classes actually mean?

Formally, $\mathrm{NP}$ denotes the class of all search problems. We defined these formally [[#8.1 Search Problems|above]], but in short, a search problem must have an efficient algorithm $\mathcal{C}$ that checks whether a proposed solution $S$ is valid for a given instance $I$, where efficient means polynomial in $\lvert I \rvert$. Note that this class includes both problems with no known efficient algorithm (e.g. 3SAT) and tractable ones (e.g. 2SAT).

$\mathrm{P}$, meanwhile, denotes the class of all search problems that can be solved in polynomial time. Importantly, $\mathrm{P}\subseteq \mathrm{NP}$, by definition. The Millennium problem asks whether this containment is strict, i.e. whether $\mathrm{P}\subsetneq \mathrm{NP}$; it is widely believed, and generally assumed, that $\mathrm{P}\neq \mathrm{NP}$. Proving the contrary would mean that all these "hard" problems are solvable in polynomial time!

Reductions

Now that we've defined our complexity classes and stated our assumption that $\mathrm{P}\neq \mathrm{NP}$, how can we argue that these hard problems have no efficient algorithm? We cannot prove it outright (that would settle P vs. NP), but we can use reductions: transformations showing that one problem is at least as hard as another. Researchers have used reductions to show that all the problems on the left side of the table above are essentially the same problem, as mentioned. What's more, they have shown that these are the hardest search problems in $\mathrm{NP}$: if any one of them has a polynomial-time algorithm, then every problem in $\mathrm{NP}$ does.

Let's first formalize the notion of a reduction. A reduction from search problem $A$ to search problem $B$ is a pair of polynomial time algorithms $(f,h)$ such that

$f$ transforms any instance $I$ of $A$ into an instance $f(I)$ of $B$
$h$ transforms any solution $S$ of $f(I)$ (for $B$) into a solution $h(S)$ of $I$ (for $A$). Moreover, if $f(I)$ has no solution for $B$, then $I$ has no solution for $A$.

Note how this reduction from $A\to B$ effectively shows that $B$ is at least as hard as $A$ , since, if there exists a polynomial time algorithm for $B$ , there naturally exists a polynomial time algorithm for $A$ derived from $B$ 's via the reduction.

Make sure you understand that a reduction from $A\to B$ means that $B\geq A$ with regard to difficulty, i.e. $B$ is at least as hard as $A$. This is easy to accidentally mix up!

Now, we can formally define the class of the hardest search problems, too (the left side of the above table). A search problem is $\mathrm{NP}$ -Complete if all other search problems reduce to it.

Composition of Reductions
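One final note about reductions is that they compose: reductions $A\to B$ and $B\to C$ give a reduction $A\to C$. Concretely, if $(f_{1},h_{1})$ reduces $A$ to $B$ and $(f_{2},h_{2})$ reduces $B$ to $C$, then $(f_{2}\circ f_{1},\ h_{1}\circ h_{2})$ reduces $A$ to $C$, and it still runs in polynomial time because a polynomial of a polynomial is a polynomial.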

NP-Hard?

You might have heard of the term $\mathrm{NP}$ -Hard to refer to the hardest problems in computer science. So what's this $\mathrm{NP}$ -Complete term then?

Well, $\mathrm{NP}$ -Hard is actually yet another complexity class. $\mathrm{NP}$ -Hard denotes the class of all problems at least as hard as $\mathrm{NP}$ -Complete problems.

Wait, I thought the problems in $\mathrm{NP}$-Complete were the hardest problems??

Not quite. $\mathrm{NP}$ -Complete denotes the class of the hardest search problems. There exist non-search problems at least as hard as those in $\mathrm{NP}$ -Complete—those are the ones in $\mathrm{NP}$ -Hard.

Note that, similar to how $\mathrm{P}\subseteq \mathrm{NP}$, we have $\mathrm{NP}$-Complete $\subseteq$ $\mathrm{NP}$-Hard. However, $\mathrm{NP}$-Hard is not a subset of $\mathrm{NP}$! Remember that $\mathrm{NP}$ contains only search problems, whereas $\mathrm{NP}$-Hard also contains problems that are not search problems. Thus, $\mathrm{NP}$-Complete is exactly the intersection of $\mathrm{NP}$-Hard and $\mathrm{NP}$.

Finally, you should hopefully be able to understand the following diagram!

Factoring

Factoring is a famously difficult search problem: given an integer $n$ , the desired solution is the prime factorization of $n$ . So, does Shor's Algorithm, i.e. a polynomial time quantum algorithm to prime factorize a number, solve all of $\mathrm{NP}$ ?

Not quite. Factoring is not believed to be one of the hardest problems in $\mathrm{NP}$, i.e. it is not believed to be $\mathrm{NP}$-Complete! One critical difference between factoring and the $\mathrm{NP}$-Complete problems is that factoring has no "or else report none" clause: every integer $n$ has a prime factorization. In short, factoring is in $\mathrm{NP}$ but is not known (or expected) to be $\mathrm{NP}$-Complete. So, while quantum computers can factor efficiently, it remains open whether they can efficiently solve all of $\mathrm{NP}$ (though it is widely believed that they cannot).

8.3 Reductions, Reductions, Reductions

Rudrata $(s,t)$ -Path $\to$ Rudrata Cycle

Consider adding a new vertex $x$ and two new edges $x\to s$ and $t\to x$ to the graph $G$ to produce $G'$. Since $x$ is adjacent only to $s$ and $t$, any Rudrata cycle of $G'$ must contain both edges $x\to s$ and $t\to x$. Proving that this reduction is valid requires showing that it works $(1)$ when the Rudrata Cycle instance has a solution and $(2)$ when it does not. Moreover, we must show that $(3)$ the preprocessing and postprocessing functions take polynomial time.

1. Existing Solution
If $G'$ has a Rudrata cycle, then the Rudrata $(s,t)$-Path instance must also have a solution. The cycle must enter and leave $x$ through the two added edges (otherwise $x$ would not be visited), so deleting $x$ and these edges leaves a path from $s$ to $t$ that visits every vertex of $G$ exactly once, i.e. a Rudrata $(s,t)$-path in $G$, as desired.

2. No Solution
For this case, it's typically easier to show the contrapositive: if there exists a Rudrata $(s,t)$-path in $G$, then there also exists a Rudrata cycle in $G'$. This is immediate: append $x$ to the end of the path and use the edges $t\to x$ and $x\to s$ to close the cycle.

3. Preprocessing & postprocessing
The preprocessing step just adds one vertex and two edges, and the postprocessing step just removes them again. Both clearly take polynomial time.

Thus, solving the Rudrata Cycle problem is at least as hard as solving the Rudrata $(s,t)$-Path problem. In other words, the Rudrata $(s,t)$-Path problem reduces to the Rudrata Cycle problem. Below is a diagram that illustrates the reduction visually.

It's also possible to reduce in the other direction, i.e. Rudrata Cycle $\to$ Rudrata $(s,t)$-Path. Together, this pair of reductions shows that the two problems are essentially the same problem.
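The whole Rudrata $(s,t)$-Path $\to$ Rudrata Cycle reduction fits in a few lines of Python. In this sketch, `rudrata_path` applies $f$ (build $G'$), makes one call to a Rudrata Cycle solver, and applies $h$ (cut the cycle open at $x$). The brute-force `rudrata_cycle` and all names are hypothetical, included only so the example runs; vertices are assumed to be numbered $0,\dots,n-1$, so the new vertex $x$ is numbered $n$.

```python
from itertools import permutations

def rudrata_cycle(n, edges):
    """Exponential stand-in for a Rudrata Cycle solver on vertices 0..n-1 (n >= 3).
    Returns the cycle as a vertex list starting at 0, or None."""
    adjacent = {frozenset(e) for e in edges}
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        if all(frozenset((order[i], order[(i + 1) % n])) in adjacent for i in range(n)):
            return list(order)
    return None

def rudrata_path(n, edges, s, t):
    """Rudrata (s,t)-path in G (vertices 0..n-1, n >= 2, s != t) via the reduction."""
    x = n                                                          # the new vertex
    cycle = rudrata_cycle(n + 1, list(edges) + [(x, s), (t, x)])   # f: build G'
    if cycle is None:
        return None              # no Rudrata cycle in G'  =>  no (s,t)-path in G
    i = cycle.index(x)           # h: cut the cycle open at x and drop x
    path = cycle[i + 1:] + cycle[:i]
    return path if path[0] == s else path[::-1]

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(rudrata_path(4, edges, 0, 3))   # [0, 1, 2, 3]
print(rudrata_path(4, edges, 1, 3))   # [1, 0, 2, 3]
```

Apart from the single solver call, `rudrata_path` does only linear work, mirroring point $(3)$ above.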

3SAT $\to$ Independent Set

This reduction is most succinctly described by the following diagram.

In essence, we create one vertex for each literal occurrence and connect the vertices of each clause to one another (a triangle for a three-literal clause), so an independent set can contain at most one vertex per clause. Setting the goal $k$ equal to the number of clauses then forces exactly one vertex per clause, which we read as the literal chosen to make that clause true. Moreover, each vertex labeled $x$ is connected to every vertex labeled $\overline{x}$. This ensures that if $x$ is chosen in one clause, $\overline{x}$ is never chosen in any other clause, since that would violate independence. Note that several literals in a clause can actually be true, even though the graph lets only one of them into the independent set! This is fine: only one true literal is needed to make the clause true. (Also, choosing one $x$ vertex, i.e. marking $x$ as true, does not force the other $x$ vertices into the independent set; it simply excludes every $\overline{x}$ vertex.) Conversely, given a satisfying assignment, picking one true literal from each clause yields an independent set of size $k$.
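A sketch of the 3SAT $\to$ Independent Set construction, assuming clauses are encoded as lists of signed integers ($+i$ for $x_{i}$, $-i$ for $\overline{x_{i}}$). All names are hypothetical, and the brute-force independent-set search is only an exponential stand-in so that the round trip ($f$, solve, $h$) can be run end to end.

```python
from itertools import combinations

# Assumed encoding: clauses are lists of nonzero ints, +i for x_i and -i for its negation.

def sat_to_independent_set(formula):
    """f: one vertex (c, j) per literal occurrence (clause c, position j).
    Returns (vertices, edges, k) with k = number of clauses."""
    vertices = [(c, j) for c, clause in enumerate(formula) for j in range(len(clause))]
    literal = {(c, j): formula[c][j] for (c, j) in vertices}
    edges = set()
    for u, v in combinations(vertices, 2):
        same_clause = u[0] == v[0]                  # clause triangle: at most one per clause
        complementary = literal[u] == -literal[v]   # x and not-x can't both be chosen
        if same_clause or complementary:
            edges.add(frozenset((u, v)))
    return vertices, edges, len(formula)

def independent_set_to_assignment(formula, chosen):
    """h: make every chosen literal true; unconstrained variables default to False."""
    assignment = {abs(lit): False for clause in formula for lit in clause}
    for c, j in chosen:
        lit = formula[c][j]
        assignment[abs(lit)] = lit > 0
    return assignment

def brute_force_independent_set(vertices, edges, k):
    """Exponential stand-in for an Independent Set solver."""
    for subset in combinations(vertices, k):
        if not any(frozenset(p) in edges for p in combinations(subset, 2)):
            return subset
    return None

formula = [[1, 2, 3], [-1, -2], [2, -3]]
chosen = brute_force_independent_set(*sat_to_independent_set(formula))
print(independent_set_to_assignment(formula, chosen))  # {1: True, 2: False, 3: False}
```

On the unsatisfiable example formula from Section 8.1, the same pipeline finds no independent set of size $5$, as the reduction predicts.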

SAT $\to$ 3SAT

This is an example of an interesting yet common kind of reduction: reducing a problem to a special case of itself. It shows that the special case is just as hard as the general problem.

The entire reduction relies on a single trick. Consider any clause in the SAT problem with more than three literals, i.e.

$$(a_{1}\lor a_{2}\lor\dots \lor a_{k}) \tag{1}$$

where $k>3$. Then, introducing $k-3$ new variables $y_{1},\dots,y_{k-3}$, we can rewrite it as a set of 3SAT clauses

$$(a_{1}\lor a_{2}\lor y_{1})(\overline{y_{1}}\lor a_{3}\lor y_{2})(\overline{y_{2}}\lor a_{4}\lor y_{3})\cdots (\overline{y_{k-3}}\lor a_{k-1}\lor a_{k}) \tag{2}$$

The conversion is clearly polynomial in $k$ (and thus polynomial in the size $\lvert I \rvert$ of the SAT instance $I$), and converting a 3SAT solution back into a SAT solution is trivial: simply discard the values of the new $y$ variables.

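Make sure you see why these two expressions are equivalent! More formally, an assignment to the $a_{i}$ satisfies (1) $\iff$ it can be extended, by choosing values for the $y_{i}$, to an assignment that satisfies (2).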
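The clause-splitting trick is mechanical enough to write down directly. The sketch below assumes clauses are lists of signed integers ($+i$ for $x_{i}$, $-i$ for $\overline{x_{i}}$) and numbers the fresh $y$ variables after the largest existing variable; `split_clause`, `sat_to_3sat`, and `satisfies` are hypothetical names. The loop at the end exhaustively confirms, for one $5$-literal clause, that an assignment satisfies (1) exactly when it extends to one satisfying (2).

```python
from itertools import product

# Assumed encoding: clauses are lists of nonzero ints, +i for x_i and -i for its negation.

def split_clause(clause, next_var):
    """Rewrite (a_1 or ... or a_k), k > 3, as the k - 2 clauses of (2), using fresh
    variables y_1, ..., y_{k-3} numbered next_var, next_var + 1, ....
    Returns (new_clauses, first unused variable number)."""
    k = len(clause)
    if k <= 3:
        return [list(clause)], next_var
    y = list(range(next_var, next_var + k - 3))
    out = [[clause[0], clause[1], y[0]]]
    for i in range(1, k - 3):
        out.append([-y[i - 1], clause[i + 1], y[i]])
    out.append([-y[-1], clause[-2], clause[-1]])
    return out, next_var + k - 3

def sat_to_3sat(formula):
    """f: apply split_clause to every clause. (h just drops the y variables.)"""
    next_var = max(abs(lit) for clause in formula for lit in clause) + 1
    result = []
    for clause in formula:
        new_clauses, next_var = split_clause(clause, next_var)
        result.extend(new_clauses)
    return result

def satisfies(formula, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in formula)

# Exhaustive check that (1) and (2) agree for one 5-literal clause over x_1..x_5.
clause = [1, -2, 3, -4, 5]
split, _ = split_clause(clause, 6)              # y_1 = 6, y_2 = 7
for bits in product([False, True], repeat=5):
    a = dict(zip(range(1, 6), bits))
    extendable = any(satisfies(split, {**a, 6: y1, 7: y2})
                     for y1, y2 in product([False, True], repeat=2))
    assert extendable == satisfies([clause], a)
```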

We can go even further, though. We can additionally reduce 3SAT to the variant in which no variable appears in more than three clauses. Consider any variable $x$ that appears in $k>3$ clauses. Replace its $i$th appearance with a new variable $x_{i}$, and add the clauses

$$(\overline{x_{1}}\lor x_{2})(\overline{x_{2}}\lor x_{3})\cdots (\overline{x_{k}}\lor x_{1})$$

Note how each $x_{i}$ now appears in exactly three clauses: twice in the expression above and once in its original clause.

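Again, make sure you understand why the above expression ensures $x_{1}=x_{2}=\dots=x_{k}$. (Hint: the clause $(\overline{x_{i}}\lor x_{i+1})$ is the implication $x_{i}\implies x_{i+1}$, and these implications form a cycle.)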
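A sketch of this occurrence-limiting step, assuming clauses are lists of signed integers ($+i$ for $x_{i}$, $-i$ for $\overline{x_{i}}$); `limit_occurrences` is a hypothetical name. Fresh copies are numbered after the largest existing variable, and the wrap-around two-literal clauses encode the cycle of implications $x_{1}\implies x_{2}\implies\cdots\implies x_{k}\implies x_{1}$.

```python
from collections import defaultdict

# Assumed encoding: clauses are lists of nonzero ints, +i for x_i and -i for its negation.

def limit_occurrences(formula, limit=3):
    """Give each variable that occurs more than `limit` times a fresh copy x_i per
    occurrence, and tie the copies together with (not x_1 or x_2)...(not x_k or x_1)."""
    occurrences = defaultdict(list)          # variable -> [(clause, position), ...]
    for c, clause in enumerate(formula):
        for j, lit in enumerate(clause):
            occurrences[abs(lit)].append((c, j))
    next_var = max(occurrences) + 1
    result = [list(clause) for clause in formula]
    for var, occ in occurrences.items():
        if len(occ) <= limit:
            continue
        copies = list(range(next_var, next_var + len(occ)))
        next_var += len(occ)
        for (c, j), x_i in zip(occ, copies):    # i-th appearance becomes x_i
            result[c][j] = x_i if formula[c][j] > 0 else -x_i
        for i in range(len(copies)):            # x_i implies x_{i+1}, cyclically
            result.append([-copies[i], copies[(i + 1) % len(copies)]])
    return result

print(limit_occurrences([[1, 2], [-1, 3], [1, -2], [-1, -3], [1, 2, 3]]))
```

Here $x_{1}$ occurs five times, so it is replaced by the copies $4,\dots,8$ plus five linking clauses, while $x_{2}$ and $x_{3}$ (three occurrences each) are left alone.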

Independent Set $\to$ Vertex Cover

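We alluded to this reduction earlier in [[#8.1 Search Problems#Independent Set, Vertex Cover, Clique|Section 8.1]]. The key fact is that a set of nodes $S$ is a vertex cover of graph $G$ $\iff$ $V-S$ is an independent set of $G$. Hence an Independent Set instance $(G,k)$ maps to the Vertex Cover instance $(G,\lvert V \rvert-k)$, and any vertex cover $S$ maps back to the independent set $V-S$.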

Forward Direction:
We proceed via proof by contraposition. Let $V-S$ not be an independent set of $G$ . Then, $\exists u,v\in V-S$ such that the edge $(u,v)\in E$ . Then, $u,v\not\in S$ , and thus the edge $(u,v)$ is not covered by $S$ . Therefore, $S$ is not a valid vertex cover.

Backward Direction:
Again, we can prove this by contraposition. The details are left as a (hopefully easy) exercise to the reader! >:)

Independent Set $\to$ Clique

Define the complement of a graph $G=(V,E)$ to be $\overline{G}=(V,\overline{E})$, where $\overline{E}$ consists of all edges $(u,v)$ not in $E$. Then a set of nodes $S$ is an independent set of $G$ $\iff$ $S$ is a clique of $\overline{G}$. That is, the nodes in $S$ have no edges between them in $G$ $\iff$ all possible edges exist between them in $\overline{G}$. So the Independent Set instance $(G,k)$ maps to the Clique instance $(\overline{G},k)$, and solutions carry over unchanged. This should hopefully make sense intuitively.
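These reductions can be chained. The sketch below (all names hypothetical) represents a reduction as a pair $(f,h)$, composes Clique $\to$ Independent Set (complementing works in both directions) with Independent Set $\to$ Vertex Cover, and runs the result on a brute-force Vertex Cover solver. As a small liberty, each $h$ here also receives the original instance, which makes computing $V-S$ easy.

```python
from itertools import combinations

# Instances are (n, edges, k): vertices 0..n-1, edges a set of frozenset pairs.
# A reduction is a pair (f, h); here h also receives the original instance.

def solve_via(reduction, solve_b):
    """Turn a solver for B into a solver for A using a reduction (f, h) from A to B."""
    f, h = reduction
    def solve_a(instance):
        solution = solve_b(f(instance))
        return None if solution is None else h(instance, solution)
    return solve_a

def compose(r_ab, r_bc):
    """Reductions A -> B and B -> C give A -> C: (f2 after f1, h1 after h2)."""
    f1, h1 = r_ab
    f2, h2 = r_bc
    return (lambda inst: f2(f1(inst)),
            lambda inst, sol: h1(inst, h2(f1(inst), sol)))

def complement(n, edges):
    return {frozenset(p) for p in combinations(range(n), 2)} - edges

# Clique -> Independent Set: complement the graph; the vertex set carries over as-is.
clique_to_is = (lambda i: (i[0], complement(i[0], i[1]), i[2]),
                lambda i, s: s)
# Independent Set -> Vertex Cover: budget |V| - k; a cover S maps back to V - S.
is_to_vc = (lambda i: (i[0], i[1], i[0] - i[2]),
            lambda i, s: set(range(i[0])) - s)

def brute_force_vertex_cover(instance):
    """Exponential stand-in for a Vertex Cover solver (exactly b vertices)."""
    n, edges, b = instance
    for cover in combinations(range(n), b):
        if all(edge & set(cover) for edge in edges):
            return set(cover)
    return None

find_clique = solve_via(compose(clique_to_is, is_to_vc), brute_force_vertex_cover)
triangle_plus_tail = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
print(find_clique((4, triangle_plus_tail, 3)))  # {0, 1, 2}
```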

3SAT $\to$ 3D Matching

This is a very unintuitive reduction, but I'll do my best to explain the transformation.

Consider the following 3D Matching diagram, where each triangle node represents a triple.

Suppose $b_{0},b_{1}$ and $g_{0},g_{1}$ are not involved in any other triples. (From the pairings described next, the gadget's four triples are $(b_{0},g_{1},p_{0})$, $(b_{0},g_{0},p_{1})$, $(b_{1},g_{0},p_{2})$, and $(b_{1},g_{1},p_{3})$.) Then, in a perfect matching, $b_{0},b_{1},g_{0},g_{1}$ must all be "used up" by triples in this gadget. Each triple contains exactly one boy, so exactly two of the gadget's triples are chosen, using up exactly two of the pets $p_{0},\dots,p_{3}$. The other two pets must therefore be matched by triples elsewhere.

Consider choosing the triple $(b_{0},g_{0},p_{1})$ . Then, this forces the other triple in this diagram to be $(b_{1},g_{1},p_{3})$ . Similarly, choosing the triple $(b_{0},g_{1},p_{0})$ forces the other triple to be $(b_{1},g_{0},p_{2})$ . Therefore, any matching involving this gadget must include either both $(b_{0},g_{0},p_{1})$ and $(b_{1},g_{1},p_{3})$ or both $(b_{0},g_{1},p_{0})$ and $(b_{1},g_{0},p_{2})$ . In other words, this gadget behaves like a Boolean variable!
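Since the diagram is not reproduced here, a quick exhaustive check may help. The sketch below lists the gadget's four triples as read off from the two forced pairings above (string labels are illustrative) and confirms that there are exactly two ways to match $b_{0},b_{1},g_{0},g_{1}$ within it. Because each triple contains exactly one boy, it suffices to examine pairs of triples.

```python
from itertools import combinations

# The gadget's four triples, as described in the text (labels are illustrative).
GADGET = [("b0", "g1", "p0"), ("b0", "g0", "p1"),
          ("b1", "g0", "p2"), ("b1", "g1", "p3")]
PEOPLE = {"b0", "b1", "g0", "g1"}

def gadget_matchings():
    """All sets of gadget triples that use b0, b1, g0, g1 exactly once each.
    Each triple has one boy, so such a set always has exactly two triples."""
    ways = []
    for pair in combinations(GADGET, 2):
        members = [m for triple in pair for m in triple]
        if len(set(members)) == 6 and PEOPLE <= set(members):
            ways.append(pair)
    return ways

for pair in gadget_matchings():
    free = sorted({"p0", "p1", "p2", "p3"} - {triple[2] for triple in pair})
    print(pair, "leaves pets", free, "for clause triples")
```

The first matching ($b_{0}$ with $g_{1}$) leaves $p_{1},p_{3}$ free and the second ($b_{0}$ with $g_{0}$) leaves $p_{0},p_{2}$ free: exactly two states, like a Boolean variable.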

So, to transform an instance of 3SAT into an instance of 3D Matching, we first create a gadget for each variable. Let the nodes for a variable $x$ be denoted $b_{x0},b_{x1},g_{x0},g_{x1},p_{x0},p_{x1},p_{x2},p_{x3}$ . We will let $x=\text{true}$ be denoted by the case where $b_{x0}$ is matched with girl $g_{x1}$ , and $x=\text{false}$ be denoted by the case where $b_{x0}$ is matched with $g_{x0}$ .

Now, consider a clause, e.g. $c=(x\lor \overline{y}\lor z)$. For each clause, we introduce a new boy $b_{c}$ and a new girl $g_{c}$, and we associate this pair with three triples, one for each literal in the clause. The idea is that $b_{c}$ and $g_{c}$ are matched with a single pet from one of the literals' gadgets, and this match indicates that the literal is true.

This clause can be made true by $(1)$ $x=\text{true}$, $(2)$ $y=\text{false}$, or $(3)$ $z=\text{true}$. For $(1)$, we form the triple $(b_{c},g_{c},p_{x1})$: if $x$ is true, then $p_{x0}$ and $p_{x2}$ are taken inside the gadget, leaving $p_{x1}$ free, so the triple $(b_{c},g_{c},p_{x1})$ can be included in the 3D Matching, marking that $x=\text{true}$ makes clause $c$ true. In contrast, if $x$ is false, then $p_{x1}$ is taken inside the gadget, so this triple is unavailable and $x$ cannot be the literal that satisfies the clause. Similarly, $\overline{y}$ gets the triple $(b_{c},g_{c},p_{y0})$ and $z$ gets the triple $(b_{c},g_{c},p_{z1})$.

We also need each occurrence of a literal to have its own pet available for the corresponding $b_{c}$ and $g_{c}$. For instance, if a literal is used in 5 different clauses, its gadget does not have enough free pets for all of those clauses! However, recall that we showed a reduction from 3SAT to a special case in which each literal appears in at most two clauses. If we first apply it, each literal appears at most twice, and then we do have enough pets: if $x=\text{true}$, the two free pets $p_{x1}$ and $p_{x3}$ can each be matched in clauses containing $x$, and if $x=\text{false}$, the free pets $p_{x0}$ and $p_{x2}$ can be matched in clauses containing $\overline{x}$.

Note that a pet is assigned to a literal only if that literal is true. If the literal is $\overline{x}$, a pet is assigned to it only if $x=\text{false}$; conversely, if the literal is $x$, a pet is assigned only if $x=\text{true}$.

But I thought the reduction showed that each literal appeared in at most three, not two, clauses?

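I was confused about this too. The earlier reduction showed that each variable appears in at most three clauses. Notably, though, in the clauses added to ensure $x_{1}=x_{2}=\dots=x_{k}$, each new variable appears once as $x_{i}$ and once as $\overline{x_{i}}$, while its original clause contains just one of $x_{i}$ or $\overline{x_{i}}$. Therefore each literal (here, $x_{i}$ and $\overline{x_{i}}$ are the literals) appears at most twice. One caveat: a variable that was not split because it appears in exactly three clauses could still have all three occurrences with the same sign; applying the same splitting trick to such variables too (it works just as well for $k=3$) restores the at-most-twice guarantee. Confusing, I know; I wish the book explained this.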

We're almost done now. The last thing we need to ensure is that no pet is left unmatched. For instance, a variable used in only one clause leaves a free pet unmatched, which isn't allowed in a perfect 3D matching! More precisely, with $n$ variables and $m$ clauses in the 3SAT instance, exactly $2n-m$ pets are left unmatched. Thus, it suffices to add $2n-m$ new boy-girl pairs, each forming a triple with every pet, to take up the remaining pets!

Why are exactly $2n-m$ pets left unmatched?

Each variable's gadget has four pets, two of which are matched within the gadget itself, so $2n$ pets remain free in total. Each clause takes exactly one of them, so $2n-m$ pets are left unmatched. Note that $2n-m>0$: each variable appears at most three times, so there are at most $3n$ literal occurrences, and since every clause has at least two literals, $2m\leq 3n$, i.e. $m\leq 3n/2<2n$.

3D Matching $\to$ ZOE

For each triple, create a variable $x_{i}$, where $x_{i}=1$ means the triple is chosen; the vector $\mathbf{x}$ has one entry per triple. Then, for each boy, girl, and pet, let the triples containing it be $x_{j_{1}},x_{j_{2}},\dots,x_{j_{k}}$, and add the constraint

$$\sum_{i=1}^{k} x_{j_{i}}=1$$

since each boy, girl, and pet must be included in exactly one chosen triple. The rest is just solving the corresponding ZOE problem

$$\mathbf{Ax}=\mathbf{1}$$

where each column of $\mathbf{A}$ corresponds to a triple, and each row corresponds to a boy, girl, or pet.
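A sketch of the 3D Matching $\to$ ZOE construction, with one row per boy/girl/pet and one column per triple. All names are hypothetical, triples are assumed to be `(boy, girl, pet)` tuples, and a brute-force ZOE solver stands in for a real one so that the example runs.

```python
from itertools import product

def matching_to_zoe(boys, girls, pets, triples):
    """f: one row per boy/girl/pet, one column per triple; entry 1 iff that
    boy/girl/pet belongs to the triple (position 0, 1, 2 = boy, girl, pet)."""
    rows = [(0, b) for b in boys] + [(1, g) for g in girls] + [(2, p) for p in pets]
    return [[1 if t[pos] == name else 0 for t in triples] for pos, name in rows]

def zoe_to_matching(triples, x):
    """h: keep exactly the triples whose variable is 1."""
    return [t for t, chosen in zip(triples, x) if chosen == 1]

def brute_force_zoe(A):
    """Exponential stand-in for a ZOE solver: try every 0-1 vector x."""
    for x in product([0, 1], repeat=len(A[0])):
        if all(sum(a * xi for a, xi in zip(row, x)) == 1 for row in A):
            return list(x)
    return None

triples = [("b1", "g1", "p1"), ("b1", "g2", "p2"),
           ("b2", "g2", "p2"), ("b2", "g1", "p1")]
A = matching_to_zoe(["b1", "b2"], ["g1", "g2"], ["p1", "p2"], triples)
print(zoe_to_matching(triples, brute_force_zoe(A)))
# [('b1', 'g2', 'p2'), ('b2', 'g1', 'p1')]
```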

ZOE $\to$ Subset Sum

This is a reduction between two special cases of ILP, and it is based simply on the fact that $0$-$1$ vectors can be read as binary representations of numbers!

For instance, consider the ZOE problem with matrix

$$\mathbf{A}=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Read each column as a binary number, with the top row as the most significant bit; then, effectively, $\mathbf{A}=\begin{bmatrix} 18 & 5 & 4 & 8 \end{bmatrix}$. Then, $\mathbf{Ax}=\mathbf{1}$ reduces to the Subset Sum problem of finding a subset of $\{ 18,5,4,8 \}$ that sums to $11111_{2}=31$ (here, $18+5+8=31$, corresponding to $\mathbf{x}=(1,1,0,1)$).

Well, not quite. There's one more consideration: carries during addition can produce a valid solution for Subset Sum that is not a valid solution for ZOE. To fix this, we slightly modify the transformation: instead of reading the columns as integers in base $2$, we read them as integers in base $n+1$, where $n$ is the number of columns. Each row contains at most $n$ ones, so the digits in any position sum to at most $n$, and no carries can ever occur. Thus a subset hits the target exactly when every row sums to $1$, i.e. a solution to Subset Sum is always a solution to ZOE.
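A sketch of the ZOE $\to$ Subset Sum transformation (names hypothetical), with an optional `base` argument so that the carry problem with base $2$ can be seen directly. The brute-force Subset Sum solver is only an exponential stand-in; $h$ simply sets $x_{c}=1$ for each returned column index $c$.

```python
from itertools import combinations

def zoe_to_subset_sum(A, base=None):
    """f: read column c of the 0-1 matrix A as a number whose digits (top row most
    significant) are A[0][c], ..., A[m-1][c]. The default base is n + 1 (n columns):
    each row holds at most n ones, so no digit position can ever carry."""
    m, n = len(A), len(A[0])
    base = n + 1 if base is None else base
    numbers = [sum(A[r][c] * base ** (m - 1 - r) for r in range(m)) for c in range(n)]
    target = sum(base ** (m - 1 - r) for r in range(m))   # the digit string 11...1
    return numbers, target

def brute_force_subset_sum(numbers, target):
    """Exponential stand-in for a Subset Sum solver; returns chosen indices or None.
    (h: set x_c = 1 exactly for the returned indices c.)"""
    for size in range(len(numbers) + 1):
        for idx in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in idx) == target:
                return list(idx)
    return None

A = [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]
print(zoe_to_subset_sum(A, base=2))                    # ([18, 5, 4, 8], 31)
print(zoe_to_subset_sum(A))                            # ([630, 26, 25, 125], 781)
print(brute_force_subset_sum(*zoe_to_subset_sum(A)))   # [0, 1, 3], i.e. x = (1, 1, 0, 1)

# Why base 2 is not enough: no x satisfies the all-zero first row, but 1 + 1 + 1 = 11 in binary.
bad = [[0, 0, 0], [1, 1, 1]]
print(brute_force_subset_sum(*zoe_to_subset_sum(bad, base=2)))  # [0, 1, 2]: spurious
print(brute_force_subset_sum(*zoe_to_subset_sum(bad)))          # None
```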

ZOE $\to$ ILP

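As mentioned, ZOE is a special case of ILP, so ZOE reduces to ILP: an algorithm that solves ILP in polynomial time also solves ZOE in polynomial time, with essentially no transformation required. (Since our ILP formulation uses inequalities, we write each equation $\mathbf{a}\cdot\mathbf{x}=1$ as the pair $\mathbf{a}\cdot\mathbf{x}\leq 1$ and $-\mathbf{a}\cdot\mathbf{x}\leq -1$, and add $x_{j}\leq 1$ for each variable.)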

This is a very useful way of establishing that a problem is $\mathrm{NP}$-Complete: show that it is a search problem and that it generalizes a known $\mathrm{NP}$-Complete problem.

ZOE $\to$ Rudrata Cycle

This transformation is complex and fairly long, so I will try to explain it as concisely as I can.

First, we reduce ZOE to a generalization of Rudrata Cycle, Rudrata Cycle with Paired Edges, and then we reduce this generalization to Rudrata Cycle. (Recall that this is valid since reductions compose.)

ZOE $\to$ Rudrata Cycle with Paired Edges

In Rudrata Cycle with Paired Edges, a graph $G=(V,E)$ and a set $C\subseteq E\times E$ comprise the provided instance. The desired solution is a cycle that $(1)$ visits all vertices once and $(2)$ for every pair of edges $(e,e')\in C$ , traverses precisely one of the two.

Now, consider the structure of a ZOE problem. We have $n$ variables $x_{1},\dots,x_{n}$, each either $0$ or $1$, and $m$ equations of the form $x_{j_{1}}+\dots+x_{j_{k}}=1$, where each $j_{i}\in \{ 1,\dots,n \}$. For each variable $x_{i}$, we create a component consisting of two nodes joined by two edges $e_{i0}$ and $e_{i1}$, where the Rudrata cycle traversing $e_{i0}$ corresponds to $x_{i}=0$, and traversing $e_{i1}$ corresponds to $x_{i}=1$. For each equation $i$, we create a component consisting of two nodes joined by $k$ edges $e_{i1}',\dots,e_{ik}'$, one for each variable in the equation, where, e.g., the Rudrata cycle traversing $e'_{i1}$ corresponds to $x_{j_{1}}$ being the variable that equals $1$ in equation $i$. Then, we compose these components as shown in the diagram below.

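Then, for every equation and every variable $x_{i}$ appearing in it, we add to $C$ the pair $(e,e')$, where $e$ is the edge representing $x_{i}$ in that equation's component and $e'=e_{i0}$ is the edge representing $x_{i}=0$. Since a valid cycle traverses exactly one edge of each pair, if $x_{i}=0$ then $x_{i}$'s edge is never used in any equation, and if $x_{i}=1$ then $x_{i}$'s edge must be used in every equation containing $x_{i}$. Because the cycle crosses each equation's component along exactly one edge, every equation then has exactly one variable equal to $1$; conversely, any ZOE solution yields such a cycle. Therefore, ZOE reduces to the Rudrata Cycle with Paired Edges problem.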
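The argument above can be checked by brute force on small instances. This sketch assumes, as the reasoning above does, that the components are strung together so that a Rudrata cycle crosses each two-node component along exactly one of its parallel edges. It therefore models a cycle abstractly as one edge choice per component instead of building the graph from the diagram. The edge labels such as `('var', i, 0)` and `('eq', j, i)` are hypothetical.

```python
from itertools import product


def zoe_to_paired_edges(A):
    """Components and pair set C for a ZOE matrix A (rows = equations).

    Each component is the list of parallel edges between its two nodes:
    variable i has ('var', i, 0) and ('var', i, 1); equation j has one edge
    ('eq', j, i) for every variable x_i appearing in it."""
    m, n = len(A), len(A[0])
    components = [[("var", i, 0), ("var", i, 1)] for i in range(n)]
    components += [[("eq", j, i) for i in range(n) if A[j][i]] for j in range(m)]
    # Pair "x_i is the 1 in equation j" with "x_i = 0".
    C = [(("eq", j, i), ("var", i, 0))
         for j in range(m) for i in range(n) if A[j][i]]
    return components, C


def assignments_from_paired_cycles(A):
    """Brute force over cycles, modeled as one parallel edge per component
    (assumes the components are strung together in a single ring)."""
    components, C = zoe_to_paired_edges(A)
    n = len(A[0])
    found = set()
    for choice in product(*components):
        chosen = set(choice)
        if all((e in chosen) != (f in chosen) for e, f in C):
            # Read the assignment off the variable components.
            found.add(tuple(int(("var", i, 1) in chosen) for i in range(n)))
    return found


A = [[1, 0, 1],   # x1 + x3 = 1
     [0, 1, 1]]   # x2 + x3 = 1
print(sorted(assignments_from_paired_cycles(A)))  # [(0, 0, 1), (1, 1, 0)], the ZOE solutions
```

An equation with no variables yields a component with no edges, so `product` produces no choices at all, matching the fact that such an instance has no solution.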

Rudrata Cycle with Paired Edges $\to$ Rudrata Cycle

This transformation can be summed up in a single diagram.

In essence, by replacing every pair of edges $(e,e')$ with this gadget, we ensure that exactly one of $\{ e,e' \}$ is traversed. Thus, the Rudrata Cycle with Paired Edges problem reduces to the Rudrata Cycle problem, and by the transitivity of reductions, ZOE reduces to the Rudrata Cycle problem.

What if some edge $e$ is involved in multiple pairs?

Then we can simply concatenate all of its gadgets together. Make sure you see why this works!

Rudrata Cycle $\to$ TSP

Given a graph $G=(V,E)$, construct a TSP instance where the set of cities is precisely $V$, and the distance between cities $u$ and $v$ is $1$ if $(u,v)\in E$ and $1+\alpha$ otherwise, for some $\alpha>0$. The budget $b$ is $\lvert V \rvert$.

If $G$ has a Rudrata cycle, then the same cycle is a tour of cost exactly $n=\lvert V \rvert$, which is within the budget. Conversely, if $G$ has no Rudrata cycle, then there is no solution: every tour consists of $n$ edges, at least one of which is not in $G$ and thus has length $1+\alpha$, so the cheapest possible tour costs at least $(n-1)+(1+\alpha)=n+\alpha>n$.
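A brute-force sketch of this reduction in Python (illustrative names; exhaustive search, so only for a handful of cities). It builds the distance matrix described above and compares the cheapest tour with the budget $\lvert V \rvert$ for one graph that has a Rudrata cycle and one that does not.

```python
from itertools import permutations


def rudrata_to_tsp(n, edges, alpha):
    """Cities 0..n-1; distance 1 on edges of G and 1 + alpha otherwise.
    Returns the distance matrix and the budget n."""
    E = {frozenset(e) for e in edges}
    dist = [[0 if u == v else (1 if frozenset((u, v)) in E else 1 + alpha)
             for v in range(n)] for u in range(n)]
    return dist, n


def cheapest_tour(dist):
    """Brute-force optimal tour cost, fixing city 0 as the start."""
    n = len(dist)
    return min(
        sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))
        for t in ((0,) + rest for rest in permutations(range(1, n)))
    )


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # has a Rudrata cycle
path = [(0, 1), (1, 2), (2, 3)]           # has none
for name, g in [("4-cycle", cycle), ("path", path)]:
    dist, budget = rudrata_to_tsp(4, g, alpha=2)
    print(name, cheapest_tour(dist), budget)
# 4-cycle 4 4
# path 6 4    (n + alpha = 6 exceeds the budget)
```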

Why the $\alpha$ parameter? Can't we just set $\alpha=1$?

Varying $\alpha$ leads to two interesting results. If $\alpha=1$, then every distance is $1$ or $2$, so the sum of any two distances is at least $2$, which is at least any third distance; that is, this TSP instance satisfies the triangle inequality. This special case of TSP, as we will discuss in Section 9, can be efficiently approximated.

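Conversely, if $\alpha$ is large, the instance has the property that either $(1)$ it has a tour of cost exactly $n$, or $(2)$ every tour costs at least $n+\alpha$. Critically, no tour has a cost $c$ with $n<c<n+\alpha$. This is known as the gap property, and it implies that unless $\mathrm{P}=\mathrm{NP}$, there is no efficient approximation algorithm for general TSP: an algorithm guaranteed to return a tour within a factor $r$ of optimal, run on an instance with $\alpha>(r-1)n$, would return a tour of cost at most $rn<n+\alpha$ exactly when $G$ has a Rudrata cycle, and would thereby solve Rudrata Cycle in polynomial time.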
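The $\alpha=1$ claim can also be checked mechanically. In this small sketch (hypothetical helper names), the distance construction from the reduction is rebuilt so the block stands alone, and every ordered triple of distinct cities is tested.

```python
from itertools import permutations


def tsp_distances(n, edges, alpha):
    """Distances from the reduction: 1 on edges of G, 1 + alpha otherwise."""
    E = {frozenset(e) for e in edges}
    return [[0 if u == v else (1 if frozenset((u, v)) in E else 1 + alpha)
             for v in range(n)] for u in range(n)]


def satisfies_triangle_inequality(dist):
    """Check d(u, w) <= d(u, v) + d(v, w) for all distinct cities u, v, w."""
    return all(dist[u][w] <= dist[u][v] + dist[v][w]
               for u, v, w in permutations(range(len(dist)), 3))


path = [(0, 1), (1, 2), (2, 3)]
print(satisfies_triangle_inequality(tsp_distances(4, path, alpha=1)))  # True
print(satisfies_triangle_inequality(tsp_distances(4, path, alpha=3)))  # False: d(0, 2) = 4 > 1 + 1
```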

Any Problem in $\mathrm{NP}$ $\to$ SAT

It's too hard to cover here... TL;DR: Any Problem in $\mathrm{NP}$ $\to$ Circuit SAT $\to$ SAT.

Summary

Now, you should hopefully understand the tree of reductions shown in the diagram below :)
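Because reductions compose, the tree can also be read as data. The edge list below is an assumed reading of the reductions covered in Section 8.3 (A → B meaning A reduces to B), leaving out the standalone Rudrata $(s,t)$-Path example. A breadth-first search from SAT then composes reductions by transitivity. Since every problem in $\mathrm{NP}$ reduces to SAT, each problem reached this way is $\mathrm{NP}$-hard. The names `REDUCTIONS` and `reduction_chains` are illustrative.

```python
from collections import deque

# Reductions covered in Section 8.3 (A -> B means A reduces to B).
REDUCTIONS = {
    "SAT": ["3SAT"],
    "3SAT": ["Independent Set", "3D Matching"],
    "Independent Set": ["Vertex Cover", "Clique"],
    "3D Matching": ["ZOE"],
    "ZOE": ["Subset Sum", "ILP", "Rudrata Cycle"],
    "Rudrata Cycle": ["TSP"],
}


def reduction_chains(source):
    """BFS from source; returns, for each reachable problem, one chain of
    reductions leading to it. By transitivity, each chain is itself a reduction."""
    chains = {source: [source]}
    queue = deque([source])
    while queue:
        problem = queue.popleft()
        for target in REDUCTIONS.get(problem, []):
            if target not in chains:
                chains[target] = chains[problem] + [target]
                queue.append(target)
    return chains


chains = reduction_chains("SAT")
print(" -> ".join(chains["TSP"]))
# SAT -> 3SAT -> 3D Matching -> ZOE -> Rudrata Cycle -> TSP
print(len(chains) - 1)  # 10 problems reached from SAT
```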

Aside: Unsolvable Problems

Every $\mathrm{NP}$-Complete problem can at least be solved by some algorithm (e.g. exhaustive search over its exponential search space), even if intractably so. For unsolvable problems, by contrast, no algorithm can exist at all! One such problem is the arithmetical version of SAT, where the provided instance is a polynomial equation in many variables with integer coefficients and the desired solution is an integer assignment of the variables that satisfies it. Another very famous unsolvable problem is the halting problem.
