3. Decompositions of Graphs

3.1 Graph Representations

Adjacency matrix vs. adjacency list yap... not too relevant for an Advanced Algorithms class, in my opinion.

3.2 (Undirected) Depth-First Search

Pseudocode:

dfs(G, v):
	visited(v) = true
	previsit(v)
	for edge (v, u) in E:
		if !visited(u): dfs(u)
	postvisit(v)

The vertices and edges traversed by DFS form a tree.
A disconnected graph will have multiple DFS trees $\to$ forest

Asymptotic Analysis

Two steps:

Some fixed amount of work--marking the vertex as visited and then the time complexity of pre/postvisit work.
Recurse on adjacent edges if not visited.
The total work done in step 1 across all vertices is $O(\lvert V \rvert)$ (assuming pre/postvisit work is $O(1)$ ). The total work done in step $2$ is $O(\lvert E \rvert)$ , as the algorithm will traverse all edges for each vertex. Thus, the time complexity is $O(\lvert V \rvert+\lvert E \rvert)$ ; essentially $O(\lvert E \rvert)$ .

Connected Components

Just define previsit to be

previsit(v):
	ccnum[v] = cc

Where $cc$ is the identifying index of the CC passed around by DFS.

Pre-order and Post-order

previsit(v):
	pre[v] = clock
	clock++
	
postvisit(v):
	post[v] = clock
	clock++

Note that, for any two nodes $u,v$ , for the intervals $[pre[u],post[u]]$ and $[pre[v], post[v]]$ , either

they are entirely disjoint

one is entirely within the other

3.3 (Directed) DFS

Tree edges are part of the DFS forest
Forward edges lead from a node to a descendant that is not a direct child (i.e. more than one level apart)
Back edges lead to an ancestor in the tree
Cross edges lead to a vertex that is neither a descendant nor an ancestor
We can characterize these edges by the overlap of their pre/post intervals:

\begin{align*} \underset{ u } { [ } && \underset{ v }{ [ } && \underset{ v }{ ] } && \underset{ u }{ ] } &&&\implies \text{Tree/Forward} \\ \underset{ v } { [ } && \underset{ u }{ [ } && \underset{ u }{ ] } && \underset{ v }{ ] } &&&\implies\text{Back} \\ \underset{ v }{ [ } && \underset{ v }{ ] } && \underset{ u }{ [ } && \underset{ u }{ ] } &&&\implies \text{Cross} \end{align*}

There is a special type of directed graph known as a directed acyclic graph (DAG). This just denotes a directed graph without any cycle. It is possible to test for acylicity in linear time, with a single depth-first search.

A directed graph has a cycle $\iff$ its DFS reveals a back edge.

In other words, if the DFS reaches a vertex it has already visited before, the directed graph has a cycle.

DAGs are highly applicable to many real world situations. Frequently, it's desirable to linearize or topologically sort a DAG. This means identifying an ordering of the vertices such that each edge goes from an earlier to a later vertex in the sequence. All DAGs can be linearized! This is because of a very nice property that derives from the fact that DAGs cannot have back edges:

In a DAG, every edge leads to a vertex with a lower post number.

Therefore, we have an algorithm that runs in linear time that orders the nodes of a DAG (toposort). We can also conclude that the vertices with the smallest post number must be sinks. Similarly, the vertices with the highest post number must be sources. This gives us one more property.

Every DAG has at least one source and one sink.

Thus, an alternate approach to linearization is actually iteratively finding a source, appending it to the sequence, and then deleting it.

3.4 Strongly Connected Components

Description

Strongly connected components apply only to directed graphs. For directed graphs, we state that two nodes $u,v$ are connected if there exists a path from $u$ to $v$ and a path from $v$ to $u$ . This definition allows us to partition a directed graph into disjoint sets that are denoted as strongly connected components. In each set, the vertices are pairwise connected.

Now, if we shrink each SCC into a single "meta-node," and draw an edge from each meta-node to another if there is an edge in the same direction between their underlying components, we can create a "meta-graph" that is, in fact, a DAG. That is,

The meta-graph constructed by the strongly connected components of any directed graph is always a DAG.

Proof:
Let $U,V$ be two SCCs and let $u\in U$ and $v\in V$ . If there exists a bidirectional (back) edge between $U$ and $V$ , then there exists a path from $u$ to $v$ and $v$ to $u$ . This is because, since $U$ and $V$ are SCCs, it is possible for $u$ to traverse to the vertex with an edge to $V$ and then traverse to $v$ from there. The analogue can be said for $v$ to $u$ . This would mean $U\cup V$ is an SCC, a contradiction. Therefore, the meta-graph must be a DAG.

$\square$

This gives insight into the greater structure of a directed graph. In particular, at the top level, there exists a simpler DAG of the graph. Underlying this DAG, though, is the actual directed graph with SCCs in each of the DAG's nodes.

Strongly Connected Components Decomposition

We can actually decompose a directed graph into its SCCs in linear time.

Property 1: If DFS is started at node $u$ , it will terminate precisely when all nodes reachable from $u$ have been visited.

This means that, if we start DFS on a node that lies within an SCC that is a sink on the meta-graph, it will visit exactly the SCC, and nothing more.

Property 2: The node that receives the highest post number in a DFS must lie in a source SCC.

This is a result of a more general property:

Property 3: if $C$ and $C'$ are SCCs, and there exists an edge from a node in $C$ to a node in $C'$ , then the highest post number in $C$ is larger than the highest post number in $C'$ .

Proof: There are two cases to consider. If DFS visits component $C$ before $C'$ , then all of $C$ and $C'$ will be traversed before the procedure ends. By the properties of DFS, the post number of the first visited node in $C$ will be higher than the highest post number in $C'$ . Otherwise, if $C'$ is visited first, DFS will end after visiting all of $C'$ but before visiting any of $C$ , in which case the property follows immediately.

$\square$

We can restate Property 3 as "the SCCs can be linearized by arranging them in decreasing order of their highest post numbers."

Now, consider the reverse graph $G^{R}$ , which is the same as the original graph $G$ but with all edges reversed. $G^{R}$ has the same connected components as $G$ , but the meta-graph has its edges reversed! By Property 2, if we perform DFS on $G^{R}$ , the node with the highest post number will be a source SCC for the meta-graph of $G^{R}$ , but will be a sink SCC for the meta-graph of the original graph $G$ .

Once we have identified a sink SCC of $G$ , we then delete the sink from the graph. Then, by Property 3, the node with the highest post number among the remaining nodes will belong to another sink SCC of $G$ . Therefore, we can iteratively delete the sink from the graph until there are no nodes left. Thus, we have a linear time algorithm to decompose the graph into SCC DAGs.

More formally, we describe Kosajaru's Algorithm for decomposing a directed graph into its condensation graph, i.e. the DAG of its SCCs. The following is taken from cp-algorithms,

Run DFS 1 or more times on $G$ , computing postorder number for each vertex and yielding a list of the vertices sorted by postorder number.
Build the transpose graph $G^{T}$ , and repeatedly run DFS on the vertices in descending postorder. By the properties described above, the node with the highest postorder number is part of a source SCC of $G$ , but a sink SCC of $G^{T}$ . Therefore, running DFS on the node with highest postorder number will visit only the nodes of this sink SCC of $G^{T}$ . This essentially extends, to the rest of the nodes, as the algorithm visits the SCCs of $G^{T}$ in reverse topological order. This means it visits the SCCs of $G$ in topological order.