Whilst working my way through 2015’s Advent of Code, I needed to perform a topological sort of a directed graph. Since I don’t tend to hold such algorithms in my head, I consulted the Wikipedia entry and found a description of Kahn’s algorithm. However, it was couched in terms of a procedural pseudocode, so I set about translating it to Clojure.
The essence of the process (as described in the article) is:
Create an empty list L for the sorted nodes
Find all the nodes of the graph that have no incoming edges, and bung them into another list S.
While the list S is non-empty
remove a node n from the head of S and add it to the tail of L
for each node m that is the destination of an outgoing edge e from n
remove e from the graph
if m now has no incoming edges, add it to the tail of S
If the graph still has any edges after stage 3, then it contains a cycle somewhere - so we can’t perform a topological sort. Return nil.
Otherwise, return the list L
The resulting topological sort is not unique - there may be many equally valid solutions.
The first thing to do was to choose a representation for the graph. The nodes are always going to need distinct IDs, and I might want them to contain any sort of data. Here’s the graph from the Wikipedia article in the form that I decided upon, where each node just contains a map with a label for the node:
(def test-graph
{:nodes {2 {:label "2"}
3 {:label "3"}
5 {:label "5"}
7 {:label "7"}
8 {:label "8"}
9 {:label "9"}
10 {:label "10"}
11 {:label "11"}}
:edges #{[5 11]
[11 2]
[7 11]
[7 8]
[8 9]
[11 9]
[11 10]
[3 8]
[3 10]}})
Next, we need to find all the "sources" (nodes with no incoming edges) in the graph. A straightforward way to do that is to remove all nodes that occur as the destinations of edges from the full set of nodes. For the example graph, we should expect it to return nodes 3, 5 and 7. I’ve returned the nodes as a set, for reasons that will become apparent later:
(defn sources
"Return the nodes of the graph that have no incoming edges"
[{:keys [nodes edges]}]
(set (apply dissoc nodes (map last edges))))
(sources test-graph)
;; #{[5 {:label "5"}] [3 {:label "3"}] [7 {:label "7"}]}
If the graph is a cycle, sources
ought to return an empty set, so let’s just check that:
(def cycle-graph
{:nodes {1 {:label "1"}
2 {:label "2"}
3 {:label "3"}}
:edges #{[1 2]
[2 3]
[3 1]}})
(sources cycle-graph)
;; #{}
Step 3 of the algorithm can be simplified a little. What it now boils down to is:
Remove the head h of S from the graph, along with all edges originating from it
Append h to L
Append any new sources of the resulting graph to the tail of S
We need a function to remove a node and its edges from the graph.
Strictly speaking, since S only contains sources, it doesn’t need to remove incoming edges -
but the following function does so in order to make it more useful outside the context
of the current problem. Handily, it also returns the whole graph if the node is nil
.
(defn remove-node
"Remove a node from a graph, along with the edges meeting it"
[{:keys [nodes edges]} [k _]]
{:nodes (dissoc nodes k)
:edges (into #{} (remove #((set %) k) edges))})
(clojure.pprint/pprint
(remove-node test-graph [11 {:label "11"}]))
;; {:nodes
;; {2 {:label "2"},
;; 3 {:label "3"},
;; 5 {:label "5"},
;; 7 {:label "7"},
;; 8 {:label "8"},
;; 9 {:label "9"},
;; 10 {:label "10"}},
;; :edges #{[8 9] [7 8] [3 10] [3 8]}}
This yields the desired result:
We’re now in a position to encode a step of the Kahn algorithm as a function:
(require '[clojure.set :as s])
(defn kahn-step
"Perform a step of the Kahn algorithm on result list l, sources s and graph g"
[[l s g]]
(let [node (first s)
g' (remove-node g node)
l' (if (nil? node) l (conj l node)) (1)
s' (-> (s/difference (sources g') (set s)) (2)
(into (rest s)) (3)
sort)] (4)
[l' s' g']))
1 | The first time around, the node will be nil as s is empty. |
2 | After removing the node from the graph, add any new sources we’ve introduced to s . |
3 | Omit the head of s , since we’ve now removed it from the graph. |
4 | Sort s . This isn’t strictly necessary, but otherwise the topological sort will
have a somewhat arbitrary (though correct) order. This way, we guarantee that the nodes will
appear in order of the smallest available key. |
Finally, we can perform the topological sort:
(defn topo-sort
"Perform a topological sort of the graph g"
[g]
(let [[l _ g'] (->> (rest (iterate kahn-step [[] [] g]))
(drop-while #(seq (second %)))
first)] (1)
(if (seq (:edges g')) (2)
nil
l)))
1 | Find the first iteration where the list of remaining sources is empty. |
2 | If the graph still contains edges when we’ve run out of sources, then we’ve got a cycle. |
So let’s check it out:
(clojure.pprint/pprint
(topo-sort test-graph))
;; [[3 {:label "3"}]
;; [5 {:label "5"}]
;; [7 {:label "7"}]
;; [8 {:label "8"}]
;; [11 {:label "11"}]
;; [2 {:label "2"}]
;; [9 {:label "9"}]
;; [10 {:label "10"}]]
We’d better make sure it does the right thing when the graph contains a cycle, so let’s introduce one into the test graph by reversing the direction of the arrow from 11 to 2, and adding an edge from 9 to 2. This creates a cycle containing nodes 2, 9 and 11:
(def cycle-test-graph
(assoc test-graph :edges #{[5 11]
[2 11]
[9 2]
[7 11]
[7 8]
[8 9]
[11 9]
[11 10]
[3 8]
[3 10]}))
(clojure.pprint/pprint
(topo-sort cycle-test-graph))
;; nil
To recap, here’s the source for the whole thing:
(ns nerdwick.graph
(:require [clojure.set :as s]))
(defn sources
"Return the nodes of the graph that have no incoming edges"
[{:keys [nodes edges]}]
(set (apply dissoc nodes (map last edges))))
(defn remove-node
"Remove a node from a graph, along with the edges meeting it"
[{:keys [nodes edges]} [k _]]
{:nodes (dissoc nodes k)
:edges (into #{} (remove #((set %) k) edges))})
(defn kahn-step
"Perform a step of the Kahn algorithm on result list l, sources s and graph g"
[[l s g]]
(let [node (first s)
g' (remove-node g node)
l' (if (nil? node) l (conj l node))
s' (-> (s/difference (sources g') (set s))
(into (rest s))
sort)]
[l' s' g']))
(defn topo-sort
"Perform a topological sort of the graph g"
[g]
(let [[l _ g'] (->> (rest (iterate kahn-step [[] [] g]))
(drop-while #(seq (second %)))
first)]
(if (seq (:edges g'))
nil
l)))