I have been studying the topic of graphs in python, and I came across Topological-Sort which is a graph ordering method; but I still don't fully understand how to perform this method in python, I know the theory of this ordering and how it should work but I don't know how to apply it to python, for example:
We have a degree of 6 nodes, whose unions are the following: (5,2),(5,0),(4,0),(4,1),(2,3),(3,1).
With this we can know the 'Weight' of each node that would be:
weight of 5 -> 0
weight of 4 -> 0
weight of 3 -> 1
weight of 2 -> 1
weight of > 1 -> 1
weight of 0 -> 2
then the order from smallest to largest would be: 5 4 2 3 1 0
My doubt is how the python codes that do this sorting work, since I want to implement one but I'm not sure how to do it. Here is some code that I found:
#Python program to print topological sorting of a DAG
from collections import defaultdict
#Class to represent a graph
class Graph:
def __init__(self,vertices):
self.graph = defaultdict(list) #dictionary containing adjacency List
self.V = vertices #No. of vertices
# function to add an edge to graph
def addEdge(self,u,v):
self.graph[u].append(v)
# A recursive function used by topologicalSort
def topologicalSortUtil(self,v,visited,stack):
# Mark the current node as visited.
visited[v] = True
# Recur for all the vertices adjacent to this vertex
for i in self.graph[v]:
if visited[i] == False:
self.topologicalSortUtil(i,visited,stack)
# Push current vertex to stack which stores result
stack.insert(0,v)
# The function to do Topological Sort. It uses recursive
# topologicalSortUtil()
def topologicalSort(self):
# Mark all the vertices as not visited
visited = [False]*self.V
stack =[]
# Call the recursive helper function to store Topological
# Sort starting from all vertices one by one
for i in range(self.V):
if visited[i] == False:
self.topologicalSortUtil(i,visited,stack)
# Print contents of stack
print stack
reference: topological-sort-code
I'm just asking for a little guide on how this code works so I can implement my own code
What the algorithm supposedly does is explained by the very page you referenced. I quote verbatim (translating):
There's a keyword here that you need to understand for it all to make sense, and that is recursively . This means that the same function you call to sort a graph starting from a given vertex
v
, is also called to sort sub-graphs starting at each of the neighbors ofv
. That is, the function calls itself. Although it can be difficult to keep track of the order in which the calls occur in your head, this mechanism (recursion) nonetheless provides a simpler way of describing how the problem should be solved.Now let's take a closer look at the code.
data structure
The graph is saved in a couple of class variables:
V
, which is a number indicating how many vertices the graph hasgraph
which is a dictionary whose keys are the "names" (numbers) of the vertices and the values are the list of adjacent vertices of each of them.So, for example, for the graph
(5,2),(5,0),(4,0),(4,1),(2,3),(3,1)
you mention in your statement, which would look like this:the aforementioned variables would be:
V
= 6graph={0: [], 1: [], 2: [3], 3: [1], 4: [0, 1], 5: [2, 0]}
The dictionary shows that the node
0
has no neighbors (remember that it is a directed graph, so the neighbors would be the ones pointed by arrows coming out of that node). The same thing happens to node 1. Node 2 has one neighbor, which is 3. Node 3 has one, which is 1, node 4 has two (0 and 1), and node 5 also has two (2 and 0). This structure therefore correctly represents the graph in the figure.The mission of topological sorting is to display the nodes in an order such that if there is an arrow between nodes A and B, then A appears before B in the final sorting.
The "stack" used by the algorithm has this mission. A new node is only pushed onto the stack if all its neighbors, and their neighbors, were already on the stack.
For example, imagine that the stack is empty and we are considering whether to push node 2 onto it. The answer will be "NO", because 2 has a neighbor (3) that is not on the stack. However, if we ask ourselves if node 1 should be inserted, the answer would be "YES", because node 1 has no neighbors and therefore can go to the stack without further ado.
Now imagine that the stack has one element, say 3, and only that one. Could item 2 now go on the stack? The answer is again "NO" because although its neighbor (the 2) is on the stack, the neighbors of the 2 are not (in reality this situation could not occur since the 2 should never have entered the stack before, until that is not 1).
The recursive algorithm
As you can see, it is difficult to decide in what order to iterate through the elements so that they are saved on the stack in the correct order.
The trick is, I go through them in any order (for example, in numerical order, from 0 to 5), but I keep in another variable which nodes I have already examined. For each node, if I haven't examined it, I mark it as examined and call the same function to push all its neighbors onto the stack, and then push the node in question.
That is, in pseudocode, given a node,
v
the functionOrdenar
would do the following:v
that has not been visited, it calls back toOrdenar()
, passing that neighbor to it.v
Execution example
It seems almost miraculous that such a simple algorithm can work, but it does. It's the magic of recursion, it makes it easy to write the algorithm, but difficult to imagine its execution. Go for it.
You have to call
Ordenar()
for each node in the graph, and let the recursion do its magic. At the end, print the stack.We start with node 0. I mark it as visited. Since it has no neighbors, there is no need to recursively call
Ordenar()
. We put it on the stack.We go to node 1. I mark it visited. Since it also has no neighbors, there are no recursive calls. To the stack!
Let's go to node 2. I mark it visited. This node (see the figure) has a neighbor, 3, which has not yet been visited. So I call back to
Ordenar
now passing the 3 to it. Only when this second call toOrdenar()
has returned will we continue on the original call and push 2 on the stack.Second call from
Ordenar()
, receives 3. Marks it as visited. It looks at the list of neighbors for 3, and we see 1, but 1 has already been marked as visited, so it doesn't have to call againOrdenar()
. We put the 3 on the stack.and we return
The recursive call finished, we push the 2 on the stack
We go to node 3, but it is already marked as visited. There's nothing to do
We go to node 4, mark it as visited. Its neighbor list is 0 and 1, and both have been visited, so there is no recursive call. We put the 4 on the stack
We go to node 5, we mark it as visited. Its list of neighbors is 2 and 0, both already visited. There is no recursive call, and we push 5 on the stack.
We are done and we have:
It is enough to print that stack in that order, which will fulfill the sought property.
About implementation
In the python implementation you linked to, what I've called
Ordenar()
would betopologicalSortUtil()
, while the "main program" that loops through nodes 0 to 5 and callsOrdenar()
would betopologicalSort()
.The code in question is cumbersome to read because
topologicalSortUtil()
it is passed not only the vertex being considered, but also the visited list and the stack itself to be updated.It seems to me that this implementation can be greatly simplified by taking the visited list and the stack out of object variables (eg:
self.visited
andself.stack
, so that they would no longer have to be passed as parameters, but would be "shared" between the different calls toOrdenar()
, recursive or not.Update
Below is a minimalist implementation of what was explained above. I have simplified many things regarding the implementation of the web you linked. The most important:
visitados
as superfluous, since we can still tell if a node has already been seen if it appears on the stack.The code is now as simple as this, where the operation of the algorithm is much more clearly seen as explained above.