Minimum Spanning Tree


Introduction

The minimum spanning tree is an algorithm that is fairly simple but still quite interesting.

Consider some randomly located nodes that we wish to connect with the shortest length of line. The nodes might represent anything... towns, cities, municipal street lights, houses, or offices. And lines might represent roads, electical power lines, fiber-optic cables, water or sewer lines.

The images below show 15 randomly located nodes. Each node is labeled by a letter A-Z, and to start, all have different colors. We will connect them using the minimum length of lines.

On the left is an image of the starting conditions. The 2 closest nodes are L and M and they are about to be connected. They will then take on a common color to show that they belong to the same group.

On the right we have connected all of the nodes with the minimum length of lines. All the nodes and lines are tan because colors were propagated as we connected the other nodes. They all are in the same group.

Click here to see an animation depicting the changes to the graph as the algorithm progresses. Press [Play] to start the animation. You can control speed, pause, resume etc. Use the browser’s back button to return here.

The Algorithm

The process is fairly intuitive and if you watched the animation it should be quite clear.

  1. Find the 2 (unconnected) nodes that are closest to each other.
  2. Connect them with a link unless that link would form a loop.
  3. Repeat with the next 2 closest nodes. Stop when you reach a point where you are unable to add a link without forming a loop.

Step two is a bit tricky. One way to go about it is to place nodes into groups as they are connected.

The Code

The program minspan.py will run with either Python 2.7 or Python 3. We use the graphics.py that John Zelle created. (The previous version of this project used pygame and only ran on Python 2).

The program code below is available here as minspan.py. The whole package can be downloaded here as a zip file.

The code begins the import statements including the Zelle graphics we will use. Then it sets up a rainbow of colors and creates a window for our graphics output.

 4 import random, sys, sarg
 5 from   graphics import GraphWin, Text, Point, Line
 6
 7 dbg = False
 8 colors= ("red","green","blue","orange","yellow","purple")
 9 ncolors = len(colors)
10
11 win = GraphWin("Minimum Spanning Tree",400,400)
12

Minnode

Each node is implemented as Minnode instance. The node attributes are:

There are no methods for this class. The defining code for it follows.

13 class Minnode :
14     def __init__ (self, pos, label, group) :
15         self.pos = pos
16         self.label = label
17         self.group = group
18

Minspan

The class Minspan is a singleton.

Attributes consist of:

The nodes list is produced using a list comprehension.

A list of plinks (potential links) is built in the double for loop (27). Each potential link is a tuple containing references to two nodes and a distance (actually distance squared) between them. When plinks is sorted, our list is nicely in order of shortest to longest links.

Since every node can link to any other that we have N*(N-1) potential links. The links are bidirectional. The little m+1 (line 28) in the inner for loop keeps us from generating such duplicates and also prevents nodes from linking to themselves.

The initialization code for it follows.

19 class Minspan :
20     def __init__ (self, pixels, nNodes) :
21         letters    = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
22         self.drawn = []
23         self.nodes = [Minnode(randpos(50,pixels-50), letters[i], i+1)
24                                     for i in range(nNodes)]
25         self.plinks = []          # potential links between nodes
26         self.pixels = pixels
27         for m in range(nNodes) :
28             for n in range(m+1,nNodes) :
29                 x = self.nodes[m]
30                 y = self.nodes[n]
31                 dist = euclDist2(x.pos,y.pos)
32                 self.plinks.append((dist,m*n,x,y)) # m*n unique in list
33         self.plinks.sort()  # so that sort won't try to compare x and y
34         self.links = []
35

Class Minspan has two methods, linkNodes() and display().

Minspan.linkNodes

The list, links, starts out empty and is populated from plinks using the rules below. As the links are created, frames are displayed with calls to self.display(), forming a slide show.

36     def linkNodes (self) :
37         nLinks = 0
38         while self.plinks :
39             plink = self.plinks.pop(0)
40             dst, n, nodA, nodB = plink
41             if nodA.group != nodB.group :
42                 self.links.append((nodA,nodB))
43                 masterGroup = nodA.group
44                 slaveGroup  = nodB.group
45                 self.display("Will link %s->%s" % (nodA.label, nodB.label))
46                 self.links[-1] = (nodA,nodB)
47                 for node in self.nodes :
48                     if node.group == slaveGroup :
49                         node.group = masterGroup
50                 self.display("Grouped %s with %s" % (slaveGroup,masterGroup))
51                 nLinks += 1
52         return nLinks
53

Minspan.display()

Minspan.display() creates each frame of the display. The code is very similar to the other graphics projects using the Zelle graphics.py module. Text and Line instances are drawn in the window in various colors. Once you've seen the program run it should be quite straight-forward. Objects from a previous frame are erased (undraw) and then objects are created and drawn for the new frame. An attribute list "drawn" is maintained for each frame. Once a frame is drawn the program waits for the mouse to be clicked (by you) before continuing.

In line 55, objects that were drawn in the previous frame are erased (and discarded). New objects created for the current frame are remembered as they are drawn.(lines: 59, 70, 76). All links set are drawn with their group color. When a new link is chosen, it is initially painted white (line: 62).

Then all the labels for the nodes are drawn (line: 71).

The global dbg was used for testing changes during development.

            
54     def display (self,banner) :
55         for obj in self.drawn : obj.undraw()  # erase the board
56         self.drawn = []                       # done
57         if dbg : print("Banner %s" % banner)
58         txtBanner = Text(Point(200,20), banner)
59         txtBanner.draw(win)
60         self.drawn.append(txtBanner)
61         for a,b in self.links :
62             if a.group != b.group : color = "white"
63             else                  : color = colors[a.group%ncolors]
64             if dbg : print("line: %s from %s to %s" % (color, a.pos, b.pos))
65             p1 = Point(a.pos[0], a.pos[1])
66             p2 = Point(b.pos[0], b.pos[1])
67             line = Line(p1,p2)
68             line.setFill(color)
69             line.draw(win)
70             self.drawn.append(line)
71         for node in self.nodes :
72             color = colors[node.group%ncolors]
73             txtLabel = Text(Point(node.pos[0],node.pos[1]), node.label)
74             txtLabel.setFill(color)
75             txtLabel.draw(win)
76             self.drawn.append(txtLabel)
77             if dbg : print("text: %s %s at %s" % (color, node.label, node.pos))
78         if dbg : print()
79         win.getMouse()     # wait for the click
80

Helper functions

There are two small helper functions to:

euclDis2()

The function accepts 2 points and calculates the square of the positional difference along the x-axis plus the square of the positional difference along the y-axis. Strictly speaking this is not the distance between the two points. But since these quantities are used just for comparison with each other, using them as-is is just fine

This gets important as the number of nodes increases. At 20 nodes the potential links come to 190 (20*19/2). But with 1000 nodes we need 499,500 potential links. Saving 499,500 square-root calculations is no longer trivial.

In general, a critical part of algorithm design is dealing with time and memory consumption as the problem size grows. In this case, the dominating factor is the time needed to compute the distances between all the vertices.

81 def euclDist2(a,b) :
82     return (a[0]-b[0])**2 + (a[1]-b[1])**2
83

randpos()

 
84 def randpos(low,high) :
85     from random import randint
86     return (randint(low,high), randint(low,high))
87

Minimum Tree, really?

How can we be sure that what we end up with really is a minimum tree? Here is one way to at least get some intuition on this. Take the final solution as generated and try to improve it. Remove a link to make the one final group into two. Attempt to reconnect the two groups with a shorter link somewhere else. You can't because we carefully considered potential links from shortest to longest overall. If such a shorter link actually existed we would have used it earlier.

Running the program

At the bottom of the file, main() is defined. sarg is used to get parameters from the command line or to supply defaults. You can specify the window size (pixels), the number of nodes (nodes) and a seed (seed)

 88 def main()  :
 89     seed    = sarg.Int("seed", 0)
 90     if seed : random.seed(seed)
 91     else :
 92         seed = random.randint(101,999)
 93         print("Seed is %s" % seed)
 94         random.seed(seed)
 95     pixels  = sarg.Int("pixels",400)
 96     nNodes  = sarg.Int("nodes",   5)
 97     print("Click the mouse in the window to advance frames")
 98     ms = Minspan(pixels, nNodes)
 99     nLinks = ms.linkNodes()
100
101 if __name__ == "__main__" : main()
102

You can download the zip file for minspan here. The following program invocation recreates another nice example. Clicking the mouse in the graphic window will step through the process. Try also running the program without a “seed=” for some randomness.

$ python minspan.py pixels=400 nodes=15 seed=566

If you have comments or suggestions You can email me at mail me

Copyright © 2014-2021 Chris Meyers and Fred Obermann