The following algorithm to compute a minimum spanning tree is quite simple and quite interesting. It can be done by a person as well as by a computer.

Consider some nodes randomly placed that we wish to connect together with the least expenditure of material. The nodes might represent lots of different items. Perhaps internet devices that we wish to attach together to a single line.

The image on the left shows 15 nodes randomly scattered. Each node is labeled by a letter A-Z and they are different colors. On the right the same nodes are connected by yellow lines and the nodes themselves are yellow as well.

The 14 lines form a minimum spanning tree. That is, you cannot connect the nodes with less cable, string, pipe, or whatever.

The process is simple and pretty intuitive.

- Find the 2 (unconnected) nodes that are closest to each other.
- Connect them with a link unless the link would form a loop.
- Repeat with the next 2 closest nodes. Stop when you are unable to not form a loop.

Obviously, the second condition is a little tricky. The way to do it easily is to assign each node to a group. All nodes reachable from each other belong to the same group. Watch the little video.

Animation. Return with the browser's Back button)

In this gif movie, groups are shown as colors. When each link is set, the two groups are merged into one. Now the rule for avoiding loops is easy. "Don't link two nodes that belong to the same group".

Initially, each node is in a group by itself. With each link, one group falls away and the process is done when there is just one group remaining. So it's easy to see that in a graph of N nodes we will finish with N-1 links, since we lose a group with each link created.

As with the other projects in this group, the sarg, camera and pgcon modules are described in the Visualization with Pygame If you haven't already done so, you may want to read about them first.

The program code below is available as minspan.py. The whole package can be downloaded as a zip file.

The nodes are represented by *Minnode* instances and defined with

06 class Minnode : 07 def __init__ (self, pos, label, group) : 08 self.pos = pos 09 self.label = label 10 self.group = group 11 12 def __str__ (self) : 13 return "<%s at %s in group %d>" % (self.label, self.pos, self.group) 14 15 __repr__ = __str__ 16

The attributes of a node are simply its position, label (uppercase letter) and the group (an integer) that it is part of. By overriding both the *__str__* and *__repr__* methods a nicer description of the node is produced for printing.

The class *Minspan* is a singleton. The code for it follows:

17 class Minspan : 18 def __init__ (self, pgcon, pixels, nNodes) : 19 letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 20 self.pgcon = pgcon # pgame interface 21 self.nodes = [Minnode(randpos(20,pixels-20), letters[i], i+1) 22 for i in range(nNodes)] 23 self.plinks = [] # potential links between nodes 24 self.pixels = pixels 25 for m in range(nNodes) : 26 for n in range(m+1,nNodes) : 27 x = self.nodes[m] 28 y = self.nodes[n] 29 dist = euclDist2(x.pos,y.pos) 30 self.plinks.append((dist,x,y)) 31 self.plinks.sort() 32 self.links = []

Attributes consist of a connection to our *pgcon* modules, the size of our square window in pixels and the number of nodes to produce. The nodes are produced using a list comprehension in line 21, assigning x and y position at random within the window and assigning the labels to upper case A,B,C... .

A list of *plinks* (potential links) is built in the double *for* loop (line 25). Each potential link is a tuple containing references to two nodes and a distance (actually distance squared) between them. When *plinks* is sorted our list is nicely in order of shortest to longest links.

Since every node can link to every other that we would have N*(N-1) potential links. But a link between A and B is the same as one from B to A. The little *m+1* in line 26 keeps us from generating such duplicates or even trying to link a node to itself.

The list *links* starts out empty and is populated from the *plinks* using the rules above. While the (N-1) links are found from the N*(N-1)/2 plinks the animation is created. There are 2 frames for each link made, one (line 43) showing which 2 groups are to be merged and the other (line 48) after the merge has taken place.

33 34 def linkNodes (self) : 35 nLinks = 0 36 while self.plinks : 37 plink = self.plinks.pop(0) 38 dst, nodA, nodB = plink 39 if nodA.group != nodB.group : 40 self.links.append((nodA,nodB)) 41 masterGroup = nodA.group 42 slaveGroup = nodB.group 43 self.display("Will link %s->%s" % (nodA.label, nodB.label)) 44 self.links[-1] = (nodA,nodB) 45 for node in self.nodes : 46 if node.group == slaveGroup : 47 node.group = masterGroup 48 self.display("Grouped %s with %s" % (slaveGroup,masterGroup)) 49 nLinks += 1 50 return nLinks 51

The last method of the *Minspan* class creates the display itself

52 def display (self, banner) : 53 pgcon = self.pgcon 54 pgcon.newScreen() 55 for a,b in self.links : 56 if a.group != b.group : color = WHITE 57 else : color = colors[a.group%7] 58 pgcon.lineDraw(color, a.pos, b.pos, 1) 59 for node in self.nodes : 60 color = colors[node.group%7] 61 pgcon.textDraw(color, node.pos, node.label) 62 pgcon.textDraw(WHITE, (self.pixels/2,10), banner) 63 pgcon.camera.armed = 1 64 pgcon.writeScreen(.500) 65

Each call to the *display* method creates one frame in the animation. The interaction with the *pgcon* and *camera* modules is the same as in the kmeans project. First the links are drawn in the color of their group (or white if not yet assigned). Then the labels of the nodes are drawn on top as text. The camera is armed in case png images are to be created, either for their own use or later compacted into a gif file (or both). A 1/2 second wait between frames lets us keep up with the action.

Finally, 2 small helper functions generate random x/y positions on the screen for placement of nodes and calculate the Euclidean distance for comparison of link lengths. Actually, since these distance are only used for the comparison, we don't need to take the square root to calculate the actual distance. The distance squared is just fine.

66 def euclDist2(a,b) : 67 return (a[0]-b[0])**2 + (a[1]-b[1])**2 68 69 def randpos(low,high) : 70 from random import randint 71 return (randint(low,high), randint(low,high)) 72

This starts to get important as the number of nodes increases. At 20 nodes the potential links come to 190 (20*19/2). But if it's 1000 nodes we need 499500 potential links. Saving 499500 square root calculations is no longer so trivial.

This is an important point. How time and memory are consumed as the problem size grows is a very important part of algorithm design. In this case, the dominating factor is the time needed to compute the distances between all the vertices. It is roughly the square of the number of vertices.

One more point. How can we be sure that what we end up with really is a minimum tree. Although not a formal proof here is one way to get some intuition on this. Take the final solution as generated and try to improve it. Remove a link to make the one final group into two. Attempt to reconnect the two groups with a shorter link somewhere else. You can't because we carefully consider potential links from shortest to longest. If such a shorter link actually existed we would have used it instead.

And one last point. The graph that is generated is a tree - in the computer science sense where trees grow upside down. In your imagination pick up *any* vertex and raise it off the page. Pretend that the vertices are a bit sticky but are slowly pulled up by the edges connecting them, following in turn the vertex you chose. The vertex you chose is the root of the tree and, indeed, the tree is correctly upside down.

At the bottom of *minspan.py* main is called. *sarg* is used to get parameters from the command line or to supply defaults. A *Minspan* instance *ms* is created, nodes are created, links are made generating the graphic display.

73 def main() : 74 from pgcon import Pgcon 75 pgcon = Pgcon() 76 seed = sarg.Int("seed", 0) 77 if seed : random.seed(seed) 78 else : 79 seed = random.randint(101,999) 80 print "Seed is", seed 81 random.seed(seed) 82 pixels = sarg.Int("pixels",600) 83 nNodes = sarg.Int("nodes", 20) 84 ms = Minspan(pgcon, pixels, nNodes) 85 nLinks = ms.linkNodes() 86 print "Number of links", nLinks 87 pgcon.close() 88 89 if __name__ == "__main__" : main()

The animations above were made with the following dialog in the terminal window

$python minspan.py nodes=15 seed=854 maxpics=50 prefix=mns pixels=300 gif=1 Just captured frame: 1, mns_001.png Just captured frame: 2, mns_002.png Just captured frame: 3, mns_003.png . . Just captured frame: 27, mns_027.png Just captured frame: 28, mns_028.png Number of links 14 Hit Return to exit Making uniform GIF mns.gif has 28 frames

But simply "$python minspan.py" will generate even more interesting and random displays. The arguments on the command line set the random number *seed* to get repeatable sequences, *nodes*, and *pixels* for size. Arguments *gif, maxpics, prefix* control the *pgcon* and *camera* to generate images and a gif file. These work in exactly the same way as in the kmeans project