Linked Lists and Binary Trees

Introduction

Collections of data in Python may be held in many ways. But depending on what we want to do with a collection, some approaches are better than others.

In this project we shall look briefly at simple lists, linked lists and finally binary trees. In each case we want to be able to do the following operations efficiently.

Insert a new data item (value) into the collection.
See if a given value is in the collection.
Remove a value from the collection.

It usually helps if the collection is sorted so that searching can be optimized. In general, searching for items is much more frequent that inserting or removing items.

To keep things simple we will assume that our collections are sorted. That means we must be able to compare items with the operators less than, equal to, and greater than.

Additionally, to keep things simple we will use examples where

Data values are simple integers
Values are unique.
The lists and trees are always sorted.

Simple Python Built-in Lists

The easiest way to make a collection a list is to simply use the built-in Python list. This collection integers is sorted.

>>> collection = [3,6,9,17]
>>> collection
[3, 6, 9, 17]

We can insert a new value with the append method

>>> collection.append(25)
>>> collection
[3, 6, 9, 17, 25]

We can check to see if a given item is in the collection.

>>> 9 in collection
True
>>> 8 in collection
False

And if Python did not have the in operator, we could create a function like the following. Notice that this function is recursive. It could also be written with a while or for loop. But this version can serve as an introduction to more complex structures later.

>>> def contains(collection, value) :
...     if not collection           : return False    # collection empty
...     elif value == collection[0] : return True     # value is in the first slot
...     else : return contains(collection[1:], value) # value is somewhere further?
...
>>> contains(collection, 17)
True
>>> contains(collection, 10)
False

Or, if we have know the list is sorted we can stop early

>>> def contains(collection, value) :
...     if not collection           : return False
...     elif value == collection[0] : return True
...     elif value <  collection[0] : return False     # only if sorted
...     else : return contains(collection[1:], value)

To remove a value we can splice the list around it. Here we shall use a for loop to find the split point.

>>> def remove(collection, value) :
...     for i in range(len(collection)) :
...        if value == collection[i] :
...            return collection[0:i]+collection[i+1:]  # remove and rejoin

Linked Lists

Our linked lists will be built with nodes containing both a value and a pointer to the next node.

We can make our linked list using a simple 2 element Python list for each node. The first element of a node is its value and second element is the remaining nodes. The last node in the linked list will have the value None in the second element. Here is an example.

>>> from lists import sampleList, showList, contains
>>> print sampleList
[10, [20, [30, [40, None]]]]
>>> print showList(sampleList)
10 --> 20 --> 30 --> 40
>>> print contains(sampleList, 25)
False

Here is a function contains() to find if a value is in a linked list.

15 def contains(l, value) :
16     if   l == None     : return False
17     elif l[0] == value : return True
18     else               : return contains(l[1],value)

>>> print contains(sampleList, 30)
True

The traditional way to insert a node into a sorted list is to manipulate the pointers. There are 3 possible situations. Will the new node go at the beginning of the list, at the end of the list or somewhere in the middle? Here is some code from lists.py

20 def insert1 (nod0, value) :
21     if   nod0   == None  : return [value, None]
22     elif nod0[0] > value : return [value, nod0[1]] # insert at begining
23     else :
24         nod = nod0
25         while nod[1] :
26             if nod[1][0] >= value  :   # next node in the chain?
27                 ins = (value, nod[1])  # new node built
28                 nod[1] = ins           # squeeze it in
29                 break
30             else : nod = nod[1]
31         if nod[1] == None : nod[1] = (value, None) # becomes the tail
32         return nod0

Using a more functional approach makes the code much tighter with just a single line for each of the three conditions. It is a bit trickier though.

It is important that you understand what is happening in line 37. It replaces lines 23-32 in the first version. In successive calls from line 37 the list is traversed until the insertion point is found. Then a new sub-list is made headed by the new value and followed by the rest of the original list. And finally, all the nodes that preceded the insertion point copied back to the beginning. What we then have is a new list that share a tail of nodes with the original. Take your time.

34 def insert2(l, value) :
35     if   l == None    : return [value, None]
36     elif l[0] > value : return [value, l]
37     else              : return [l[0], insert2(l[1], value)]

If we were to print the successive return values they would look like this:

Input value to either insert or remove: 35

[35, [40, None]]
[30, [35, [40, None]]]
[20, [30, [35, [40, None]]]]
[15, [20, [30, [35, [40, None]]]]]
[10, [15, [20, [30, [35, [40, None]]]]]]

10 --> 15 --> 20 --> 30 --> 35 --> 40

Deleting a node can also be done two ways.

The first approach in deleting node N is to modify the previous node’s pointer to skip over N. In lines 40 and 41 of delete1() the special cases (first and last) nodes are handled.

39 def delete1 (nod0, value) :
40     if   nod0    == None  : return None
41     elif nod0[0] == value : return nod0[1]  # if first node, just skip
42     else :
43         nod = nod0
44         while nod[1] :
45             if nod[1][0] == value  :  # next node in the chain?
46                 nod[1] = nod[1][1]    # skip the next
47                 break
48             else : nod = nod[1]
49         return nod0

delete2() is a simpler approac. It basically builds a new list with the remaining nodes still in the proper order. If we know the the values are unique and sorted then the resulting new linked list can actually just merge with the original past the point of deletion. This is very similar to insert2() above.

51 def delete2(l, value) :
52     if   l    == None  : return None
53     elif l[0] != value : return [l[0], delete2(l[1],value)]
54     else               : return delete2(l[1], value)

The function showList() prints the list with the node values are seperated by "-->", as in the example above.

07 def showList(l, accum="") :
08     if l == None : return accum
09     else :
10         if accum : sep = " --> "
11         else     : sep = ""
12         accum += sep + str(l[0])
13         return showList(l[1], accum)

The module lists.py can also be run stand-alone to experiment with the recursive versions of insert and delete

$ python lists.py
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 25 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 10
20 --> 30 --> 40

Tuples for linked lists

With the functional versions of insert(), contains() and delete(), we no longer must modify existing nodes. The 2 element lists can be replaced with 2 element tuples. This is a good idea for two reasons. One, tuples use much less memory than lists and two, not being able to modify nodes is a very good thing. Making modifications to list structures can lead to subtle bugs, especially if lists are shared by concurrent threads.

The python programs lists.py and tuples.py can be viewed seperately. They are quite similar.

The zip file. downloads the entire package

Binary Trees

Binary trees are basically two dimensional linked lists. Each node has a value and pointers to two sub-trees, one to the left and one the right. Both sub-trees may either be the value None or be the root node to another binary tree. The left subtree contains only nodes whose values are less than the value of the parent node. The right subtree contains only nodes whose values are greater. As this is true for every node, the whole network of nodes is sorted.

As we did above, we will use the node’s value to identify it. So "node 70" is the node containing the value of 70. Of course, just like in linked lists, values in the real world don't have to be just integers. They can take any form.

A binary tree is a recursive data structure. Here is an example. Notice that in every left subtree values are less than any nodes above and those on the right are greater.

You can see how nodes 20 and 70 have no right hand subtree and how nodes 10, 40, 60 and 90 have no subtrees at all.

The tree is balanced. Although there are nine nodes in the tree each node can be reached from the root in three steps or less.

The python representation of this tree is the following.

(50, (30, (20, (10, None, None), None), (40, None, None)),
     (80, (70, (60, None, None), None), (90, None, None)))

As with linked lists, we want to be able to do the following:

find if the tree contains a value
insert a new node and its value
delete a node value.

The code for finding a value is simple. Function contains() starts at the root node and traverses the tree with each step either to the left or right. If it lands on the node with the target value, it returns True. Reaching the end returns False. The final value is passed back through the series of returns and finally pops out at the top.

34 def contains(tree, target) :
35     if tree == None : return False
36     (value,left,right) = tree
37     if  value == target : return True
38     elif value > target : return contains(left,target)
39     else                : return contains(right,target)
40

And here it is in action…

>>> from bintree import contains, mytree
>>> print(mytree)
(50, (30, (20, (10, None, None), None), (40, None, None)),(80, (70, (60, None, None), None), (90, None, None)))

>>> contains(mytree, 60)
True
>>> contains(mytree, 65)
False

Inserting a new node

Inserting a new node somewhere in the tree requires first finding the right place for the insertion. Then we build a linked list of new nodes taking us back to a new root and keeping as much of the original as we can. With each return using lines 25,27,28,30,32 we create a new node linked to those below it.

24 def insert(tree, newValue) :
25     if not tree : return (newValue,None,None)
26     (value,left,right) = tree
27     if value == None     : return (newValue,None,None)
28     if value == newValue : return tree  # already there
29     elif newValue < value :
30         return (value, insert(left,newValue), right)
31     else :
32         return (value, left, insert(right,newValue))
33

Here is a diagram of the results of adding a value of 65. The nodes in yellow are from the original tree and you could check this by looking at their id's. The red nodes are new and lead back to the top to the tree. The original tree continues to exist. The new tree has a new root but merges with the original where it can.

So, we see that the original tree is still intact and referenced by oldtree

>>> mytree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
>>>
>>> oldtree = mytree
>>> from bintree import insert
>>>
>>>
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, (65, None, None)), None), (90, None, None)))
>>> oldtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))

Deleting nodes

Deleting a node is somewhat more complex. The following shows the effect of deleteing node 40.

Basically, when the root of a subtree is deleted, one of other nodes must take its place. The new root node must maintain the conditions. The minimum node in the right subtree works fine since its value is still larger than anything in the left subtree and smaller than any other node on the right. An even simpler process can be used if either subtree is empty.

If we have found the node to delete (line 44) and if both left and right branches are present then we find the minimum value on the right side using the function minval. A new node is created (line 48) and a new branch built (in red) back to the top.

If either the left or right branch is empty then the other branch can simply take the place of the node deleted (lines 49,50). Lines 51 through 54 traverse down the tree to find the node to delete. Here is the code.

41 
42 
43     (value, left, right) = tree
44     if target == value :  # delete this node
45         if left and right :
46             swp = minval(right)
47             #print("Swap val", swp)
48             return (swp, left, delete(right,swp))
49         if left  == None : return right
50         if right == None : return left
51     elif target < value :
52         return (value, delete(left,target), right)
53     elif target > value :
54         return (value, left, delete(right,target))

57 def minval(tree) :
58     (val,left,right) = tree
59     if left == None : return val
60     else : return minval(left)
61

Balancing a tree

An optimumly balanced tree can be very time efficient. In a balanced tree with 1024 nodes every node is within 10 steps of the root (2¹⁰ is 1024). Deletes and Inserts make new trees by only building new paths of length 10 of 11. But this efficiency can break down if the sequence of inserts is not random.

In the next picture 2 nodes have been inserted. They leave the tree somewhat out of balance. Instead of all nodes being within 3 step from the root, node 62 is five steps away

After rebalancing, all nodes are again within 3 steps from the root.

Below are the functions to make this happen. The idea is to convert the tree into a sorted Python list. This is done in values by simply traversing the tree left to right recursively. When the list is complete the optimum root for the binary tree is in the middle of the list. Take values to the left of our root and create a sub-tree for its left link. The values to the right create a sub-tree for its right link. Do this recursively and a balanced tree simply unfolds.

The function values builds the sorted Python list. Sorting happens automatically by the way the tree is traversed. Function balance2 is also recursive and a sub-root node is inserted at each iteration.

07
08 def values(tree) :
09     if not tree : return []
10     (value,left,right) = tree
11     return values(left) + [value] + values(right)
12
13 def balance(tree) :
14     return balance2(None, values(tree))
15
16 def balance2(btree, sorted) :
17     if not sorted  : return btree
18     split = len(sorted)/2
19     btree = insert  (btree,sorted[split])     # midpoint
20     btree = balance2(btree,sorted[0:split])   # left side
21     btree = balance2(btree,sorted[split+1:])  # right side
22     return btree
23

If the number of nodes is a power of 2 then the tree will be perfectly balanced. Otherwise there will be a partially filled in row at the bottom from right to left.

Graphics for Binary Tree

The program test.py uses bintree.py and display.py to generate graphics similar to those above (created by an earlier pygame version).

display.py, in turn, uses John Zelle's graphics.py to create windows. You need to have Tkinter installed but you no longer need pygame. The python modules in the project and graphics.py are compatible with both Python 2.7 and Python 3. You may use either.

$ python test.py (or python3 test.py)

This should hould paint a window with the tree above, 3 buttons at the bottom and text entry for a number. The Add/Del button will either insert the number or delete it. The Balance button will rebalance the tree, and the Exit button is pretty obvious.

Below is the code for the display module.

The window is fairly large (600 pixels). There are global lists for buttons and for items temporarily drawn. A single entry box lets you input a number for a node to either add or delelte.

 1 # display.py
 2 #
 3 import sarg, random, math
 4 from button   import Button
 5 from graphics import *
 6 from sys      import maxsize
 7
 8 pixels  = sarg.Int("pixels", 600)
 9
10 win = GraphWin("Binary Trees",pixels,pixels)
11 drawn   = []
12 buttons = []
13 entry = Entry(Point(300, 540), 2)

The static items in the window are the 3 buttons near the bottom and the text entry box.

14
15 def drawStatic() :
16     buttons.append(Button(Point(150,570), "Balance"))
17     buttons.append(Button(Point(300,570), "Add/Del"))
18     buttons.append(Button(Point(450,570), "Exit"))
19     for but in buttons : but.draw(win)
20     entry.draw(win)
21     entry.setText("45")
22

The wait-for-user-input function. You may add a number to the entry box. Clicking the mouse within the rectanlge of a button will return the label of the button (line 29) or the value of the text entry. (line 28). Other clicks are ignored.

23 def waitForButton() : 
24    pnt = win.getMouse() 
25    val = entry.getText() 
26    for but in buttons : 
27        if but.inside(pnt) : 
28            if but.text == \"Add/Del\": return val 
29            else : return but.text 
30    return waitForButton() \# keep trying 
31

The functions below are pretty much boiler-plate and simple if you are familiar with the Zelle graphics module. (link needed)

32 def drawTree(tree, pos, width, stepdown, ids=None) :
33     for obj in drawn : obj.undraw()
34     drawNode(tree, pos, width, stepdown, ids)
35
36 def displayOff() :
37     win.close()
38
39 def drawText(color, x,y, text) :
40     txt = Text(Point(x,y), str(text))
41     txt.setFill(color)
42     txt.draw(win)
43     drawn.append(txt)
44
45 def drawLine(color, x,y, next_xy, width) :
46     x2,y2 = next_xy
47     p1 = Point(x,y); p2 = Point(x2,y2)
48     line = Line(p1,p2)
49     line.setFill(color)
50     line.draw(win)
51     drawn.append(line)
52

drawNode() is more interesting. It recursively draws the subtrees, top to down, left and right. Lines connecting nodes and the text of a node are drawn yellow if part of a previous tree (line 66) or red if newly created (67). ids is a list of Python ids to tag the previous nodes

 53 def drawNode(node, pos, width, stepdown, ids) :
 54     if node == None : return
 55     x,y = pos; value,left,right = node
 56     halfwidth = width/2
 57     nextR = (x+halfwidth,y+stepdown)
 58     if left :
 59         nextL = (x-halfwidth,y+stepdown)
 60         drawLine("white", x,y, nextL, width)
 61         drawNode(left , nextL, halfwidth, stepdown, ids)
 62     if right :
 63         nextR = (x+halfwidth,y+stepdown)
 64         drawLine("white", x,y, nextR, width)
 65         drawNode(right, nextR, halfwidth, stepdown, ids)
 66     if ids and id(node) in ids : color = "yellow"
 67     else                       : color = "red"
 68     drawText(color, x,y, str(value))

.

Download and unzip bintree.zip

The Wrapup

These basic algorithms for manipulating binary trees are fairly straight forward and quite fast and efficient. They find uses in many places.

Finally, when possible, recursive data structures should always be handled with recursive algorithms. Modern languages, like Python, make this fairly straight-forward. And, although there is a learning curve, it is definitely worth the effort.

If you have comments or suggestions You can email me at mail me

* * *