Binary Trees


There are many ways to hold a collection of data in Python and most other modern languages. But depending on what we want to with the collection some approaches are much better than others.

In this project we'll look briefly at simple lists, linked lists and finally binary trees. In each case we want to be able to do the following efficiently.

  1. Insert a new data item (value) into the collection.
  2. See if a given value is in the collection.
  3. Remove a value from the collection.

It usually helps if the collection is sorted so that searching is optimized. Most of the time searching for items is much more frequent that inserting or removing items.

We're going mostly assume that the collection needs to be sorted. That means we must be able to compare items with the operators less than, equal to, and greater than.

For simplicity, we will also assume the following:

  1. Values in the collection are unique.
  2. Values will be simple integers. In a real situation the values might be pointers to data structures or records in a database

Simple Python List

The easiest way to make a collection is to simply use the built-in Python list.

>>> collection = [3,6,17,9]
>>> collection
[3, 6, 17, 9]

Here the data here is not sorted. We can insert a new value with the append method.

>>> collection.append(5)
>>> collection
[3, 6, 17, 9, 5]

The second requirement, to see if a given item is in the collection, is trivial.

>>> 9 in collection
>>> 8 in collection

And if Python didn't have the in operator, we could create function like the following. Notice that this is a recursive function. It could also be written with a while or for loop. But this recursion will hopefully lead into more complex structures later.

>>> def contains(collection, value) :
...     if not collection           : return False    # collection empty
...     elif value == collection[0] : return True     # value is in the first slot
...     else : return contains(collection[1:], value) # value is somewhere futher?
>>> contains(collection, 5)
>>> contains(collection, 10)
>>> def contains(collection, value) :
...     if not collection           : return False
...     elif value == collection[0] : return True
...     elif value <  collection[0] : return False     # only if sorted
...     else : return contains(collection[1:], value)

The third requirement to remove a value can be done by splicing around the value. Here we'll use a for loop to find the split point.

>>> def remove(collection, value) :
...     for i in range(len(collection)) :
...        if value == collection[i] :
...            return collection[0:i]+collection[i+1:]  # remove and rejoin

Linked Lists

Linked lists consists of nodes containing both a value and a pointer to the next node. Linked lists only make sense when they are sorted.

We can make nodes using a simple 2 element Python list for each node. The first element in the node is the value and second element points to the next node. In the last node the second element is the value None. Here is an example.

>>> import lists
>>> print lists.sampleList     # linked list is a nested
[10, [20, [30, [40, None]]]]

The Python function id provides a unique identifier for values, including structures like lists and dictionaries. It may well be the address in memory where the value is stored. Unique values are stored just once.

>>> a = 541
>>> id(a)         # shows value *541* resides in memory at 36300744
>>> b = a
>>> id(b)
36300744          # b == a since they point at the same value
>>> id(b), b
(36300744, 541)
>>> a = 542       # changing the value of *a*
>>> id(a), a
(36300768, 542)

Function lists.showLinks uses the id function to show the links. Notice that even the value None has a home at 9392928.

>>> l = lists.sampleList
>>> lists.showLinks(l)              # id of node, value, id of next node
140107872758600: 10 140107872758672
140107872758672: 20 140107872685088
140107872685088: 30 140107872683864
140107872683864: 40 9392928

Here is a function contains to walk a linked list

18 def contains(l, value) :
19     if   l == None     : return False
20     elif l[0] == value : return True
21     else               : return contains(l[1],value)

The traditional way to insert a node into a sorted list is manipulate the pointers. There are 3 possible situations. Will the node go at the beginning of the list, at the end of the list or somewhere in the middle.

23 def insert (nod0, value) :
24     if   nod0   == None  : return [value, None]    # new list created
25     elif nod0[0] > value : return [value, nod0[1]] # insert as first node
26     else :
27         nod = nod0
28         while nod[1] :
29             if nod[1][0] >= value  :   # next node in the chain?
30                 ins = (value, nod[1])  # build a new node and
31                 nod[1] = ins           # squeeze it in
32                 break                  #
33             else : nod = nod[1]
34         if nod[1] == None : nod[1] = (value, None) # becomes the tail
35         return nod0

Using a more functional approach makes the code much tighter with just a single line for each of the three conditions.

37 def insert2(l, value) :
38     if   l   == None  : return [value, None]
39     elif l[0] > value : return [value, l]
40     else              : return [l[0], insert2(l[1], value)]

It is critical that you understand what is happening in line 40. It replaces lines 26-35 in the earlier version. Take as long as you need.

The function lists.showList prints the list in a more natural way. The last 4 digits of each nodes id and the nodes value, seperated with a :. Here is the code followed by a sample run

10 def showList(l, accum="") :
11     if l == None : return accum
12     else :
13         if accum : sep = " --> "
14         else     : sep = ""
15         accum += sep + "%d:%s" % (id(l)%10000,l[0]) # include last 4 dig of id
16         return showList(l[1], accum)
>>> l1 = lists.sampleList
>>> print l1
[10, [20, [30, [40, None]]]]
>>> l2 = lists.insert2(l1, 25)
>>> print l2                    <<<<<<<<<<<<<<< need answer
>>> print lists.showList(l1)
7472:10 --> 8696:20 --> 2280:30 --> 2208:40
>>> print lists.showList(l2)
3728:10 --> 3440:20 --> 2792:25 --> 2280:30 --> 2208:40

Notice please. l1 and l2 share the same nodes after the insertion point (value 25). l2 has a new front-end. This sharing of "tails" will prove to be a key feature for the efficiency of binary trees (coming up)

There is a test function that is invoked like the following

>>> import lists
>>> lists.testFunc(lists.insert2)
<function insert2 at 0x7f5a2d3f9848>
[10, [20, [30, [40, None]]]]
7456:10 --> 7096:20 --> 3512:30 --> 2288:40

input value: 25
[10, [20, [25, [30, [40, None]]]]]
7672:10 --> 7528:20 --> 7600:25 --> 3512:30 --> 2288:40

Tuples for linked lists

With the functional versions of insert, contains and delete, we no longer modify existing nodes. The 2 element lists can be replaced with 2 element tuples. This is a good idea for two reasons. One, tuples use much less memory than lists and two, not being able to modify nodes is a very good thing. Making modifications to list structures can lead to subtle bugs, especially if lists are shared by concurrent processes.

The python programs and can be viewed seperately. They are quite similar.

zip file.

Binary Trees

Binary trees are like linked lists except that they are two dimensional. Each node has a value and two pointers to sub-trees, one left and one right. Both sub-trees can either be the value None or be a root node to another binary tree. The left subtree contains only nodes whose values are less than the parent node. The one on the right only nodes whose values are greater.

For convenience in this discussion, when referring to a node I will use its value to identify it. So "node 70" is the node containing the value of 70. Of course, just like in linked lists, values don't have to be just integers.

A binary tree is a recursive data structure. Here is an example. Notice that in every subtree values on the left are less than the parent above and those on the right are greater.


The python representation of this tree is the following.

(50, (30, (20, (10, None, None), None), (40, None, None)),
     (80, (70, (60, None, None), None), (90, None, None)))

You can see how nodes 20 and 70 have no right hand subtree and how nodes 10, 40, 60 and 90 have no subtrees at all.

The tree is balanced. Although there are nine nodes in the tree each node can be reached from the root in three steps or less.

As with linked lists, we want to be able to do the following; find if the tree contains a value, insert a new value and delete a value. Of course, I'm really refering to a node with that value.

The code for finding a value is simple and looks like the following.

34 def contains(tree, target) :
35     if tree == None : return False
36     (value,left,right) = tree
37     if  value == target : return True
38     elif value > target : return contains(left,target)
39     else                : return contains(right,target)

Function contains starts at the root node and traverses the tree with each step either to the left or right. If it lands on the node with the target value, it returns True. Reaching the end returns False. Either value is passed right back up through the series of returns popping out at the top.

>>> from bintree import contains
>>> mytree = (50, (30, (20, (10, None, None), None), (40, None, None)),
...         (80, (70, (60, None, None), None), (90, None, None)))
>>> contains(mytree, 60)
>>> contains(mytree, 65)

Inserting a new node

Inserting a new node requires finding the right place for the insertion and then building a linked list in the nested returns that mimics the original but in fact creates a new tree that will mostly merge with the old one.

24 def insert(tree, newValue) :
25     if not tree : return (newValue,None,None)
26     (value,left,right) = tree
27     if value == None     : return (newValue,None,None)
28     if value == newValue : return tree  # already there
29     elif newValue < value :
30         return (value, insert(left,newValue), right)
31     else :
32         return (value, left, insert(right,newValue))

With each return using lines 25,27,28,30,32 we create a new node linked to those below it. Here's is an example of inserting a value of 65.


The nodes in yellow are from the original tree and you can check this by looking at the id's. Only the red nodes leading from our new value back to the top are new nodes built with the values of the original returned through the recursion. As long as there is a pointer to the root of the original tree it continues to exist. The new tree has a new root and merges with the original where it can.

Now, starting again with the first tree, let's remove node 40. This requires new nodes for 30 and 50 with the yellow nodes still the original ones. Lines 30 and 32 above mimic line 26 in

>>> mytree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
>>> oldtree = mytree
>>> from bintree import insert
>>> newtree = insert(oldtree, 65)
>>> newtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, (65, None, None)), None), (90, None, None)))
>>> oldtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))

So, no side-effects.

Deleting nodes

Deleting a node is somewhat more complex. The following shows the effect of deleteing node 40.


Basically, when the root of a subtree is deleted, one of other nodes must take its place. The new root node must maintain the conditions. The minimum node in the right subtree works fine. Why? Its value is still larger than anything in the left subtree and smaller than any other node on the right. An even simpler process can be used if either subtree is empty. Here is the code.

41 def delete(tree, target) :
42     if tree == None : return None
43     (value, left, right) = tree
44     if target == value :  # delete this node
45         if left and right :
46             swp = minval(right)
47             print "Swap val", swp
48             return (swp, left, delete(right,swp))
49         if left  == None : return right
50         if right == None : return left
51     elif target < value :
52         return (value, delete(left,target), right)
53     elif target > value :
54         return (value, left, delete(right,target))

57 def minval(tree) :
58     (val,left,right) = tree
59     if left == None : return val
60     else : return minval(left)

If we have found the node to delete (line 44) and if both left and right branches are present then we find the minimum value on the right side using the function minval. A new node is created (line 48) and a new branch built (in red) back to the top.

If either the left or right branch is empty then the other branch can simply take the place of the node deleted (lines 49,50). Lines 51 through 54 traverse down the tree to find the node to delete.

Balancing a tree

The optimum tree can be very time efficient. In a balanced tree with 1024 nodes every node is within 10 steps of the root. Deletes and Inserts also impact the tree by only rebuild new paths of this length. But this efficiency can be lost if the sequence of inserts is not random.

In the next picture 2 nodes have been inserted. They leave the tree somewhat out of balance. Instead of all nodes being within 3 step from the root, node 62 is five steps away


After rebalancing, all nodes are again within 3 steps from the root


Below are functions to make this happen. The idea is to convert the tree into a sorted Python list. The optimum root for the binary tree is in the middle of the list. Take values to the left of our root and create a sub-tree for its left link. The values to the right create a sub-tree for its right link. Do this recursively and tree simply unfolds.

The function values builds the sorted Python list. Sorting is automatic in how the tree is traversed recursively. Function balance2 is also recursive and a sub-root node is inserted at each iteration.

08 def values(tree) :
09     if not tree : return []
10     (value,left,right) = tree
11     return values(left) + [value] + values(right)
13 def balance(tree) :
14     return balance2(None, values(tree))
16 def balance2(btree, sorted) :
17     if not sorted  : return btree
18     split = len(sorted)/2
19     btree = insert  (btree,sorted[split])     # midpoint
20     btree = balance2(btree,sorted[0:split])   # left side
21     btree = balance2(btree,sorted[split+1:])  # right side
22     return btree

If the number of nodes is 2^n-1 for some number n then the tree will be perfectly balanced. Otherwise there will be a partially filled in row at the bottom from right to left. What change in line 18 will cause the row to be filled from left to right?

The graphics are produced from which in turn uses You can run if you have pygame installed. Download and unzip and type the following.

$ python
Type number or b or x: 25

Typing a number will insert or delete an existing node or insert a new one. Typing b will rebalance the tree. And x will exit.

The Wrapup

These basic algorithms for manipulating binary trees are fairly straight forward and quite fast and efficient. They find uses in many places.

Among other things, binary trees can work as indexes for Non SQL databases. Being able to hold multiple indexes into a database and multiple versions of the same index allows us to easily make changes and also to easily undo those changes by simply restoring an earlier pointer.

Finally, when possible, recursive data structures should always be handled with recursive algorithms. Modern languages, like Python, make this fairly straight-forward. And, although there is a learning curve, it is definitely worth the effort.

I once read a quote somewhere about functional programming that went something like

"I've never spent so much time writing so little code that does so much".

But with practice that "much time" becomes "just a little time".

If you have comments or suggestions You can email me at mail me

Copyright © 2019 Chris Meyers