Linked Lists and Binary Trees

Introduction

There are many ways to hold collections of data in Python and other modern languages. But depending on what we want to with a collection, some approaches are better than others.

In this project we'll look briefly at simple lists, linked lists and finally binary trees. In each case we want to be able to do the following efficiently.

  1. Insert a new data item (value) into the collection.
  2. See if a given value is in the collection.
  3. Remove a value from the collection.

It usually helps if the collection is sorted so that searching is optimized. Usually, searching for items is much more frequent that inserting or removing items.

To keep things simple, we will use examples with the following characteristics. To keep things simple we will assume that our collection are sorted. That means we must be able to compare items with the operators less than, equal to, and greater than.

Additionally, we will use examples where :

  1. Data values are simple integers
  2. Values are unique.
  3. The lists and trees are sorted.

Simple Python Built-in Lists

The easiest way to make a collection is to simply use the built-in Python list.

>>> collection = [3,6,9,17]
>>> collection
[3, 6, 9, 17]

The data conforms with the rules above. We can insert a new value with the append method.

>>> collection.append(25)
>>> collection
[3, 6, 17, 9, 25]

We can check to see if a given item is in the collection.

>>> 9 in collection
True
>>> 8 in collection
False

And if Python didn't have the in operator, we could create a function like the following. Notice that this function is recursiv. It could also be written with a while or for loop. But this version can serve as a guide into more complex structures later.

>>> def contains(collection, value) :
...     if not collection           : return False    # collection empty
...     elif value == collection[0] : return True     # value is in the first slot
...     else : return contains(collection[1:], value) # value is somewhere futher?
...
>>> contains(collection, 17)
True
>>> contains(collection, 10)
False

Or, since we know the list is in sort order we can stop early

>>> def contains(collection, value) :
...     if not collection           : return False
...     elif value == collection[0] : return True
...     elif value <  collection[0] : return False     # only if sorted
...     else : return contains(collection[1:], value)

To remove a value we can splice the list around it. Here we'll use a for loop to find the split point.

>>> def remove(collection, value) :
...     for i in range(len(collection)) :
...        if value == collection[i] :
...            return collection[0:i]+collection[i+1:]  # remove and rejoin

Linked Lists

Linked lists consists of nodes containing both a value and a pointer to the next node.

We can make nodes using a simple 2 element Python list for each node. The first element in the node is the value and second element will point to the next node. The last node in the linked list has the value None as a pointer. Here is an example.

>>> import links
>>> print lists.sampleList     # linked list is a nested structure
[10, [20, [30, [40, None]]]]
>>>

10 --> 20 --> 25 --> 30 --> 40   A more natural presentation

Here is a function contains to find if a value is in a linked list.

15 def contains(l, value) :
16     if   l == None     : return False
17     elif l[0] == value : return True
18     else               : return contains(l[1],value)

The traditional way to insert a node into a sorted list is manipulate the pointers. There are 3 possible situations. Will the new node go at the beginning of the list, at the end of the list or somewhere in the middle. Here is code from lists.py

20 def insert1 (nod0, value) :
21     if   nod0   == None  : return [value, None]
22     elif nod0[0] > value : return [value, nod0[1]] # insert at begining
23     else :
24         nod = nod0
25         while nod[1] :
26             if nod[1][0] >= value  :   # next node in the chain?
27                 ins = (value, nod[1])  # new node built
28                 nod[1] = ins           # squeeze it in
29                 break
30             else : nod = nod[1]
31         if nod[1] == None : nod[1] = (value, None) # becomes the tail
32         return nod0

Using a more functional approach makes the code much tighter with just a single line for each of the three conditions. It is a bit trickier though.

34 def insert2(l, value) :
35     if   l == None    : return [value, None]
36     elif l[0] > value : return [value, l]
37     else              : return [l[0], insert2(l[1], value)]

It is important that you understand what is happening in line 37. It replaces lines 23-32 in the first version. Take as long as you need. In successive calls from line 37 the list is traversed until the insertion point is found. Then a new sub-list is made heading by the new value followed by the rest of the original list. And finally, all the nodes that preceded the insertion point are tacked onto the beginning as nested returns take place at line 37.

If we were to print the successive return values they would look like this:

Input value to either insert or remove: 35

[35, [40, None]]
[30, [35, [40, None]]]
[20, [30, [35, [40, None]]]]
[15, [20, [30, [35, [40, None]]]]]
[10, [15, [20, [30, [35, [40, None]]]]]]

10 --> 15 --> 20 --> 30 --> 35 --> 40

Deleting a node can also be done two ways.

39 def delete1 (nod0, value) :
40     if   nod0    == None  : return None
41     elif nod0[0] == value : return nod0[1]  # if first node, just skip
42     else :
43         nod = nod0
44         while nod[1] :
45             if nod[1][0] == value  :  # next node in the chain?
46                 nod[1] = nod[1][1]    # skip the next
47                 break
48             else : nod = nod[1]
49         return nod0
50
51 def delete2(l, value) :
52     if   l    == None  : return None
53     elif l[0] != value : return [l[0], delete2(l[1],value)]
54     else               : return delete2(l[1], value)

The first approach in deleting node N is to modify the previous nodes pointer to skip over N. In lines 40 and 41 the special cases (first and last) nodes are handled.

Notice how much simpler delete2 is. It basically builds a new list with the remaining nodes still in the proper order. If we know the the values are unique and sorted then the resulting new linked list can actually just merge with the original past the point of deletion.

The function lists.showList prints the list with the node values are seperated by "-->"

07 def showList(l, accum="") :
08     if l == None : return accum
09     else :
10         if accum : sep = " --> "
11         else     : sep = ""
12         accum += sep + str(l[0])
13         return showList(l[1], accum)

The module lists.py can be run stand-alone to experiment with the recursive versions of insert and delete

$ python lists.py
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 25 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 10
20 --> 30 --> 40

Tuples for linked lists

With the functional versions of insert, contains and delete, we no longer modify existing nodes. The 2 element lists can be replaced with 2 element tuples. This is a good idea for two reasons. One, tuples use much less memory than lists and two, not being able to modify nodes is a very good thing. Making modifications to list structures can lead to subtle bugs, especially if lists are shared by concurrent threads

The python programs lists.py and tuples.py can be viewed seperately. They are quite similar.

zip file. downloads the entire package

Binary Trees

Binary trees are like linked lists except that they are two dimensional. Each node has a value and two pointers to sub-trees, one left and one right. Both sub-trees can either be the value None or be the root node to another binary tree. The left subtree contains only nodes whose values are less than the parent node. The one on the right only nodes whose values are greater. And this is true for every node which makes the whole thing sorted.

As we did above, when referring to a node I will use its value to identify it. So "node 70" is the node containing the value of 70. Of course, just like in linked lists, values don't have to be just integers. They can take any form.

A binary tree is a recursive data structure. Here is an example. Notice that in every subtree values on the left are less than its parent above and those on the right are greater.

images/tree-01.png

The python representation of this tree is the following.

(50, (30, (20, (10, None, None), None), (40, None, None)),
     (80, (70, (60, None, None), None), (90, None, None)))

You can see how nodes 20 and 70 have no right hand subtree and how nodes 10, 40, 60 and 90 have no subtrees at all.

The tree is balanced. Although there are nine nodes in the tree each node can be reached from the root in three steps or less.

As with linked lists, we want to be able to do the following:
  1. find if the tree contains a value
  2. insert a new node and its value
  3. delete a node value.

The code for finding a value is simple and looks like the following.

34 def contains(tree, target) :
35     if tree == None : return False
36     (value,left,right) = tree
37     if  value == target : return True
38     elif value > target : return contains(left,target)
39     else                : return contains(right,target)
40

Function contains starts at the root node and traverses the tree with each step either to the left or right. If it lands on the node with the target value, it returns True. Reaching the end returns False. The final value is passed back through the series of returns and finally pops out at the top.

>>> from bintree import contains, mytree
>>> print mytree
(50, (30, (20, (10, None, None), None), (40, None, None)),(80, (70, (60, None, None), None), (90, None, None)))

>>> contains(mytree, 60)
True
>>> contains(mytree, 65)
False

Inserting a new node

Inserting a new node somewhere in the tree requires finding the right place for the insertion and then building a linked list in the nested returns that mimics the original but in fact creates a new tree with its own root. that will mostly merge with the old one.

24 def insert(tree, newValue) :
25     if not tree : return (newValue,None,None)
26     (value,left,right) = tree
27     if value == None     : return (newValue,None,None)
28     if value == newValue : return tree  # already there
29     elif newValue < value :
30         return (value, insert(left,newValue), right)
31     else :
32         return (value, left, insert(right,newValue))
33

With each return using lines 25,27,28,30,32 we create a new node linked to those below it. Here's is an example of inserting a value of 65.

images/tree-02.png

The nodes in yellow are from the original tree and you could check this by looking at the id's. Only the red nodes leading from our new value back to the top are new nodes built with the values of the original returned through the recursion. As long as there is a reference to the root of the original tree it continues to exist. The new tree has a new root and merges with the original where it can.

Now, starting again with the first tree, let's remove node 40. This requires new nodes for 30 and 50 with the yellow nodes still the original ones. Lines 30 and 32 above mimic line 26 in tuples.py

>>> mytree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
>>>
>>> oldtree = mytree
>>> from bintree import insert
>>> newtree = insert(oldtree, 65)
>>> newtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, (65, None, None)), None), (90, None, None)))
>>> oldtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))

So, we see that the original tree is still intact and referenced by oldtree

Deleting nodes

Deleting a node is somewhat more complex. The following shows the effect of deleteing node 40.

images/tree-03.png

Basically, when the root of a subtree is deleted, one of other nodes must take its place. The new root node must maintain the conditions. The minimum node in the right subtree works fine. Why? Its value is still larger than anything in the left subtree and smaller than any other node on the right. An even simpler process can be used if either subtree is empty. Here is the code.

41 def delete(tree, target) :
42     if tree == None : return None
43     (value, left, right) = tree
44     if target == value :  # delete this node
45         if left and right :
46             swp = minval(right)
47             print "Swap val", swp
48             return (swp, left, delete(right,swp))
49         if left  == None : return right
50         if right == None : return left
51     elif target < value :
52         return (value, delete(left,target), right)
53     elif target > value :
54         return (value, left, delete(right,target))

57 def minval(tree) :
58     (val,left,right) = tree
59     if left == None : return val
60     else : return minval(left)
61

If we have found the node to delete (line 44) and if both left and right branches are present then we find the minimum value on the right side using the function minval. A new node is created (line 48) and a new branch built (in red) back to the top.

If either the left or right branch is empty then the other branch can simply take the place of the node deleted (lines 49,50). Lines 51 through 54 traverse down the tree to find the node to delete.

Balancing a tree

An optimumly balanced tree can be very time efficient. In a balanced tree with 1024 nodes every node is within 10 steps of the root since 2**10 is 1024. Deletes and Inserts make new trees by only building new paths of length 10 of 11. But this efficiency can break down if the sequence of inserts is not random.

In the next picture 2 nodes have been inserted. They leave the tree somewhat out of balance. Instead of all nodes being within 3 step from the root, node 62 is five steps away

images/tree-04.png

After rebalancing, all nodes are again within 3 steps from the root

images/tree-05.png

Below are functions to make this happen. The idea is to convert the tree into a sorted Python list. This is done in values by simply traversing the tree left to right. When the list is complete optimum root for the binary tree is in the middle of the list. Take values to the left of our root and create a sub-tree for its left link. The values to the right create a sub-tree for its right link. Do this recursively and a balanced tree simply unfolds.

The function values builds the sorted Python list. Sorting is automatic in how the tree is traversed recursively. Function balance2 is also recursive and a sub-root node is inserted at each iteration.

07
08 def values(tree) :
09     if not tree : return []
10     (value,left,right) = tree
11     return values(left) + [value] + values(right)
12
13 def balance(tree) :
14     return balance2(None, values(tree))
15
16 def balance2(btree, sorted) :
17     if not sorted  : return btree
18     split = len(sorted)/2
19     btree = insert  (btree,sorted[split])     # midpoint
20     btree = balance2(btree,sorted[0:split])   # left side
21     btree = balance2(btree,sorted[split+1:])  # right side
22     return btree
23

If the number of nodes is a power of 2 then the tree will be perfectly balanced. Otherwise there will be a partially filled in row at the bottom from right to left.

The graphics were produced from test.py which in turn uses displayTree.py. You can run tree.py if you have pygame installed. Download and unzip bintree.zip and type the following.

$ python test.py
Type number or b or x: 25

Typing a number will insert or delete an existing node or insert a new one. Typing b will rebalance the tree. And x will exit.

The Wrapup

These basic algorithms for manipulating binary trees are fairly straight forward and quite fast and efficient. They find uses in many places.

Among other things, binary trees can work as indexes for Non SQL databases. Being able to hold multiple indexes into a database and multiple versions of the same index allows us to easily make changes and also to easily undo those changes by simply restoring an earlier pointer.

Finally, when possible, recursive data structures should always be handled with recursive algorithms. Modern languages, like Python, make this fairly straight-forward. And, although there is a learning curve, it is definitely worth the effort.

If you have comments or suggestions You can email me at mail me

Copyright © 2019 Chris Meyers

.