Simulator for a Mythical Machine

What is Machine Language

Machine language is the basic language understood by the electronics of the computer's CPU (central processing unit). It is strictly numerical rather than having the sort of syntax you are used to with Python or perhaps other languages. Each type of CPU has its own unique machine language. The machine language for a Macintosh is very different from that of an Intel-based PC.

Machine language is much simpler, in general, than the high level languages that came later. Computers were first designed around just a few basic ideas. There needed to be a store of memory containing fixed sized numbers. There would be instructions to move numbers between memory and one or more special registers which could also hold numbers of the same size as the memory store. Other instructions would apply arithmetic operations to the numbers in the registers. Finally, special instructions could alter the flow of the program, allowing repetition and conditional execution of instructions. This is what you do in Python when you use while, for and if statements.

Originally the numeric instructions were wired in with patch cords. Later someone had the bright (and simple) idea of using the memory store to hold both data and instructions. To make this work the numeric instructions had to be the same size as the numbers being stored in memory. Because almost all computers use the binary number system, this "word size" is expressed as a certain number of bits for each number. Practical computers have had word sizes from 12 to 64 bits.

In this study we will develop the Mythical Machine Language (MML) for, of course, the Mythical Machine (MM). Since we can't build a real machine we will create a small Python program to simulate its operation. MM is much simpler than any real computer but is still capable of doing real computations. To make our task simpler MM is a decimal based computer. Each of its memory words can hold a 6 digit decimal number. The machine language instructions use discrete digits for the parts of the instructions. This will become clearer with some examples.

Click here for the Python code for the MM Simulator

Design of MM

Our machine will have exactly 1000 words of memory, each with an address from 000 to 999. So a memory address requires exactly 3 decimal digits. Each word of memory holds a 6 digit decimal number and may be used for either data or program.

In addition our machine will have 10 general registers that also hold a 6 digit number. The registers generally hold temporary values being computed. Also there are 2 special registers. The "pReg" is the program counter. It contains the memory address of the next instruction to be executed. The "iReg" contains the instruction currently being executed.

Our simulator will let us load a program into memory and then run (execute) the program step by step. First the word of memory addressed by the pReg is copied (loaded) to the iReg. Next the instruction is carried out (executed). This process repeats until a "Halt" instruction is executed which will cause the simulator to exit. While the program is running the contents of the pReg, iReg and 10 general registers are updated and displayed.

Machine Language for MM

The following describes the instructions for MM. Each instruction will use 1 word of memory. Of the 6 digits in the word, 2 are reserved for the operation code (what the instruction will do), 1 digit will specify a general register, and the remaining 3 will specify either a memory address 000 to 999, an actual number or, depending on the instruction, a second general register.
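The field layout just described can be sketched in a few lines of Python. This is only an illustration, not the simulator's own decoding routine; the function name decode is made up for this example. Note that a Python integer literal cannot be written with a leading zero, so the word 016234 is written as 16234.

```python
def decode(word):
    """Split a 6-digit MM word into (opcode, register, operand).

    Hypothetical helper for illustration only; the real simulator
    may decode instructions differently.
    """
    opcode = word // 10000       # leading two digits: what to do
    reg = (word // 1000) % 10    # third digit: a general register
    operand = word % 1000        # last three digits: address, number or register
    return opcode, reg, operand


print(decode(16234))   # (1, 6, 234)
print(decode(57008))   # (5, 7, 8)
```

Because MM is a decimal machine, integer division and remainder by powers of ten are all that is needed to pick the digits apart.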

Two digits for the operation code allows us to have 100 different instructions. We'll only need a dozen or so. These will let us move numbers to and from memory and registers, do arithmetic (add, subtract, multiply, and divide) on numbers in registers, and alter the flow of the program.

--------------- Instruction Set for MM --------------
000000       Halt
01rmmm       Load register r with contents of address mmm.
02rmmm       Store the contents of register r at address mmm.
03rnnn       Load register r with the number nnn.
04r00s       Load register r with the memory word addressed by register s.
05r00s       Add contents of register s to register r
06r00s       Sub contents of register s from register r
07r00s       Mul contents of register r by register s
08r00s       Div contents of register r by register s
100mmm       Jump to location mmm
11rmmm       Jump to location mmm if register r is zero

In the above table the letters "r" and "s" represent general registers, "mmm" represents an arbitrary 3 digit address and "nnn" an arbitrary 3 digit number.

The first instruction with an opcode of 00 means the program is to halt. It doesn't matter what is in the remaining 4 digits but they are usually zero also. Executing a halt in the simulator causes it to exit but leaves the display on the screen so you can read the result of your computation in one of the general registers.

The next four instructions copy numbers between memory and the 10 general registers. The third digit specifies the register and the last 3 digits the source in memory. So 016234 means load register 6 with the number at memory address 234. The memory word itself is not changed. 023234 copies the number in register 3 to memory address 234. This also leaves the register 3 unchanged. Opcode 03 is a little different and has the name "load number". 035123 puts the number 123 into register 5. Opcode 04 uses another register as an index to memory. If register 2 contains the number 546 then the instruction 043002 loads whatever is in memory location 546 into register 3. This instruction will allow us to operate on a list (or array) of numbers.

The next four instructions operate just between the general registers. Instead of the low 3 digits specifying a memory location, they specify a second register. So 057008 adds the number in register 8 to register 7. Register 7 gets the sum and register 8 is unchanged. The same pattern is used for subtract (06), multiply (07), and divide (08).

Finally, the last two instructions make it possible for our programs to do repetitive and conditional logic. The instruction 100452 puts the number 452 into the program counter (pReg). So whatever instruction is at 452 will be the next one fetched and executed. The instruction 113764 will set the number 764 into the pReg if and only if register 3 contains the number zero. These instructions are called jumps. The first is an unconditional "jump" and the second is called "jump if zero".
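To make the instruction set concrete, here is a minimal sketch of how a single instruction might be fetched and executed in Python. It is not the code from simulator.py. The name step, the use of plain lists for memory and registers, and the choice of integer division for opcode 08 are all assumptions made for this example, and results are not checked against the 6 digit limit.

```python
def step(mem, regs, pc):
    """Fetch and execute one instruction.

    Returns the address of the next instruction, or None after a halt.
    Hypothetical sketch; not taken from the actual simulator.
    """
    word = mem[pc]             # fetch
    pc += 1                    # advance past the fetched instruction
    op = word // 10000         # two digit opcode
    r = (word // 1000) % 10    # register digit
    low = word % 1000          # address, number, or second register
    s = low % 10               # second register for the "r00s" forms

    if op == 0:                # halt
        return None
    elif op == 1:              # load register from memory
        regs[r] = mem[low]
    elif op == 2:              # store register to memory
        mem[low] = regs[r]
    elif op == 3:              # load number
        regs[r] = low
    elif op == 4:              # load indirect via register s
        regs[r] = mem[regs[s]]
    elif op == 5:
        regs[r] += regs[s]
    elif op == 6:
        regs[r] -= regs[s]
    elif op == 7:
        regs[r] *= regs[s]
    elif op == 8:
        regs[r] //= regs[s]    # assumption: division keeps only the integer part
    elif op == 10:             # unconditional jump
        pc = low
    elif op == 11:             # jump if register r is zero
        if regs[r] == 0:
            pc = low
    else:
        raise ValueError("unknown opcode %02d" % op)
    return pc


mem = [0] * 1000
regs = [0] * 10
mem[234] = 42
mem[100] = 16234               # 016234: load register 6 from address 234
print(step(mem, regs, 100), regs[6])   # 101 42
```

Returning the new program counter, rather than storing it in a global, keeps the sketch easy to test one instruction at a time.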

Our First Program

In this section we will write a little program that simply adds two numbers together.

To prepare data for the simulator we need to prepare a file that contains the address and its content for each memory location that we will use. Once the program is loaded the simulator sets the pReg to 100 and execution begins. There is nothing magic about 100. But generally programs do not start at location zero.

During each execution cycle an instruction is fetched, the pReg is advanced and the instruction is executed. If the instruction is a jump instruction the pReg may be changed. Otherwise it contains the address of the next word in memory. This cycle repeats until a Halt instruction is executed and the simulator stops.

Our program file may also contain arbitrary comments after the address and its data. The simulator will only look at the first 2 fields of each line. Here is the program to add the numbers 5 and 6.

100   031005   Load register one with the number 5
101   032006   Load register two with the number 6
102   051002   Add register two to register one.
103   000000   Halt. The answer is in register one.

The above 4 lines are in the file "prog1.mml". Run the simulator program as shown. You will be prompted to hit the return key before each instruction fetch and before each instruction execution. When the program stops the screen should show

$python simulator.py prog1.mml mode=-1
   --- a few screens later ---
                                                 100   31005
                                                 101   32006
             The Mythical Machine                102   51002
PC:    103      Inst:      0              -->    103       0
                                                 104       0
   Reg 0:      0     Reg 5:      0               105       0
   Reg 1:     11     Reg 6:      0               106       0
   Reg 2:      6     Reg 7:      0               107       0
   Reg 3:      0     Reg 8:      0               108       0
   Reg 4:      0     Reg 9:      0               109       0
                                                 110       0
                                                 111       0
instruction executed                             112       0
                                                 113       0
                                                 114       0

showing that the last thing that happened was that the halt instruction was retrieved from address 103 and that register 1 contains 11, the sum left there by the add instruction. The pReg is normally advanced after each instruction fetch; in the panel above the PC field still reads 103, the address of the halt instruction.

The Python Simulator for MM

Let's spend a little time with the code in simulator.py.

First of all the simulator displays its progress on your terminal screen. With each change the screen is erased and redrawn. To erase the screen in Unix, the Python call os.system("clear") is used. If you are using the Windows command prompt, then you will probably want os.system("cls") instead.
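If you would rather not edit the call by hand, one possible approach is to pick the command from os.name, which Python sets to "nt" on Windows. The helper names below are invented for this sketch and are not part of simulator.py.

```python
import os


def clear_command(os_name=os.name):
    """Return the shell command that clears the screen on this platform."""
    # os.name is "nt" on Windows and "posix" on Unix-like systems
    return "cls" if os_name == "nt" else "clear"


def clear_screen():
    """Erase the terminal before the panel is redrawn."""
    os.system(clear_command())
```

Calling clear_screen() in place of os.system("clear") should then behave sensibly on either kind of system.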

For the most part the simulator program is straightforward.

Function "main" calls "loadProgram" with the filename passed on the command line. Each line of the file holds a memory address and an instruction in its first two columns, and the corresponding mem elements are populated. The program counter is set to 100. Execution starts with a call to function cycle.
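The loading step can be sketched as follows. This is a guess at the general shape rather than the actual loadProgram. The name load_program is hypothetical, it takes a list of lines instead of a filename so that it can be tried without a file, and skipping lines whose first two fields are not numbers is an assumption.

```python
def load_program(lines, size=1000):
    """Build a memory list from "address word comment..." lines.

    Hypothetical sketch; only the first two fields of a line are used.
    """
    mem = [0] * size
    for line in lines:
        fields = line.split()
        # assumption: blank lines and lines without two numeric fields are ignored
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        mem[int(fields[0])] = int(fields[1])   # int("031005") gives 31005
    return mem


prog1 = [
    "100   031005   load 5 into register 1",
    "101   032006   load 6 into register 2",
    "102   051002   add register 2 to register 1",
    "103   000000   halt",
]
mem = load_program(prog1)
print(mem[100:104])   # [31005, 32006, 51002, 0]
```

The leading zeros disappear once the words are stored as integers, which matches the memory column of the panel shown earlier.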

Function cycle retrieves instructions, decodes the fields with decodeInst into opcode, register and memory address, then displays the screen. On confirmation or after a sleep time, it executes the instruction and displays the screen again to show the changes. This repeats until a halt is executed.

Function displayPanel creates a left (registers) and a right (memory) frame as shown above. Two lists, col1 and col2, contain the text for each frame and are merged as the screen is printed.
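Merging two lists of text lines side by side might look like the sketch below. The function merge_columns and its width parameter are invented for illustration; displayPanel itself may do this differently.

```python
from itertools import zip_longest


def merge_columns(col1, col2, width=45):
    """Join two lists of lines side by side, padding the left column."""
    rows = []
    # zip_longest keeps going until the longer list runs out
    for left, right in zip_longest(col1, col2, fillvalue=""):
        rows.append(left.ljust(width) + right)
    return "\n".join(rows)


print(merge_columns(["PC: 103"], ["100  31005", "101  32006"], width=10))
```

Using zip_longest rather than zip means the memory column can be longer than the register column without losing lines.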

A more complex program

Let's now look at a program that computes the sum of several numbers. We'll put the numbers (the data) in memory starting at location 200. We'll put the program instructions at location 100. The program will add numbers to the running sum until it encounters the number zero which is used as a sign to stop.

In this program we need to use the "load indirect" (04) instruction and will use register 1 to point to the data as we add them to the sum in register 0. Here is the program with comments

100 030000   reg 0 holds the sum, set to zero
101 031200   put address (200) of 1st number in reg 1
102 032001   reg 2 holds the number one.
103 043001   next number (via reg 1) to reg 3
104 113108   if zero we're done so jump to halt inst
105 050003   otherwise add reg 3 to the sum in reg 0
106 051002   add one (in reg 2) to reg 1 so it points to next number
107 100103   jump back to 103 to get the next number
108 000000   all done so halt. the sum is in reg 0

200 000123   the numbers to add
201 000234
202 000345
203 000000   the end of the list
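For comparison, here is roughly the same algorithm written in Python, with each statement labelled by the MM instruction it corresponds to. The function and variable names are made up for this sketch; a list stands in for MM's memory.

```python
def sum_until_zero(mem, start):
    """Add up the words beginning at mem[start] until a zero is found."""
    total = 0               # 100: reg 0 holds the sum
    addr = start            # 101: reg 1 points at the first number
    one = 1                 # 102: reg 2 holds the number one
    while True:
        value = mem[addr]   # 103: load indirect via reg 1 into reg 3
        if value == 0:      # 104: jump to the halt if reg 3 is zero
            break
        total += value      # 105: add reg 3 to the sum
        addr += one         # 106: point at the next number
                            # 107: jump back to 103 (the loop itself)
    return total            # 108: halt, the sum is in reg 0


mem = [0] * 1000
mem[200:204] = [123, 234, 345, 0]
print(sum_until_zero(mem, 200))   # 702
```

The while loop and the break together play the roles of the unconditional jump at 107 and the "jump if zero" at 104.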

What's missing?

MM is really too simple to be of much use, although we will use it to calculate factorials both in assembler and in a little high level language that we'll design for it. Several additional features would be found in any real computer. Let's look at a few of these.

We can only store integers up to 999999 in the memory or registers. Real computers use floating point numbers, character data, bitmaps and so on. But it's all (binary) numbers. Each computer stores floating point numbers in its own way and has special machine instructions for doing arithmetic with such numbers, although early computers emulated this in software. You may already be familiar with how numbers represent characters in the ASCII character set, and there are others as well. Character strings and lists, or arrays, are stored using sequential addresses in the memory very much like we did in the second program.
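As a small illustration of characters stored as numbers in sequential addresses, the sketch below puts one character code per word and uses a zero word to mark the end, just as the second program used zero to end its list. The function names and the zero terminator are assumptions made for this example; MM itself has no character instructions.

```python
def store_string(mem, start, text):
    """Store one character code per word, followed by a zero word."""
    for offset, ch in enumerate(text):
        mem[start + offset] = ord(ch)   # ord gives the character's numeric code
    mem[start + len(text)] = 0          # assumption: zero marks the end
    return mem


def fetch_string(mem, start):
    """Read character codes from mem[start] onward until a zero word."""
    chars = []
    addr = start
    while mem[addr] != 0:
        chars.append(chr(mem[addr]))
        addr += 1
    return "".join(chars)


mem = [0] * 1000
store_string(mem, 300, "HI")
print(mem[300:303])            # [72, 73, 0]
print(fetch_string(mem, 300))  # HI
```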

With the jump instructions we can build loops and if-style logic. A real computer would offer other conditional tests alongside "jump if zero", such as "jump if negative".

Also, to call a subroutine (function) requires both a jump and a way to return to the instruction after the jump to subroutine. This was often awkward until machine code used stacks to keep track of return addresses. Stacks are simply sections of memory where a general register addresses the top element. Stacks make recursive functions possible even in machine language.
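The idea of a stack of return addresses can be sketched in Python as below. MM has no such instructions, so everything here is hypothetical: the names call and ret, the use of register 9 as the stack pointer, and the choice to let the stack grow downward from the top of memory.

```python
def call(mem, regs, pc, target, sp=9):
    """Push the return address pc on the stack and jump to target."""
    regs[sp] -= 1            # make room: the stack grows toward lower addresses
    mem[regs[sp]] = pc       # pc is the address of the instruction after the call
    return target            # new program counter


def ret(mem, regs, sp=9):
    """Pop the most recent return address and jump back to it."""
    addr = mem[regs[sp]]
    regs[sp] += 1
    return addr


mem = [0] * 1000
regs = [0] * 10
regs[9] = 1000                     # empty stack: just past the last word
pc = call(mem, regs, 105, 300)     # main program calls a subroutine at 300
pc = call(mem, regs, 310, 400)     # which in turn calls one at 400
print(ret(mem, regs))              # 310
print(ret(mem, regs))              # 105
```

Because each call pushes its own return address, a subroutine can call another one, or itself, and the returns still unwind in the right order.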

Finally, every computer needs some way to communicate with the outside world. We gave MM a panel showing its registers but nothing else. Real computers, even the earliest ones, had keyboards, screens or paper printouts, and so on. Typically the electronics are designed so that special addresses reference registers in these devices instead of main memory.

We'll continue to use MM in the next two sections on assembler language and the small compiler.

You can download the zip file for this project here.

If you have comments or suggestions, you can email me.

* * *