Ex Nihilo

If you had a blank computer, could you ever do anything with it? Say you were on a desert island with only a source of electricity, a computer with a blank hard drive, and luckily, a copy of the Intel X86 Instruction Set Architecture manual, could you bootstrap a useful system?

Rules

For practical reasons, this project cheats a little. First, all the work is done in a VirtualBox VM rather than a physical machine. Second, the initial disk image (containing just the monitor program that follows) is created outside of VirtualBox, using xxdand VBoxManage. And finally, I haven't strictly limited the resources I'm willing to use. Notably, OSDev Wiki and FORTH Standard.

In the future, I may try to use a real device, emulate an IDE or SATA device "by hand" for the initial monitor program, and limit the reference material to only instruction set reference (or perhaps no reference at all). But not yet.

Monitor

First, we need a way to get any data at all into memory, so it can be executed. A monitor program is placed in the boot sector so that is executes on boot. We'll skip hand-emulated SATA for now.

This monitor accepts hex input from the keyboard and writes it to the location given by [di]. Absolutely no error handling is done, so it is critical to type without mistakes. It's important for this initial program to be as small and simple as possible, since in a real desert island computer emergency, it would need to be inputted by emulating a disk drive. This won't take long to explain, since there isn't much of it.

First, it sets the segment registers cs, ds, and es to zero (es turned out to be unnecessary).

Then, it processes characters from the keyboard one by one using BIOS interrupt 16h to read a character, then interrupt 10h to echo it. Each character is first checked to see if it is a command character.

If the character is a ., the last two bytes are interpreted as an address, and the write pointer is set to that address:

mov di, [di - 2]

If the character is !, the last two bytes are interpreted as an address, and that address is called.

dec di; dec di
call [di]

An earlier version used jmp rather than call, but calling is much more useful since it allows multiple functions to be called, each time returning to the monitor for further data entry or function calling. Callees must save the registers they modify!

If the current character is not a command character, then it is interpreted as a nibble in a byte that will be written to [di]. Within loop_outer, cx is set to 2, since there are two hex characters in a byte. dl is used as an accumulator for the byte value.

Since 0-9 and a-f are not adjacent in ASCII, and we would like to write as little hand-assembled machine code as possible, the math to convert a character into a nibble value looks a little strange:

First, dl is shifted left by 4 to move the last nibble up and out of the way of the nibble being read now. Then, the value of : is subtracted. Since : immediately follows 9, after the subtraction the value is negative if it is in 0-9, and non-negative if it is in a-f. No attempt is made to handle invalid characters.

If the character is 0-9, 10 is added to bring it back above zero, and it is added

If the character is a-f, then the value has ('a' - ':') subtracted from it, to bring it to 0-5. Execution then falls through to the digit handling, where 10 is added and accumulation occurs.

On the second loop in loop_inner, stosb is used to store the accumulated byte value to [di] and increment di. Then a space is emitted using interrupt 10h, to make it easier to see what has been typed.

Finally, a short test is provided. It moves di to 0x8000, inputs a short function that prints ~, and then calls that function.

Persistence

Now that we can enter functions and call them, the most critical thing is to be able to save our work. Since the monitor does no input validation, and we are assembling instructions by hand, mistakes will be common. We don't want to start from scratch every time we hang the machine (which will be often).

BIOS interrupt 13h is used to read and write using LBA. Two configuration structures are built: one that writes the boot sector, and another to read and write 63 sectors at 0x8000. "Why 63?" you ask. Because 64 caused a hang. I didn't bother to diagnose why, but possibly because 0x8000 + 64 * 512 is too large to fit in a 16-bit integer.

One important thing to note is that these functions are all defined in the memory occupied by the boot sector. It is dangerous to rewrite the boot sector, but it's important for these persistence functions. We need them to be available at boot so they can be used to load everything else.

FORTH

Writing out assembly by hand, translating it to hex, and keying it without mistakes is tedious and awful. So, as soon as possible, it would be nice to work at a higher level of abstraction. FORTH is a good candidate for several reasons: it's easy to parse, straighforward to bridge with native code, and—crucially—suitable for bootstrapping itself.

This first implementation of FORTH is as simple as possible, and consists only of six global variables, six dictionary entries, and a dozen or so functions to implement the interpreter.

TODO: Describe globals.

TODO: Describe dictionary layout.

TODO: Describe initial dictionary entries.

TODO: Describe docol.

TODO: Describe interpret. Talk about the initial version not being powerful enough to bootstrap. and having to re-implement it as interpret.2.

TODO: Describe word

TODO: Describe find

TODO: Describe NEXT/PUSHRS/POPRS macros.

TODO: Describe create

TODO: Describe >cfa

TODO: Describe machine

TODO: Describe end_machine

TODO: Describe hexword

TODO: Describe cold_start

TODO: Describe goforth

TODO: Describe interpret.2

Bootstrapping Forth

TODO: Describe describe the bootstrapping process, especially the rationale for the order of implementation, and incremental improvements like [tab] as a makeshift return key.

More to come...