This is the second follow-up to my initial text about the FPGA Computer.

I use a fork of the customasm project for my FPGA-based CPU. It is available on GitHub:

https://github.com/milanvidakovic/FPGAcustomasm

This 16-bit CPU has 8 general-purpose registers (r0-r7), pc (program counter), sp (stack pointer), ir (instruction register), and h (higher word when multiplying, or remainder when dividing). Each register is 16 bits wide.

The address bus is 16 bits wide, giving 65536 addresses. The data bus is also 16 bits wide, but memory is byte-addressable: each address refers to a single byte.

There are eleven groups of instructions:


| Group number | Group name | Group members | Group description |
|---|---|---|---|
| 0 | NOP/MOV/IN/OUT/PUSH/POP/RET/IRET/HALT/SWAP | nop; mov reg, xx; mov reg, reg; in reg, [xx]; out [xx], reg; push reg; push xx; pop reg; ret; iret; swap; halt | The most general group. Deals with putting values into registers, exchanging values between registers, I/O operations, stack operations, returning from subroutines, and register content swapping. NOP and HALT are also in this group. |
| 1 | JUMP | j xx; jc xx; jnc xx; jz xx; jnz xx; jo xx; jno xx; jp xx; jnp xx; jg xx; jge xx; js xx; jse xx | Jump to the given location. |
| 2 | CALL | call xx; callc xx; callnc xx; callz xx; callnz xx; callo xx; callno xx; callp xx; callnp xx; callg xx; callge xx; calls xx; callse xx | Calls a subroutine. Puts the return address on the stack before jumping to the subroutine. The subroutine needs to execute RET to return. |
| 3 | LOAD/STORE | ld reg, [xx]; ld reg, [reg]; ld reg, [reg + xx]; ld.b reg, [xx]; ld.b reg, [reg]; ld.b reg, [reg + xx]; st [xx], reg; st [reg], reg; st [reg + xx], reg; st.b [xx], reg; st.b [reg], reg; st.b [reg + xx], reg | Load from memory into the register (destination: register; source: memory address given by the number, or by the register, or by the register+number). Store the given register into the memory location (destination: memory location given by the number, or by the register, or by the register+number). |
| 4 | ADD/SUB | add reg, reg; add reg, xx; add reg, [reg]; add reg, [xx]; add reg, [reg + xx]; add.b reg, [reg]; add.b reg, [xx]; add.b reg, [reg + xx]; sub reg, reg; sub reg, xx; sub reg, [reg]; sub reg, [xx]; sub reg, [reg + xx]; sub.b reg, [reg]; sub.b reg, [xx]; sub.b reg, [reg + xx] | Add and sub group. |
| 5 | AND/OR | and reg, reg; and reg, xx; and reg, [reg]; and reg, [xx]; and reg, [reg + xx]; and.b reg, [reg]; and.b reg, [xx]; and.b reg, [reg + xx]; or reg, reg; or reg, xx; or reg, [reg]; or reg, [xx]; or reg, [reg + xx]; or.b reg, [reg]; or.b reg, [xx]; or.b reg, [reg + xx] | And / or group. |
| 6 | XOR | xor reg, reg; xor reg, xx; xor reg, [reg]; xor reg, [xx]; xor reg, [reg + xx]; xor.b reg, [reg]; xor.b reg, [xx]; xor.b reg, [reg + xx] | Xor group. |
| 7 | SHL/SHR | shl reg, reg; shl reg, xx; shl reg, [reg]; shl reg, [xx]; shl reg, [reg + xx]; shl.b reg, [reg]; shl.b reg, [xx]; shl.b reg, [reg + xx]; shr reg, reg; shr reg, xx; shr reg, [reg]; shr reg, [xx]; shr reg, [reg + xx]; shr.b reg, [reg]; shr.b reg, [xx]; shr.b reg, [reg + xx] | Shift group. |
| 8 | MUL/DIV | mul reg, reg; mul reg, xx; mul reg, [reg]; mul reg, [xx]; mul reg, [reg + xx]; mul.b reg, [reg]; mul.b reg, [xx]; mul.b reg, [reg + xx]; div reg, reg; div reg, xx; div reg, [reg]; div reg, [xx]; div reg, [reg + xx]; div.b reg, [reg]; div.b reg, [xx]; div.b reg, [reg + xx] | Multiply / divide group. |
| 9 | INC/DEC | inc reg; inc [reg]; inc [xx]; inc [reg + xx]; inc.b [reg]; inc.b [xx]; inc.b [reg + xx]; dec reg; dec [reg]; dec [xx]; dec [reg + xx]; dec.b [reg]; dec.b [xx]; dec.b [reg + xx] | Increment and decrement group. |
| 10 | CMP/NEG | cmp reg, reg; cmp reg, xx; cmp reg, [reg]; cmp reg, [xx]; cmp reg, [reg + xx]; cmp.b reg, [reg]; cmp.b reg, [xx]; cmp.b reg, [reg + xx]; neg reg; neg [reg]; neg [xx]; neg [reg + xx]; neg.b [reg]; neg.b [xx]; neg.b [reg + xx] | Compare / negate group. |

All the instructions are two or four bytes long. Since the data bus is 16 bits wide, a complete instruction is fetched in either one or two memory reads. This means that, since SRAM is used, a complete instruction is fetched, decoded, and executed in three or more clock cycles.

All the instructions share a similar format:


| from | to | what | group |
|---|---|---|---|
| bbbb (0-7: r0-r7, 8: sp, 9: h) | bbbb (0-7: r0-r7, 8: sp, 9: h) | 0000 (0 => mov regx, regy) | 0000 |

In the first byte, the lower four bits designate the destination register (to), while the upper four bits identify the source register (from). In the second byte, the lower four bits identify the instruction group (group), and the upper four bits give the type of the instruction within that group (what).

For example, the mov r2, r1 instruction is encoded as:
binary: 0001 0010 0000 0000
hex: 12 00

The source is r1 (0001), the destination is r2 (0010), the group is 0 (0000), and the type is mov regx, regy (0000).
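
The field layout can be checked with a few lines of Python. This is only a sketch based on the description above; the names decode_fields and REGISTERS are made up for illustration and are not part of the assembler.

```python
# Register numbers as listed in the format table: 0-7 are r0-r7, 8 is sp, 9 is h.
REGISTERS = {n: f"r{n}" for n in range(8)}
REGISTERS.update({8: "sp", 9: "h"})


def decode_fields(first_byte, second_byte):
    """Split the first two instruction bytes into their four 4-bit fields."""
    return {
        "from": (first_byte >> 4) & 0xF,    # upper half of byte 1: source
        "to": first_byte & 0xF,             # lower half of byte 1: destination
        "what": (second_byte >> 4) & 0xF,   # upper half of byte 2: type in group
        "group": second_byte & 0xF,         # lower half of byte 2: group number
    }


fields = decode_fields(0x12, 0x00)  # the two bytes of mov r2, r1
print(REGISTERS[fields["to"]], REGISTERS[fields["from"]],
      fields["what"], fields["group"])
# prints: r2 r1 0 0
```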

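The second example is the mov r1, 0x0f instruction:
binary: 0000 0001 0010 0000, 0000 0000 0000 1111
hex: 01 20, 00 0f

Here the source field is unused (0000), the destination is r1 (0001), the type is 0010, the group is 0 (0000), and the 16-bit number 0x000f follows in the next two bytes.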
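
Going the other way, both encodings can be reproduced by packing the four fields into two bytes. The sketch below is an illustration, not the assembler's code: the function name encode is hypothetical, the type value 2 for mov reg, xx is read off the hex dump above, and the number is assumed to be stored high byte first because the dump shows 00 0f.

```python
def encode(src, dst, what, group, number=None):
    """Pack four 4-bit fields into two bytes, plus an optional 16-bit number."""
    for field in (src, dst, what, group):
        if not 0 <= field <= 0xF:
            raise ValueError("each field must fit in four bits")
    encoded = bytes([(src << 4) | dst, (what << 4) | group])
    if number is not None:
        # Assumption: high byte first, as in the "00 0f" dump.
        encoded += (number & 0xFFFF).to_bytes(2, "big")
    return encoded


print(encode(1, 2, 0, 0).hex(" "))          # mov r2, r1   -> 12 00
print(encode(0, 1, 2, 0, 0x000F).hex(" "))  # mov r1, 0x0f -> 01 20 00 0f
```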


The load instructions load a value from memory into a register. The store instructions store the value of a register into the given memory location. The memory location is given as a number (ld r1, [0x0a] loads the content of the 0x0a location into the r1 register), as the value of a register (ld r1, [r2] loads the content of the memory location to which r2 points), or as the sum of a register and a number (ld r1, [r2 + 0x0f]).

ld r1, [0x0a] loads two bytes from the 0x0a location. The address (0x0a) must be even if we work with 16-bit values.

If we want to load a byte from a location, we need to use the ".b" suffix:
ld.b r1, [0x0a]

The code above will load a byte from the 0x0a location into the r1 register.
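
The three addressing modes and the difference between ld and ld.b can be modelled in Python. This is a rough sketch under stated assumptions: the byte order of a 16-bit value in memory (high byte first) and the wrap-around of the address to 16 bits are guesses, and all function names are invented for the illustration.

```python
MEMORY_SIZE = 65536
memory = bytearray(MEMORY_SIZE)


def effective_address(number=0, register_value=0):
    """Address for [xx], [reg], or [reg + xx]; wrapping to 16 bits is assumed."""
    return (number + register_value) % MEMORY_SIZE


def load_word(address):
    """Model of ld: read two bytes starting at an even address."""
    if address % 2 != 0:
        raise ValueError("16-bit access requires an even address")
    # Assumption: the high byte is stored first.
    return (memory[address] << 8) | memory[address + 1]


def load_byte(address):
    """Model of ld.b: read a single byte from any address."""
    return memory[address]


memory[0x0A] = 0x12
memory[0x0B] = 0x34
print(hex(load_word(effective_address(number=0x0A))))          # ld r1, [0x0a] -> 0x1234
print(hex(load_byte(effective_address(register_value=0x0B))))  # ld.b r1, [r2], r2 = 0x0b -> 0x34
```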

Hello World example


Let's look at the Hello World example:

; this program will print HELLO WORLD
#addr 0x400
VIDEO_0 = 2400 ; beginning of the text frame buffer

mov r2, 0      ; r2 is the index
mov r1, hello  ; r1 holds the address of the "HELLO WORLD" string

again:
ld.b r0, [r1]          ; load r0 with the content of the memory location to which r1 points (current character)
cmp r0, 0              ; if the current character is 0 (string terminator),
jz end                 ; go out of this loop 
st [r2 + VIDEO_0], r0  ; store the character at the VIDEO_0 + r2 
inc r1                 ; move to the next character
add r2, 2              ; move to the next location in the video memory
j again                ; continue with the loop

end:
halt
hello:

#str "HELLO WORLD!\0"

First we define the constant VIDEO_0 with the value of 2400. This is the address of the text-based frame buffer. It points to the first character in the video memory.

Then we set r2 to 0 and r1 to the address of the hello string. Note that the mov instruction is used to move a number into a register (for example, mov r2, 0), or to move the value of the source register to the destination register (for example, mov r1, r2).

Next, we enter the loop. The loop starts with the again label. In the loop we load the byte value from the current address (starting with the first character of the hello string), compare that byte with zero (checking for the end of the string), and, if it is not zero, store it at the current address of the video memory. We then increment r1 to move to the next character, add 2 to r2 to move to the next location in the video memory, and jump back to again.

When all the characters are printed on the screen, the CPU halts (halt instruction).
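
To follow the register values step by step, the same loop can be traced in Python. This is a sketch only: memory is a plain bytearray, the video memory is a dictionary keyed by address, and the string address 0x0500 is a hypothetical value chosen for the example.

```python
VIDEO_0 = 2400  # start of the text frame buffer, as in the listing


def print_string(memory, string_address, video):
    """Mirror the assembly loop; returns the final value of r2."""
    r1 = string_address  # mov r1, hello
    r2 = 0               # mov r2, 0
    while True:
        r0 = memory[r1]            # ld.b r0, [r1]
        if r0 == 0:                # cmp r0, 0 / jz end
            break
        video[VIDEO_0 + r2] = r0   # st [r2 + VIDEO_0], r0
        r1 += 1                    # inc r1
        r2 += 2                    # add r2, 2
    return r2


memory = bytearray(65536)
hello = 0x0500  # hypothetical address of the string
text = b"HELLO WORLD!\0"
memory[hello:hello + len(text)] = text

video = {}
final_offset = print_string(memory, hello, video)
print(final_offset)                                   # 24
print("".join(chr(video[a]) for a in sorted(video)))  # HELLO WORLD!
```

Each character advances r2 by two, so the twelve characters end up at addresses 2400, 2402, and so on up to 2422.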


Interrupts


Let's look at the UART echo demo. This demo waits for a character to arrive via the serial UART (115200 baud, one start bit, one stop bit, no parity), then prints that character on the screen, and finally echoes that character back to the UART:

#addr 0x400
; ########################################################
; REAL START OF THE PROGRAM
; ########################################################
mov sp, 1000

mov r0, 14
st [cursor], r0

; set the IRQ handler for UART to our own IRQ handler
mov r0, 1
mov r1, 16
st [r1], r0
mov r0, irq_triggered
mov r1, 18
st [r1], r0

halt

The code above installs the interrupt handling routine (irq_triggered) for the UART. The UART uses IRQ1, and its handling routine is at the address 16 (0x0010). This means that whenever the serial UART subsystem receives a byte, the CPU will jump to the 0x0010 address. At that address we place a JUMP instruction (j irq_triggered): the value 0x0001 (the JUMP instruction opcode) is stored at the address 0x0010, and the address of the irq_triggered routine is stored at the address 0x0012 (mov r0, irq_triggered followed by st [r1], r0, with r1 set to 18).
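
The resulting four bytes at the IRQ1 entry can be pictured with a short Python model. This is a sketch under assumptions: 16-bit values are taken to be stored high byte first, the handler address 0x0420 is a hypothetical value, and the helper names are invented for the illustration.

```python
JUMP_OPCODE = 0x0001   # value stored at the entry address, per the text above
UART_IRQ_ENTRY = 16    # address 0x0010


def store_word(memory, address, value):
    """Model of st [address], reg; high byte first is an assumption."""
    memory[address] = (value >> 8) & 0xFF
    memory[address + 1] = value & 0xFF


def install_handler(memory, entry, handler_address):
    """Write 'j handler_address' at the interrupt entry address."""
    store_word(memory, entry, JUMP_OPCODE)          # st [16], r0 with r0 = 1
    store_word(memory, entry + 2, handler_address)  # st [18], r0 with r0 = handler


memory = bytearray(65536)
install_handler(memory, UART_IRQ_ENTRY, 0x0420)  # hypothetical handler address
print(memory[16:20].hex(" "))  # 00 01 04 20
```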

That way, we have prepared the UART interrupt routine and the main program halts. The rest of the program is in the interrupt routine. Let's look at the interrupt routine:

; ##################################################################
; Subroutine which is called whenever some byte arrives at the UART
; ##################################################################
irq_triggered:
push r0
push r1
push r2   
push r5
push r6

in r1, [64] ; r1 now holds the byte received from the UART (address 64 decimal)
ld r6, [cursor]
st [r6 + VIDEO_0], r1    ; store the UART character at the VIDEO_0 + r6
add r6, 2       ; move to the next location in the video memory
st [cursor], r6

loop2:
in r5, [65]   ; tx busy in r5
cmp r5, 0     
jz not_busy   ; if not busy, send back the received character 
j loop2
not_busy:
out [66], r1  ; send the received character to the UART
skip:
pop r6
pop r5
pop r2
pop r1                 
pop r0
iret

When the interrupt happens, the irq_triggered routine first pushes some registers on the stack, obtains the received byte from the UART (in r1, [64]), prints it on the screen at the current cursor position, and advances the cursor. It then polls the UART transmitter (in r5, [65]): if the UART is busy sending a character, r5 is set to 1; otherwise, r5 is 0. Once the UART is not busy, the routine sends the received character back through the UART (out [66], r1). Finally, the routine pops the registers from the stack and returns (iret instruction).
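
The control flow of the routine, including the busy-wait loop, can be tried out against a fake UART in Python. Everything here is a sketch: FakeUart is an invented stand-in for the hardware, the port numbers 64, 65, and 66 come from the listing, and VIDEO_0 = 2400 is assumed to be the same constant as in the Hello World example.

```python
VIDEO_0 = 2400  # assumed to match the Hello World example


class FakeUart:
    """Invented stand-in: reports 'busy' for a fixed number of polls."""

    def __init__(self, received_byte, busy_polls):
        self.received_byte = received_byte
        self.busy_polls = busy_polls
        self.sent = []

    def read_port(self, port):
        if port == 64:               # received byte
            return self.received_byte
        if port == 65:               # tx busy flag: 1 while busy, else 0
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return 1
            return 0
        raise ValueError("unknown input port")

    def write_port(self, port, value):
        if port != 66:               # transmit register
            raise ValueError("unknown output port")
        self.sent.append(value)


def irq_triggered(uart, video, cursor):
    """Mirror the interrupt routine; returns the updated cursor."""
    r1 = uart.read_port(64)            # in r1, [64]
    video[VIDEO_0 + cursor] = r1       # st [r6 + VIDEO_0], r1
    cursor += 2                        # add r6, 2
    while uart.read_port(65) != 0:     # loop2: in r5, [65] / cmp r5, 0
        pass
    uart.write_port(66, r1)            # out [66], r1
    return cursor


uart = FakeUart(received_byte=ord("A"), busy_polls=3)
video = {}
cursor = irq_triggered(uart, video, cursor=14)
print(cursor, video, uart.sent)  # 16 {2414: 65} [65]
```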

The difference between iret and ret is that ret pops the return address from the stack and jumps to that address (return from a called subroutine), while iret pops the return address, pops the flags, and then jumps to the return address. The interrupt routine might change the flags, so they are saved before the interrupt routine is invoked and restored during the iret execution.
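
A small stack model in Python makes the difference concrete. This is a sketch, not a description of the hardware: the Cpu class is invented, pc is assumed to already point past the current instruction when the return address is pushed, and the push order on interrupt entry (flags first, then return address) is inferred from the pop order described above.

```python
class Cpu:
    """Invented minimal model with a program counter, flags, and a stack."""

    def __init__(self):
        self.pc = 0
        self.flags = 0
        self.stack = []

    def call(self, target):
        self.stack.append(self.pc)     # push return address
        self.pc = target

    def ret(self):
        self.pc = self.stack.pop()     # pop return address and jump

    def interrupt(self, entry):
        self.stack.append(self.flags)  # assumed order: flags first,
        self.stack.append(self.pc)     # then the return address
        self.pc = entry

    def iret(self):
        return_address = self.stack.pop()  # pop return address,
        self.flags = self.stack.pop()      # pop flags,
        self.pc = return_address           # then jump


cpu = Cpu()
cpu.pc, cpu.flags = 0x0400, 0b0101
cpu.interrupt(16)   # the CPU jumps to the IRQ1 entry
cpu.flags = 0       # the handler changes the flags
cpu.iret()
print(hex(cpu.pc), bin(cpu.flags))  # 0x400 0b101
```

With ret instead of iret, only the return address would be popped, and the flags changed by the handler would leak into the interrupted code.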

All the examples are stored in the FPGAcustomasm project on GitHub:
https://github.com/milanvidakovic/FPGAcustomasm/tree/master/examples/FPGA/raspbootin