FPGA Computer Assembler
This is the second follow-up to my initial text about the FPGA
Computer.
I use a fork of the customasm project as the assembler for my FPGA-based CPU. It is on GitHub:
https://github.com/milanvidakovic/FPGAcustomasm
The address bus is 16 bits wide, addressing 65536 locations. The data bus is
also 16 bits wide, but addresses are 8-bit aligned, i.e. each address refers to a single byte.
The UART echo demo discussed below waits for a character to arrive via the serial UART (115200
baud, one start bit, one stop bit, no parity), prints that character on the screen, and finally echoes
it back to the UART.
This 16-bit CPU has 8 general-purpose registers (r0 to r7), pc (program
counter), sp (stack pointer), ir (instruction register), and h (holds the higher word of a
multiplication result, or the remainder of a division). Each register is 16 bits wide.
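The role of the h register can be illustrated with a small Python model. This is only a sketch: it assumes unsigned 16-bit arithmetic and that the lower word of the product (or the quotient) ends up in the destination register, neither of which is stated in the text. The function names are made up for this illustration.

```python
MASK16 = 0xFFFF  # every register is 16 bits wide

def mul16(a, b):
    """Return (low_word, h) for a 16x16-bit multiplication (assumed unsigned)."""
    product = (a & MASK16) * (b & MASK16)
    return product & MASK16, (product >> 16) & MASK16

def div16(a, b):
    """Return (quotient, h), where h holds the remainder (assumed unsigned)."""
    a &= MASK16
    b &= MASK16
    if b == 0:
        # The text does not say what the hardware does on division by zero.
        raise ZeroDivisionError("division by zero")
    return a // b, a % b

low, h = mul16(0x1234, 0x0100)
print(hex(low), hex(h))
```

With these assumptions, multiplying 0x1234 by 0x0100 leaves 0x3400 in the lower word and 0x0012 in h.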
There are eleven groups of instructions:
Group number | Group name | Group members | Group description
0 | NOP/MOV/IN/OUT/PUSH/POP/RET/IRET/HALT/SWAP | nop; mov reg, xx; mov reg, reg; in reg, [xx]; out [xx], reg; push reg; push xx; pop reg; ret; iret; swap; halt | The most general group. Deals with putting values into registers, exchanging values between registers, I/O operations, stack operations, returning from subroutines, and register content swapping. NOP and HALT are also in this group.
1 | JUMP | j xx; jc xx; jnc xx; jz xx; jnz xx; jo xx; jno xx; jp xx; jnp xx; jg xx; jge xx; js xx; jse xx | Jump to the given location.
2 | CALL | call xx; callc xx; callnc xx; callz xx; callnz xx; callo xx; callno xx; callp xx; callnp xx; callg xx; callge xx; calls xx; callse xx | Calling subroutine. Puts the return address on the stack before jumping to the subroutine. Needs to call RET when returning from the subroutine.
3 | LOAD/STORE | ld reg, [xx]; ld reg, [reg]; ld reg, [reg + xx]; ld.b reg, [xx]; ld.b reg, [reg]; ld.b reg, [reg + xx]; st [xx], reg; st [reg], reg; st [reg + xx], reg; st.b [xx], reg; st.b [reg], reg; st.b [reg + xx], reg | Load from memory into the register (destination: register; source: memory address given by the number, or by the register, or by the register+number). Store the given register into the memory location (destination: memory location given by the number, or by the register, or by the register+number).
4 | ADD/SUB | add reg, reg; add reg, xx; add reg, [reg]; add reg, [xx]; add reg, [reg + xx]; add.b reg, [reg]; add.b reg, [xx]; add.b reg, [reg + xx]; sub reg, reg; sub reg, xx; sub reg, [reg]; sub reg, [xx]; sub reg, [reg + xx]; sub.b reg, [reg]; sub.b reg, [xx]; sub.b reg, [reg + xx] | Add and sub group.
5 | AND/OR | and reg, reg; and reg, xx; and reg, [reg]; and reg, [xx]; and reg, [reg + xx]; and.b reg, [reg]; and.b reg, [xx]; and.b reg, [reg + xx]; or reg, reg; or reg, xx; or reg, [reg]; or reg, [xx]; or reg, [reg + xx]; or.b reg, [reg]; or.b reg, [xx]; or.b reg, [reg + xx] | And / or group.
6 | XOR | xor reg, reg; xor reg, xx; xor reg, [reg]; xor reg, [xx]; xor reg, [reg + xx]; xor.b reg, [reg]; xor.b reg, [xx]; xor.b reg, [reg + xx] | Xor group.
7 | SHL/SHR | shl reg, reg; shl reg, xx; shl reg, [reg]; shl reg, [xx]; shl reg, [reg + xx]; shl.b reg, [reg]; shl.b reg, [xx]; shl.b reg, [reg + xx]; shr reg, reg; shr reg, xx; shr reg, [reg]; shr reg, [xx]; shr reg, [reg + xx]; shr.b reg, [reg]; shr.b reg, [xx]; shr.b reg, [reg + xx] | Shift group.
8 | MUL/DIV | mul reg, reg; mul reg, xx; mul reg, [reg]; mul reg, [xx]; mul reg, [reg + xx]; mul.b reg, [reg]; mul.b reg, [xx]; mul.b reg, [reg + xx]; div reg, reg; div reg, xx; div reg, [reg]; div reg, [xx]; div reg, [reg + xx]; div.b reg, [reg]; div.b reg, [xx]; div.b reg, [reg + xx] | Multiply / divide group.
9 | INC/DEC | inc reg; inc [reg]; inc [xx]; inc [reg + xx]; inc.b [reg]; inc.b [xx]; inc.b [reg + xx]; dec reg; dec [reg]; dec [xx]; dec [reg + xx]; dec.b [reg]; dec.b [xx]; dec.b [reg + xx] | Increment and decrement group.
10 | CMP/NEG | cmp reg, reg; cmp reg, xx; cmp reg, [reg]; cmp reg, [xx]; cmp reg, [reg + xx]; cmp.b reg, [reg]; cmp.b reg, [xx]; cmp.b reg, [reg + xx]; neg reg; neg [reg]; neg [xx]; neg [reg + xx]; neg.b [reg]; neg.b [xx]; neg.b [reg + xx] | Compare / negate group.
All the instructions are two or four bytes long. Since the data bus is
16 bits wide, a complete instruction is fetched in either one or two memory reads. Because
SRAM is used, a complete instruction is fetched, decoded, and executed in three or more
clock cycles.
All the instructions have a similar format:
from | to | what | group
bbbb (0-7: r0-r7, 8: sp, 9: h) | bbbb (0-7: r0-r7, 8: sp, 9: h) | 0000 (0 => mov regx, regy) | 0000
In the first byte, the lower four bits designate the destination register
(to), while the upper four bits identify the source register (from).
In the second byte, the lower four bits identify the instruction group
(group) and the upper four bits give the type of the instruction within that group (what).
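For example, the mov r2, r1 instruction is encoded as:
binary: 0001 0010 0000 0000
hex: 12 00
The source is r1 (0001), the destination is r2 (0010), the group is 0 (0000) and the type is mov regx, regy (0000).
The second example is the mov r1, 0x0f instruction:
binary: 0000 0001 0010 0000, 0000 0000 0000 1111
hex: 01 20, 00 0f
Here the source field is 0000, the destination is r1 (0001), the group is 0 (0000) and the type is 0010. The number 0x0f follows in the next two bytes, which makes this a four-byte instruction.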
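The two encodings above can be reproduced with a short Python sketch. It is not the assembler's actual implementation. The function names are hypothetical, and the type value 0010 for "mov reg, number" and the byte order of the trailing number are read off the second example rather than taken from a specification.

```python
# Register numbers as given in the format table: r0-r7, sp = 8, h = 9.
REGS = {"r0": 0, "r1": 1, "r2": 2, "r3": 3, "r4": 4,
        "r5": 5, "r6": 6, "r7": 7, "sp": 8, "h": 9}

def encode_header(src, dst, what, group):
    """Pack the four 4-bit fields into the two leading instruction bytes."""
    first = (src << 4) | dst        # upper nibble: from, lower nibble: to
    second = (what << 4) | group    # upper nibble: what, lower nibble: group
    return bytes([first, second])

def encode_mov_reg_reg(dst, src):
    """mov dst, src: group 0, type 0000."""
    return encode_header(REGS[src], REGS[dst], 0x0, 0x0)

def encode_mov_reg_imm(dst, value):
    """mov dst, number: group 0, type 0010 (inferred from the example)."""
    header = encode_header(0, REGS[dst], 0x2, 0x0)
    return header + (value & 0xFFFF).to_bytes(2, "big")

print(encode_mov_reg_reg("r2", "r1").hex())  # mov r2, r1
print(encode_mov_reg_imm("r1", 0x0F).hex())  # mov r1, 0x0f
```

The two calls print 1200 and 0120000f, matching the hex bytes shown above.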
The Load instructions load a value from memory into
a register. The Store instructions store the value of a register into the given memory location.
The memory location is given as a number (ld r1, [0x0a] - load the content of the
0x0a location into the r1 register), as the value of a register
(ld r1, [r2] - load the content of the memory location to which r2
points), or as the sum of a number and a register (ld r1, [0x0f +
r2]).
ld r1, [0x0a] loads two bytes from the 0x0a location. The address (0x0a) must be even if we work with 16-bit values.
If we want to load a byte from a location, we need to use the ".b" suffix:
ld.b r1, [0x0a]
The code above will load a byte from the 0x0a location into the r1 register.
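The difference between ld and ld.b can be modelled in a few lines of Python. This is a sketch under assumptions: the Memory class is hypothetical, the byte order of a 16-bit value (higher byte at the lower address) is inferred from the instruction encoding examples, and raising an error on an odd address is just one way to model the "must be even" rule, since the text does not say what the hardware does in that case.

```python
class Memory:
    """Byte-addressable memory with 16-bit and 8-bit loads."""

    def __init__(self, size=65536):
        self.data = bytearray(size)

    def load_byte(self, addr):
        """Model of ld.b: any address is allowed."""
        return self.data[addr]

    def load_word(self, addr):
        """Model of ld: reads two bytes, the address must be even."""
        if addr % 2 != 0:
            raise ValueError("16-bit access needs an even address")
        # Assumed byte order: higher byte first.
        return (self.data[addr] << 8) | self.data[addr + 1]

mem = Memory()
mem.data[0x0A] = 0x12
mem.data[0x0B] = 0x34
print(hex(mem.load_word(0x0A)))  # both bytes
print(hex(mem.load_byte(0x0A)))  # only the byte at 0x0a
```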
Hello World example
Let's look at the Hello World example:
; this program will print HELLO WORLD
#addr 0x400
VIDEO_0 = 2400 ; beginning of the text frame buffer
mov r2, 0 ; r2 is the index
mov r1, hello ; r1 holds the address of the "HELLO WORLD" string
again:
ld.b r0, [r1] ; load r0 with the content of the memory location to which r1 points (current character)
cmp r0, 0 ; if the current character is 0 (string terminator),
jz end ; go out of this loop
st [r2 + VIDEO_0], r0 ; store the character at the VIDEO_0 + r2
inc r1 ; move to the next character
add r2, 2 ; move to the next location in the video memory
j again ; continue with the loop
end:
halt
hello:
#str "HELLO WORLD!\0"
First we define the constant VIDEO_0 with the value of 2400.
This is the address of the text-based frame buffer. It points to the first character in the video
memory.
Then we set r2 to 0 and r1 to
the address of the hello string. Note that the mov instruction is used either to move
a number into a register (for example, mov r2, 0), or to move the value of the source
register to the destination register (for example, mov r1, r2).
Next, we enter the loop. The loop starts with the again label. In the loop we load the byte from the current address (starting with the first character of the hello string), compare that byte with zero (checking for the end of the string), and, if it is not zero, store it at the current address of the video memory. Then we advance r1 by one to the next character and r2 by two to the next location in the video memory, and jump back to again.
When all the characters are printed on the screen, the CPU halts (halt instruction).
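The loop can be traced in Python to see which frame buffer locations it writes. This is only a model of the listing above: the print_string function and the dictionary that stands in for the video memory are made up for this illustration, and each 16-bit store is recorded as one entry without modelling individual bytes.

```python
VIDEO_0 = 2400  # beginning of the text frame buffer, as in the listing

def print_string(text):
    """Mimic the loop and return {video_address: character_code}."""
    memory = text + b"\0"   # stands in for the bytes at the hello label
    video = {}
    r1 = 0                  # position of the current character
    r2 = 0                  # offset into the video memory
    while True:
        r0 = memory[r1]             # ld.b r0, [r1]
        if r0 == 0:                 # cmp r0, 0 / jz end
            break
        video[VIDEO_0 + r2] = r0    # st [r2 + VIDEO_0], r0
        r1 += 1                     # inc r1
        r2 += 2                     # add r2, 2
    return video

cells = print_string(b"HELLO WORLD!")
print(len(cells), min(cells), max(cells))
```

In this model the twelve characters land at every second address from 2400 up to 2422, which is the effect of adding 2 to r2 on each pass.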
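Interrupts
The UART echo demo starts by setting up the stack pointer, the cursor position, and the UART interrupt handler: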
#addr 0x400
; ########################################################
; REAL START OF THE PROGRAM
; ########################################################
mov sp, 1000
mov r0, 14
st [cursor], r0
; set the IRQ handler for UART to our own IRQ handler
mov r0, 1
mov r1, 16
st [r1], r0
mov r0, irq_triggered
mov r1, 18
st [r1], r0
halt
The code above sets the interrupt handling routine
(irq_triggered) for the UART. This is IRQ1, and its entry point is at address 16
(0x0010). This means that whenever the serial UART subsystem receives a byte, the CPU
jumps to address 0x0010. At that address we place a JUMP instruction (j
irq_triggered): the first st [r1], r0 writes 0x0001 (the JUMP instruction opcode) to
address 0x0010, and the second st [r1], r0 writes the address of the
irq_triggered routine to address 0x0012.
That way, we have prepared the UART interrupt routine, and the main program
halts. The rest of the program is in the interrupt routine. Let's look at the interrupt routine:
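The four bytes that end up at the IRQ1 entry can be sketched in Python. The install_handler function and the handler address 0x0420 are hypothetical, and the byte order (higher byte first) is an assumption taken from the instruction encoding examples.

```python
IRQ1_VECTOR = 16      # 0x0010, where the CPU jumps when the UART receives a byte
J_OPCODE = 0x0001     # group 1 (JUMP), type 0: the "j xx" instruction

def install_handler(memory, handler_addr):
    """Write "j handler_addr" at the IRQ1 entry: opcode word, then target word."""
    memory[IRQ1_VECTOR:IRQ1_VECTOR + 2] = J_OPCODE.to_bytes(2, "big")
    memory[IRQ1_VECTOR + 2:IRQ1_VECTOR + 4] = handler_addr.to_bytes(2, "big")

memory = bytearray(65536)
install_handler(memory, 0x0420)   # 0x0420 is a made-up address for irq_triggered
print(memory[16:20].hex())
```

For the made-up handler address this prints 00010420: the opcode word at 0x0010 followed by the jump target at 0x0012.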
; ##################################################################
; Subroutine which is called whenever some byte arrives at the UART
; ##################################################################
irq_triggered:
push r0
push r1
push r2
push r5
push r6
in r1, [64] ; r1 now holds the received byte from the UART (address 64 decimal)
ld r6, [cursor]
st [r6 + VIDEO_0], r1 ; store the UART character at the VIDEO_0 + r6
add r6, 2 ; move to the next location in the video memory
st [cursor], r6
loop2:
in r5, [65] ; tx busy in r5
cmp r5, 0
jz not_busy ; if not busy, send back the received character
j loop2
not_busy:
out [66], r1 ; send the received character to the UART
skip:
pop r6
pop r5
pop r2
pop r1
pop r0
iret
When the interrupt happens, the irq_triggered routine first
pushes some registers on the stack, obtains the received byte from the UART (in r1, [64]),
prints it on the screen, and then sends that character back through the UART (out [66], r1). Before
sending, it polls port 65: if the UART is still busy sending a character, in r5, [65] sets r5 to 1;
otherwise r5 is 0. Finally, the routine pops the registers from the stack and returns (iret instruction).
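The receive, wait, and send sequence of the routine can be tried out against a fake UART in Python. Only the port numbers 64, 65 and 66 come from the listing. The FakeUart class, its busy counter, and the echo function are invented for this sketch, and the screen output is left out.

```python
UART_RX, UART_TX_BUSY, UART_TX = 64, 65, 66  # port numbers from the listing

class FakeUart:
    """Hypothetical stand-in for the UART ports."""

    def __init__(self, received, busy_polls):
        self.received = received      # byte waiting in the receiver
        self.busy_polls = busy_polls  # how many polls report "busy"
        self.sent = []

    def port_in(self, port):
        if port == UART_RX:
            return self.received
        if port == UART_TX_BUSY:
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return 1
            return 0
        raise ValueError("unknown input port")

    def port_out(self, port, value):
        if port != UART_TX:
            raise ValueError("unknown output port")
        self.sent.append(value)

def echo(uart):
    """Same order as irq_triggered: read, wait while busy, send back."""
    r1 = uart.port_in(UART_RX)                  # in r1, [64]
    while uart.port_in(UART_TX_BUSY) != 0:      # loop2: in r5, [65] / cmp / jz
        pass
    uart.port_out(UART_TX, r1)                  # out [66], r1
    return r1

uart = FakeUart(received=ord("A"), busy_polls=3)
echo(uart)
print(uart.sent)
```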
The difference between iret and ret is that
ret pops the return address from the stack and jumps to the obtained address (return from a
called subroutine), while iret pops the return address, pops the flags, and then jumps to the
obtained address. The interrupt routine might change the flags, so they need to be saved before the
interrupt routine is invoked and restored during the iret execution.
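A small Python model shows why the flags survive an interrupt but are not touched by ret. The Cpu class is hypothetical, and the order in which flags and return address are pushed on interrupt entry is inferred from the pop order described above (return address first, then flags), not taken from the hardware description.

```python
class Cpu:
    """Minimal model of call/ret and interrupt/iret stack handling."""

    def __init__(self):
        self.pc = 0
        self.flags = 0
        self.stack = []

    def call(self, target, return_addr):
        self.stack.append(return_addr)
        self.pc = target

    def ret(self):
        self.pc = self.stack.pop()       # return address only

    def interrupt(self, vector, return_addr):
        self.stack.append(self.flags)    # assumed order: flags first,
        self.stack.append(return_addr)   # then the return address
        self.pc = vector

    def iret(self):
        self.pc = self.stack.pop()       # return address
        self.flags = self.stack.pop()    # then the saved flags

cpu = Cpu()
cpu.flags = 0b0101
cpu.interrupt(vector=16, return_addr=0x0408)
cpu.flags = 0            # the handler changes the flags, e.g. through cmp
cpu.iret()
print(hex(cpu.pc), bin(cpu.flags))
```

After iret the model is back at the interrupted address with the original flags, while a plain ret would have restored only the program counter.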
All the examples are stored in the FPGAcustomasm project on GitHub:
https://github.com/milanvidakovic/FPGAcustomasm/tree/master/examples/FPGA/raspbootin