Going 32-bit

This post has several follow-ups:
- added DMA controller for SD card,
- added floating point support in GCC port,
- added network support,
- added PS/2 mouse,
- implemented BLIT instruction,
- adding SPI interface to my FPGA computer,
- making BASIC interpreter for my FPGA platform,
- using GCC on my FPGA platform,
- added cache controller,
- new VGA display mode,
- booting from the SD card.


I have upgraded my FPGA-based computer from 16-bit to 32-bit. It now has 16 registers, each 32 bits wide. It uses the 32MB SDRAM available on the DE0-NANO board, but it also uses about 40 KB of static RAM for the video memory (frame buffer), in both text and graphics modes.

FPGA Computer Schematics

Memory management

It was quite painful to make the computer work with the SDRAM, because the 32MB SDRAM needs a dedicated controller. I found a useful one on GitHub:
https://github.com/stffrdhrn/sdram-controller

Since there are two types of memory in this computer (dynamic and static), I had to decide how to lay out the memory. The first 40 KB are static RAM (all interrupt vectors, text and graphics video RAM, and sprite definition memory). The rest of the address space, up to 32MB, is mapped to the SDRAM.
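The memory layout above amounts to a simple address decoder. The sketch below is a hypothetical Java model, not the Verilog from the design: it assumes static RAM covers exactly the first 40 KB and that SDRAM_START_ADDR is that boundary expressed in 16-bit words (20480), which may not match the real constant. It also assumes byte addresses are shifted right by one to form word addresses, as in the read example that follows.

```java
class MemoryMap {
    // Assumption: static RAM covers exactly the first 40 KB of the address space.
    static final long SRAM_BYTES = 40L * 1024;
    // Assumption: SDRAM_START_ADDR is that boundary in 16-bit words (20480).
    static final long SDRAM_START_ADDR = SRAM_BYTES / 2;
    // The whole address space ends at 32 MB.
    static final long MEMORY_END_BYTES = 32L * 1024 * 1024;

    // Byte address -> 16-bit word address, as in "addr <= (pc + 2) >> 1".
    static long toWordAddr(long byteAddr) {
        return byteAddr >> 1;
    }

    // Decide which memory serves a byte address, using the same test as the
    // Verilog ("addr >= SDRAM_START_ADDR") applied to the word address.
    static String region(long byteAddr) {
        if (byteAddr < 0 || byteAddr >= MEMORY_END_BYTES) {
            throw new IllegalArgumentException("address out of range: " + byteAddr);
        }
        return toWordAddr(byteAddr) >= SDRAM_START_ADDR ? "SDRAM" : "SRAM";
    }
}
```

Under these assumptions, `MemoryMap.region(8)` (where the timer vector lives) returns "SRAM", while `MemoryMap.region(40 * 1024)` returns "SDRAM".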

Here is how a memory read works. Suppose we need to read 16 bits from address PC + 2:

addr <= (pc + 2) >> 1;
next_state <= EXECUTE;
state <= READ_DATA;

The byte address is shifted right by one bit because memory is addressed in 16-bit words. We set the next_state register to the state the CPU should return to once the read is done, and then the CPU switches to the READ_DATA state.

READ_DATA: begin
    if (addr >= SDRAM_START_ADDR) begin
        waiting_sdram <= 1;
        addr_o <= addr;
        rd_enable_o <= 1'b1;
        if (busy_i) begin
            state <= READ_WAIT;
        end
    end
    else begin
        memrd <= 1'b1;
        memwr <= 1'b0;
        state <= READ_WAIT;
    end
end

In the READ_DATA state, for an SDRAM address, the CPU puts the address on the SDRAM controller's address bus (addr_o) and sets rd_enable_o to 1. It then stays in READ_DATA until the controller starts the read, which it signals by raising busy_i, and moves to the READ_WAIT state. For a static RAM address, it simply asserts memrd and moves to READ_WAIT.

READ_WAIT: begin
    if (addr >= SDRAM_START_ADDR) begin
        rd_enable_o <= 1'b0;
        if (rd_ready_i) begin
            waiting_sdram <= 0;
            data_r <= rd_data_i;
            state <= next_state;
        end
    end
    else begin
        memrd <= 1'b0;
        memwr <= 1'b0;
        data_r <= data;
        state <= next_state;
    end
end

The READ_WAIT state finishes when the data has been obtained from memory (it ends up in the data_r register). A complete SDRAM read, from entering READ_DATA to leaving READ_WAIT, takes about 6 cycles at 100 MHz, i.e. roughly 60 ns. The CPU then moves to next_state, which was set before the read started.
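The READ_DATA/READ_WAIT handshake can be simulated cycle by cycle. This is a hypothetical Java model rather than the actual Verilog: ToyController stands in for the SDRAM controller with an invented latency, so the cycle counts it reports say nothing about the real hardware. Each loop iteration is one clock edge, and all inputs are sampled before anything is updated, mimicking Verilog's non-blocking assignments.

```java
class ReadHandshakeSim {
    enum State { READ_DATA, READ_WAIT, EXECUTE }

    // Toy stand-in for the SDRAM controller. Its timing is invented for
    // illustration and does NOT describe the real sdram-controller core.
    static final class ToyController {
        boolean busy, rdReady;        // seen by the CPU as busy_i / rd_ready_i
        int rdData;                   // seen by the CPU as rd_data_i
        private int countdown = -1;   // -1 means idle
        private final int latency;
        private final int value;      // what every read returns in this toy

        ToyController(int latency, int value) {
            this.latency = latency;
            this.value = value;
        }

        // One rising clock edge, sampling the CPU's registered rd_enable_o.
        void tick(boolean rdEnable) {
            rdReady = false;          // in this toy, rd_ready_i is a one-cycle pulse
            if (countdown < 0) {
                if (rdEnable) { busy = true; countdown = latency; }  // accept request
            } else if (countdown == 0) {
                rdData = value; rdReady = true; busy = false; countdown = -1;
            } else {
                countdown--;
            }
        }
    }

    // Runs READ_DATA -> READ_WAIT -> next_state for an SDRAM address.
    // Returns {data_r, number of clock cycles taken}.
    static int[] read(int latency, int value) {
        ToyController ctrl = new ToyController(latency, value);
        State state = State.READ_DATA;
        State nextState = State.EXECUTE;   // set before the read, as in the post
        boolean rdEnable = false;          // rd_enable_o
        int dataR = 0, cycles = 0;
        while (state != nextState) {
            // Sample everything first: non-blocking assignments use pre-edge values.
            boolean busyI = ctrl.busy, rdReadyI = ctrl.rdReady, rdEnableO = rdEnable;
            int rdDataI = ctrl.rdData;
            switch (state) {
                case READ_DATA:
                    rdEnable = true;
                    if (busyI) state = State.READ_WAIT;
                    break;
                case READ_WAIT:
                    rdEnable = false;
                    if (rdReadyI) { dataR = rdDataI; state = nextState; }
                    break;
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
            ctrl.tick(rdEnableO);
            cycles++;
        }
        return new int[] {dataR, cycles};
    }
}
```

With an invented latency of 2, `ReadHandshakeSim.read(2, 0xBEEF)` returns 0xBEEF after 6 cycles; in this toy model every extra cycle of controller latency adds exactly one cycle to the read.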

Writing works the same way. Suppose we want to push a value onto the stack:

addr <= (regs[SP] - 2'd2) >> 1;
data_to_write <= regs[ir[11:8]][15:0];
// move sp to the next location
regs[SP] <= regs[SP] - 2'd2;
next_state <= EXECUTE;
state <= WRITE_DATA;

As with reads, the next_state register holds the state to return to once the write is done, and the CPU then switches to the WRITE_DATA state.
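The effect of this push on SP and on memory can be sketched in a few lines. This is a hypothetical Java model: the post does not say which of the 16 registers is SP, so index 15 is an assumption, and memory is a small word-addressed array of 16-bit cells. In the real CPU, the store itself goes through the WRITE_DATA and WRITE_WAIT states.

```java
class StackPushSketch {
    // Hypothetical: the post does not say which of the 16 registers is SP.
    static final int SP = 15;

    final int[] regs = new int[16];   // 16 general-purpose 32-bit registers
    final short[] mem;                // word-addressed 16-bit memory (sketch only)

    StackPushSketch(int words) {
        mem = new short[words];
    }

    // Push the low 16 bits of the register selected by ir[11:8].
    void push16(int ir) {
        int src = (ir >>> 8) & 0xF;             // ir[11:8]
        int addr = (regs[SP] - 2) >>> 1;        // (regs[SP] - 2) >> 1: word address
        short dataToWrite = (short) regs[src];  // regs[src][15:0]
        mem[addr] = dataToWrite;                // in hardware: WRITE_DATA / WRITE_WAIT
        regs[SP] -= 2;                          // the stack grows downward
    }
}
```

For example, with SP = 200 and r3 = 0x12345678, pushing r3 (ir[11:8] = 3) stores 0x5678 at word address 99 and leaves SP at 198.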

WRITE_DATA: begin
    if (addr >= SDRAM_START_ADDR) begin
        waiting_sdram <= 1;
        addr_o <= addr;
        wr_data_o <= data_to_write;
        wr_enable_o <= 1'b1;
        if (busy_i)
            state <= WRITE_WAIT;
    end
    else begin
        memrd <= 1'b0;
        memwr <= 1'b1;
        state <= WRITE_WAIT;
    end
end

In the WRITE_DATA state, for an SDRAM address, the CPU sets the target address (addr_o) and the data to be written (wr_data_o), and sets wr_enable_o to 1. It stays there until the controller accepts the request by raising busy_i, and then moves to the WRITE_WAIT state. For a static RAM address, it asserts memwr and moves to WRITE_WAIT immediately.

WRITE_WAIT: begin
    if (addr >= SDRAM_START_ADDR) begin
        wr_enable_o <= 1'b0;
        if (~busy_i) begin
            waiting_sdram <= 0;
            state <= next_state;
        end
    end
    else begin
        memrd <= 1'b0;
        memwr <= 1'b0;
        state <= next_state;
    end
end

The WRITE_WAIT state finishes when the data has been saved to memory, which the controller signals by lowering busy_i. A complete SDRAM write, from entering WRITE_DATA to leaving WRITE_WAIT, also takes about 6 cycles at 100 MHz. The CPU then moves to next_state, which was set before the write started.
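Modelling the write path in the same style makes the cost difference between the two memories visible. In this hypothetical Java sketch, the static RAM branch always completes in two cycles (one in WRITE_DATA, one in WRITE_WAIT), which follows directly from the Verilog above, while the SDRAM branch depends on ToyController, whose latency is invented. SDRAM_START_ADDR = 20480 (40 KB in 16-bit words) is also an assumption.

```java
class WriteHandshakeSim {
    enum State { WRITE_DATA, WRITE_WAIT, EXECUTE }

    // Assumption: 40 KB of static RAM expressed in 16-bit words.
    static final int SDRAM_START_ADDR = 20480;

    // Toy SDRAM controller with invented timing (NOT the real core's behaviour).
    static final class ToyController {
        boolean busy;                 // seen by the CPU as busy_i
        int stored;                   // last value written to "SDRAM"
        private int latched;
        private int countdown = -1;   // -1 means idle
        private final int latency;

        ToyController(int latency) {
            this.latency = latency;
        }

        // One rising clock edge, sampling the CPU's wr_enable_o and wr_data_o.
        void tick(boolean wrEnable, int wrData) {
            if (countdown < 0) {
                if (wrEnable) { busy = true; latched = wrData; countdown = latency; }
            } else if (countdown == 0) {
                stored = latched; busy = false; countdown = -1;
            } else {
                countdown--;
            }
        }
    }

    // Runs WRITE_DATA -> WRITE_WAIT -> next_state (EXECUTE here).
    // Returns {clock cycles taken, value found at the destination afterwards}.
    static int[] write(int addr, int value, int latency, int[] sram) {
        ToyController ctrl = new ToyController(latency);
        boolean sdram = addr >= SDRAM_START_ADDR;
        State state = State.WRITE_DATA;
        boolean wrEnable = false, memwr = false;
        int wrData = 0, cycles = 0;
        while (state != State.EXECUTE) {
            // Sample registered values from the previous edge.
            boolean busyI = ctrl.busy, wrEnableO = wrEnable, memwrO = memwr;
            int wrDataO = wrData;
            if (state == State.WRITE_DATA) {
                if (sdram) {
                    wrData = value;
                    wrEnable = true;
                    if (busyI) state = State.WRITE_WAIT;
                } else {
                    memwr = true;
                    state = State.WRITE_WAIT;
                }
            } else { // WRITE_WAIT
                if (sdram) {
                    wrEnable = false;
                    if (!busyI) state = State.EXECUTE;
                } else {
                    if (memwrO) sram[addr] = value;  // static RAM writes while memwr is high
                    memwr = false;
                    state = State.EXECUTE;
                }
            }
            ctrl.tick(wrEnableO, wrDataO);
            cycles++;
        }
        return new int[] {cycles, sdram ? ctrl.stored : sram[addr]};
    }
}
```

With an invented latency of 2, a write to word address 100 (static RAM) takes 2 cycles, while a write to word address 30000 (SDRAM) takes 6 cycles in this toy model.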

CPU redesign

The CPU itself was redesigned, too. It now has a fairly rich instruction set with 32-bit, 16-bit and 8-bit instructions, single-precision (32-bit) floating point, and three interrupts:
- IRQ0 is the timer interrupt (triggered when a given number of milliseconds has elapsed),
- IRQ1 is the UART interrupt (triggered when a byte has arrived), and
- IRQ2 is the PS/2 interrupt (triggered whenever a key is pressed on the PS/2 keyboard).

The timer IRQ works like this: there is a counter that is incremented every millisecond, and a timer port that initially holds zero. The programmer sets the number of milliseconds after which the interrupt should occur. The code below first installs the handler vector (a JUMP instruction at TIMER_HANDLER_ADDR, followed by the handler address) and then writes the timer value to port 129 with the OUT instruction:

mov.s r0, 0x0001                    ; JUMP opcode
mov.s r1, TIMER_HANDLER_ADDR        ; timer vector address
st.s [r1], r0                       ; store the JUMP opcode at the vector
mov.w r0, timer_triggered
mov.s r1, TIMER_HANDLER_ADDR + 2    ; the handler address follows the opcode
st.w [r1], r0                       ; the timer IRQ handler has been set

mov.w r0, 50                        ; set the timer interrupt for every 50 milliseconds
out 129, r0

The code above sets the internal timer register to 50. Every millisecond, the CPU increments another internal register, timer_counter, and when timer_counter reaches timer, the timer interrupt is triggered:

if (timer && (timer_counter < timer)) begin
    timer_counter <= timer_counter + 1'b1;
end
else if (timer && (timer_counter == timer)) begin
    irq[0] <= 1;
    timer_counter <= 0;
end
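The counter logic can be replayed in software to check exactly when IRQ0 fires. This hypothetical Java sketch assumes that the Verilog block above runs once per millisecond tick and that timer_counter starts at zero; each loop iteration is one tick.

```java
import java.util.ArrayList;
import java.util.List;

class TimerSim {
    // Replays the timer logic; each loop iteration is one millisecond tick.
    // Returns the (1-based) ticks on which irq[0] would be raised.
    static List<Integer> irqTicks(int timer, int ticks) {
        List<Integer> fired = new ArrayList<>();
        int timerCounter = 0;                   // assumption: starts at zero
        for (int t = 1; t <= ticks; t++) {
            if (timer != 0 && timerCounter < timer) {
                timerCounter++;                 // still counting
            } else if (timer != 0 && timerCounter == timer) {
                fired.add(t);                   // irq[0] <= 1
                timerCounter = 0;               // start the next period
            }
        }
        return fired;
    }
}
```

Under these assumptions, the tick on which timer_counter equals timer only fires and resets the counter, so the interrupt arrives every timer + 1 ticks: `irqTicks(3, 10)` gives ticks 4 and 8, and with timer = 50 the first interrupt comes on tick 51. A timer value of 0 never fires, matching the `timer &&` guard.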

At the end of each instruction execution, there is a check for the interrupts:

if (irq_r[0]) begin
    // timer
    pc <= 16'd8;
    addr <= 16'd4;
    irq_r[0] <= 0;
end

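If there is a pending timer interrupt, the CPU jumps to TIMER_HANDLER_ADDR, which is 8, where the JUMP instruction installed earlier transfers control to the handler. The addr register is set to 4 because memory is addressed in 16-bit words (8 >> 1).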
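The dispatch step and the vector it depends on can be sketched together. In this hypothetical Java model, memory is a small array of 16-bit words, installTimerVector mirrors the `st.s [r1], r0` store from the setup code (JUMP opcode 0x0001 at byte address 8, i.e. word 4), and only IRQ0 is handled because that is the only branch shown. The Verilog snippet does not show how the return address is saved, so the sketch leaves that out.

```java
class TimerIrqDispatch {
    static final int TIMER_HANDLER_ADDR = 8;   // byte address, from the post
    static final short JUMP_OPCODE = 0x0001;   // value the setup code stores there

    final short[] mem = new short[64];         // tiny word-addressed memory (sketch only)
    final boolean[] irqR = new boolean[3];     // pending IRQ0..IRQ2
    int pc = 0x100, addr = 0x100 >> 1;        // arbitrary starting point

    // Setup code: "st.s [r1], r0" with r1 = TIMER_HANDLER_ADDR, r0 = JUMP opcode.
    void installTimerVector() {
        mem[TIMER_HANDLER_ADDR >> 1] = JUMP_OPCODE;
    }

    // End-of-instruction check mirroring the Verilog; only IRQ0 is modelled.
    void checkInterrupts() {
        if (irqR[0]) {
            pc = TIMER_HANDLER_ADDR;           // next instruction comes from the vector
            addr = TIMER_HANDLER_ADDR >> 1;    // same location as a word address (4)
            irqR[0] = false;                   // acknowledge the request
        }
    }

    // Word the CPU would fetch next.
    short fetch() {
        return mem[addr];
    }
}
```

After `checkInterrupts()` runs with irqR[0] set, pc is 8, addr is 4, and the next word fetched is the JUMP opcode 0x0001; without a pending interrupt, pc and addr are left unchanged.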

FPGA Raspbootin loader

I have modified the FPGA Raspbootin loader so that it now programs the FPGA itself, instead of relying on Quartus II for that. This means that I can now control the computer from a single application, FPGA Raspbootin.


The loader first loads the design into the FPGA (unless the design has been flashed, in which case this step is skipped), and then it loads the selected binary into the computer. Here is the Java code that loads the design into the FPGA by starting the quartus_pgm.exe program:

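// Requires java.io.BufferedReader, java.io.IOException and java.io.InputStreamReader.
// qpfPath and sofPath are static String fields of the enclosing class.
public static void runFpga() {
    try {
        // quartus_pgm: cable "usb-blaster", JTAG mode, operation P (program) with the .sof file
        Process process = new ProcessBuilder(qpfPath,
                "-c", "usb-blaster",
                "-m", "jtag",
                "-o", "P;" + sofPath)
                .redirectErrorStream(true) // show error messages together with normal output
                .start();
        // Echo quartus_pgm's output until the process closes it
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
        int exitCode = process.waitFor(); // wait until programming has finished
        if (exitCode != 0) {
            System.err.println("quartus_pgm failed with exit code " + exitCode);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
}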

The qpfPath variable points to quartus_pgm.exe, the program that actually loads the design into the FPGA. It is usually something like C:\altera\13.0\quartus\bin\quartus_pgm.exe.

The design file has the *.sof extension and is produced when the design is compiled in Quartus II. In my program, the path to the *.sof file is held in the sofPath variable.
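Building the quartus_pgm argument list in a separate helper makes it easy to log or check before the external program is started. The sketch below is hypothetical: the class and method names are invented for illustration, it reuses exactly the arguments from runFpga, and the .sof extension check and file-existence check are illustrative additions rather than part of the loader.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class QuartusPgmCommand {
    // Same arguments as runFpga(): cable, JTAG mode, and "P;<file>" (program).
    static List<String> build(String pgmPath, String sofPath) {
        // Illustrative check: quartus_pgm is given the compiled .sof design file.
        if (!sofPath.toLowerCase(Locale.ROOT).endsWith(".sof")) {
            throw new IllegalArgumentException("expected a .sof file: " + sofPath);
        }
        return Arrays.asList(pgmPath,
                "-c", "usb-blaster",
                "-m", "jtag",
                "-o", "P;" + sofPath);
    }

    // Illustrative pre-flight check before starting the external program.
    static boolean filesExist(String pgmPath, String sofPath) {
        return Files.isRegularFile(Paths.get(pgmPath))
                && Files.isRegularFile(Paths.get(sofPath));
    }
}
```

The resulting list can be passed directly to `new ProcessBuilder(List<String>)`, so runFpga would only need to call `QuartusPgmCommand.build(qpfPath, sofPath)`.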

More details about loading FPGA design on the DE0-NANO FPGA board can be found here:
https://blog.vidakovic.xyz/posts/2019/10/flashing-de0-nano-fpga-board

Conclusion

The 32-bit rework took more time than I expected, mainly because I wanted to use the on-board 32MB SDRAM. I then added the floating-point instructions, and now the system looks quite stable. The design uses about 80% of the FPGA, so there is still some room for further additions.

The CPU is on GitHub:
https://github.com/milanvidakovic/FPGAComputer32

The assembler examples are on GitHub:
https://github.com/milanvidakovic/Assembler32

The Raspbootin64 boot loader is on GitHub:
https://github.com/milanvidakovic/FPGARaspbootin64Client

The emulator is on GitHub:
https://github.com/milanvidakovic/FPGAEmulator32