Adding hardware sprites

This is a follow-up to the FPGA computer post.

I have added hardware sprites to the graphics mode of my FPGA computer. It now supports up to 16 sprites, each 16x16 pixels in size. Here is how it looks on the monitor:
In the emulator, it looks the same:



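Each sprite is defined by an 8-byte structure that consists of four 16-bit words:
  offset 0: address of the sprite definition data (a value of 0 means that the sprite is not displayed)
  offset 2: x coordinate
  offset 4: y coordinate
  offset 6: transparent color
The structure for the first sprite starts at address 56 (decimal), and each subsequent sprite structure starts 8 bytes later, so the 16 structures occupy addresses 56 to 183.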
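Because the sprite structures form a fixed table, the address of any field can be computed directly. The following Python sketch is only an illustration (the names `field_address`, `SPRITE_TABLE_BASE` and the field labels are hypothetical). It assumes the field order used by the assembler example below (definition address, x, y and transparent color at offsets 0, 2, 4 and 6) and the word-oriented bus described later in the post, where the address put on the bus is the byte address shifted right by one.

```python
# Sprite structure table layout (names are illustrative, not from the original design).
SPRITE_TABLE_BASE = 56   # byte address of sprite 0's structure
SPRITE_STRUCT_SIZE = 8   # bytes per sprite structure
SPRITE_NUM = 16

# Byte offsets of the 16-bit fields inside one sprite structure.
FIELD_OFFSETS = {"def_addr": 0, "x": 2, "y": 4, "transparent": 6}


def field_address(sprite: int, field: str) -> tuple[int, int]:
    """Return (byte_address, word_address) of one field of one sprite."""
    if not 0 <= sprite < SPRITE_NUM:
        raise ValueError(f"sprite index out of range: {sprite}")
    byte_addr = SPRITE_TABLE_BASE + sprite * SPRITE_STRUCT_SIZE + FIELD_OFFSETS[field]
    # The bus is word-oriented, so the address driven onto it is byte_addr >> 1.
    return byte_addr, byte_addr >> 1


if __name__ == "__main__":
    for name in FIELD_OFFSETS:
        print(f"sprite 0 {name}: {field_address(0, name)}")
    print("last byte of the table:", SPRITE_TABLE_BASE + SPRITE_NUM * SPRITE_STRUCT_SIZE - 1)
```

For sprite 0 this gives byte addresses 56, 58, 60 and 62, that is, word addresses 28 to 31.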

Sprite definition data consists of 16 lines of 16 pixels each, with each pixel defined by 4 bits: xrgb. This means that one sprite line occupies 8 bytes (two pixels per byte), so the whole sprite definition takes 8 x 16 = 128 bytes.
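A sketch of this packing in Python, under one stated assumption: the first pixel of each 16-bit word sits in its most significant nibble. That ordering is inferred from the display code later in the post, where word 0 of a line is stored in bits [63:48] and pixel 0 is read starting at bit 60. The helper names are hypothetical.

```python
WORDS_PER_LINE = 4
PIXELS_PER_WORD = 4                                   # 4 bits per pixel, 16-bit words
PIXELS_PER_LINE = WORDS_PER_LINE * PIXELS_PER_WORD    # 16
LINES_PER_SPRITE = 16
BYTES_PER_SPRITE = PIXELS_PER_LINE * 4 // 8 * LINES_PER_SPRITE  # 8 bytes x 16 lines


def unpack_line(words: list[int]) -> list[int]:
    """Split the four 16-bit words of one sprite line into 16 pixel values (0..15)."""
    if len(words) != WORDS_PER_LINE:
        raise ValueError("a sprite line is exactly 4 words")
    pixels = []
    for word in words:
        # Assumed order: first pixel in the most significant nibble.
        for shift in (12, 8, 4, 0):
            pixels.append((word >> shift) & 0xF)
    return pixels


def pack_line(pixels: list[int]) -> list[int]:
    """Inverse of unpack_line: pack 16 pixel values into four 16-bit words."""
    if len(pixels) != PIXELS_PER_LINE:
        raise ValueError("a sprite line is exactly 16 pixels")
    words = []
    for start in range(0, PIXELS_PER_LINE, PIXELS_PER_WORD):
        word = 0
        for p in pixels[start:start + PIXELS_PER_WORD]:
            word = (word << 4) | (p & 0xF)
        words.append(word)
    return words
```

For example, line 4 of the sprite in the assembler example below, `0x0000, 0x004f, 0xf400, 0x0000`, unpacks to six zero pixels, then 4, 15, 15, 4, then six more zero pixels.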

Here is an example, in assembly language, that shows one sprite at (25, 25):

  mov r0, sprite_def
  mov r1, 56
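  st [r1], r0      ; sprite definition is at sprite_def (stored at addr 56)
  mov r0, 25
  st [r1 + 2], r0  ; x = 25 at addr 58
  mov r0, 25
  st [r1 + 4], r0  ; y = 25 at addr 60
  mov r0, 0
  st [r1 + 6], r0  ; transparent color is black (0) at addr 62
  ; ... (rest of the program; execution must not fall through
  ;      into the data below)
  ; sprite definition: 16 lines, 4 words (16 pixels x 4 bits) per line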
sprite_def:
  #d16 0x0000, 0x0000, 0x0000, 0x0000  ; 0
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 1
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 2
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 3
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 4
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 5
  #d16 0x0000, 0x044f, 0xf440, 0x0000  ; 6
  #d16 0x0000, 0x444f, 0xf444, 0x0000  ; 7
  #d16 0x0004, 0x444f, 0xf444, 0x4000  ; 8
  #d16 0x0044, 0x444f, 0xf444, 0x4400  ; 9
  #d16 0x0400, 0x004f, 0xf400, 0x0040  ; 10
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 11
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 12
  #d16 0x0000, 0x041f, 0xf140, 0x0000  ; 13
  #d16 0x0000, 0x4111, 0x1114, 0x0000  ; 14
  #d16 0x0004, 0x4444, 0x4444, 0x4000  ; 15
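Hand-written sprite data is easy to get wrong, so it can help to decode the words back into a picture before loading them. This hypothetical Python helper prints the `sprite_def` data above with one hex digit per pixel and a dot for pixels that match the transparent color. Like the display logic, it compares only the low three bits of each pixel with the transparent color.

```python
# The 16 lines of sprite_def from the assembler example, 4 words per line.
SPRITE_DEF = [
    (0x0000, 0x0000, 0x0000, 0x0000),
    (0x0000, 0x000F, 0xF000, 0x0000),
    (0x0000, 0x000F, 0xF000, 0x0000),
    (0x0000, 0x000F, 0xF000, 0x0000),
    (0x0000, 0x004F, 0xF400, 0x0000),
    (0x0000, 0x004F, 0xF400, 0x0000),
    (0x0000, 0x044F, 0xF440, 0x0000),
    (0x0000, 0x444F, 0xF444, 0x0000),
    (0x0004, 0x444F, 0xF444, 0x4000),
    (0x0044, 0x444F, 0xF444, 0x4400),
    (0x0400, 0x004F, 0xF400, 0x0040),
    (0x0000, 0x004F, 0xF400, 0x0000),
    (0x0000, 0x004F, 0xF400, 0x0000),
    (0x0000, 0x041F, 0xF140, 0x0000),
    (0x0000, 0x4111, 0x1114, 0x0000),
    (0x0004, 0x4444, 0x4444, 0x4000),
]


def render(rows, transparent: int = 0) -> list[str]:
    """Return one 16-character string per sprite line.

    Each pixel is shown as its hex digit, or '.' if its low three bits
    match the low three bits of the transparent color.
    """
    lines = []
    for words in rows:
        digits = "".join(f"{w:04x}" for w in words)   # first pixel = top nibble
        lines.append("".join(
            "." if (int(d, 16) & 0x7) == (transparent & 0x7) else d
            for d in digits
        ))
    return lines


if __name__ == "__main__":
    for line in render(SPRITE_DEF):
        print(line)
```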

How does this work? First, I had to decide how to implement sprites. I decided to fetch all sprite data during the vertical blanking interval (VBI). When the VBI starts, the video subsystem begins fetching sprite data by reading the 8-byte structure of the first sprite, which starts at address 56 (decimal). The address and data buses are 16 bits wide, so the computer is word-oriented: it reads two bytes at a time, and the address put on the bus is a word address. Byte address 56 therefore becomes word address 56 >> 1 == 28:

if ((x >= 640) && (y == 479) && (state == IN_LINE)) begin
  // when we start the vertical blanking, 
  // we need to fetch in advance the first sprite data
  state <= READ_SPRITES;
  sprite_counter <= 4'b0;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  addr <= 16'd28;    // prepare to read sprite definition address
end
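Fetching everything during VBI only works if the fetch finishes before the first visible line. Here is a back-of-the-envelope check in Python, under assumptions that the post does not state explicitly: the fetch state machine advances one state per clock (each state consumes the word requested by the previous one), it runs at the pixel clock, and the display uses standard 640x480 at 60 Hz timing with 800 clocks per line and 45 blanking lines. The constant and function names are hypothetical.

```python
SPRITE_NUM = 16
LINES_PER_SPRITE = 16
WORDS_PER_LINE = 4

# Clocks per sprite, assuming one state per clock:
HEADER_CLOCKS = 4   # READ_SPRITES, READ_SPRITE_X, READ_SPRITE_Y, READ_SPRITE_TRANSPARENT_COLOR
DATA_CLOCKS = LINES_PER_SPRITE * WORDS_PER_LINE + 1  # 64 words + 1 clock to move on

# Assumed display timing: standard 640x480 @ 60 Hz.
CLOCKS_PER_LINE = 800   # 640 visible + 160 horizontal blanking
VBLANK_LINES = 45       # 525 total lines - 480 visible


def fetch_clocks(sprites: int = SPRITE_NUM) -> int:
    """Clocks spent in the sprite-fetch states for the given number of sprites."""
    return sprites * (HEADER_CLOCKS + DATA_CLOCKS)


def vblank_clocks() -> int:
    """Clocks available during vertical blanking (pixel-clock units)."""
    return CLOCKS_PER_LINE * VBLANK_LINES


if __name__ == "__main__":
    used, available = fetch_clocks(), vblank_clocks()
    print(f"{used} of {available} clocks ({100 * used / available:.1f}%)")
```

Under these assumptions the fetch needs 1104 clocks, about 3% of the 36,000 clocks of vertical blanking. If memory needed more than one clock per word, the count would grow accordingly.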

In the next clock cycle, the system is in the READ_SPRITES state. The first thing we do there is store the sprite definition address, which is already present on the data bus because we initiated a memory read in the previous state.

Then we prepare the address bus for the next state, in which we will fetch the x coordinate of the sprite. The byte address of the x coordinate is 58 + (sprite_counter << 3), and, as before, it is shifted right by one to get the word address. sprite_counter iterates from 0 to 15, one value per sprite:

READ_SPRITES: begin
  sprite_addr[sprite_counter] <= data;
  state <= READ_SPRITE_X;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  // prepare to read x coordinate of the sprite
  addr <= (16'd58 + (sprite_counter << 3)) >> 1;    
end

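READ_SPRITE_X: begin
  sprite_x[sprite_counter] <= data;
  state <= READ_SPRITE_Y;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  // prepare to read y coordinate of the sprite
  addr <= (16'd60 + (sprite_counter << 3)) >> 1;
end

In the READ_SPRITE_X state, we store the x coordinate of the sprite, which is ready on the data bus, and then we prepare to read the y coordinate in the next state: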

READ_SPRITE_Y: begin
  sprite_y[sprite_counter] <= data;
  state <= READ_SPRITE_TRANSPARENT_COLOR;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  // prepare to read transparent color of the sprite  
  addr <= (16'd62 + (sprite_counter << 3)) >> 1;    
end

In the READ_SPRITE_Y state, we store the y coordinate of the sprite, which is ready on the data bus, and then we prepare to read the sprite's transparent color in the next state:

READ_SPRITE_TRANSPARENT_COLOR: begin
  sprite_transparent_color[sprite_counter] <= data[3:0];
  state <= READ_SPRITE_DATA;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  line_counter <= 16'b0;
  word_counter <= 4'b0;
  // read sprite definition bytes
  addr <= sprite_addr[sprite_counter] >> 1;    
end

In the READ_SPRITE_TRANSPARENT_COLOR state, we store the transparent color of the sprite (the low 4 bits of the data bus), reset the line and word counters, and put the address of the sprite definition on the address bus, so that we can fetch the first definition word in the next state:

READ_SPRITE_DATA: begin
  if (line_counter < 16) begin
    case (word_counter) 
    0:  sprite_pixels[sprite_counter][line_counter][63:48] <= data;
    1:  sprite_pixels[sprite_counter][line_counter][47:32] <= data;
    2:  sprite_pixels[sprite_counter][line_counter][31:16] <= data;
    3:  sprite_pixels[sprite_counter][line_counter][15:0]  <= data;
    endcase
    state <= READ_SPRITE_DATA;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
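    // advance to the next word and put its word address on the bus:
    // word address = (sprite_addr >> 1) + (line << 2) + word
    if (word_counter < 3) begin
      word_counter <= word_counter + 1'b1;
      addr <= (sprite_addr[sprite_counter] >> 1) +
              (line_counter << 2) + word_counter + 1'b1;
    end
    else begin
      word_counter <= 4'b0;
      line_counter <= line_counter + 16'd1;
      addr <= (sprite_addr[sprite_counter] >> 1) +
              ((line_counter + 16'd1) << 2);
    end
  end
  else
  begin
    // stop after the last sprite (index SPRITE_NUM - 1); a 4-bit
    // sprite_counter can never reach SPRITE_NUM itself
    if (sprite_counter < SPRITE_NUM - 1) begin
      sprite_counter <= sprite_counter + 1'b1;
      state <= READ_SPRITES;
      rd <= 1'b1;
      wr <= 1'b0;
      mem_read <= 1'b1;
      // read next sprite definition address
      addr <= (16'd56 + ((sprite_counter + 1'b1) << 3)) >> 1;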
    end
    else begin
      sprite_counter <= 4'b0;
      rd <= 1'b1;
      wr <= 1'b0;
      mem_read <= 1'b1;
      addr <= VIDEO_MEM_ADDR + 0;
      state <= V_BLANK;
    end
  end
end

In the READ_SPRITE_DATA state we read the sprite definition from memory, one word per clock: for each of the 16 lines of the sprite, we read the four words that hold the line's 16 pixels (four pixels per word).
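The order of memory reads for one sprite can be modelled in a few lines of Python. This sketch (function and constant names are hypothetical) lists the word addresses whose data the state machine actually consumes: the four header fields, then the 64 definition words, line by line and word by word. It assumes the definition address is even (word-aligned) and leaves out the one extra read issued after the last word, whose data is never used.

```python
SPRITE_TABLE_BASE = 56   # byte address of sprite 0's structure
SPRITE_STRUCT_SIZE = 8
LINES_PER_SPRITE = 16
WORDS_PER_LINE = 4


def fetch_addresses(sprite: int, def_addr: int) -> list[int]:
    """Word addresses whose data is consumed while fetching one sprite, in order.

    def_addr is the (even) byte address stored in the sprite's first field.
    """
    base = SPRITE_TABLE_BASE + sprite * SPRITE_STRUCT_SIZE
    # Header: definition address, x, y, transparent color (byte offsets 0, 2, 4, 6).
    header = [(base + offset) >> 1 for offset in (0, 2, 4, 6)]
    # Definition data: line by line, four words per line.
    data = [(def_addr >> 1) + (line << 2) + word
            for line in range(LINES_PER_SPRITE)
            for word in range(WORDS_PER_LINE)]
    return header + data
```

The definition words end up being 64 consecutive word addresses, which is why two small counters (line and word) are enough to walk them.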

When all definition data for the current sprite has been loaded, we do the same for the next sprite, until the data of all sprites has been read. Then we set the address bus to load the frame buffer word for the (0, 0) screen position and move to the V_BLANK state:

V_BLANK: begin
  pixels <= data;
  state <= SCAN_IDLE;
  rd <= 1'bz;
  wr <= 1'bz;
  mem_read <= 1'b0;
end

In the V_BLANK state we store the frame buffer word for the (0, 0) position, put the rd and wr signals into high impedance, deassert mem_read, and move to the SCAN_IDLE state. We leave SCAN_IDLE when it is time to start displaying pixels from the (0, 0) coordinate.

Displaying sprite data

During scanline processing, we need to display both the original pixels from the frame buffer and the sprite data, and we need to make sure that the frame buffer pixels show through wherever a sprite pixel has the sprite's transparent color.

This is done in the following code:

if (valid) begin
  // default: the frame buffer pixel (four 4-bit pixels per 16-bit word)
  r <= pixels[12 - ((xx & 3) << 2) + 0] == 1'b1;
  g <= pixels[12 - ((xx & 3) << 2) + 1] == 1'b1;
  b <= pixels[12 - ((xx & 3) << 2) + 2] == 1'b1;
  // opaque sprite pixels override the default; among non-blocking
  // assignments the last one wins, so a higher sprite index is on top
  for (i = 0; i < SPRITE_NUM; i = i+1) begin
    if ((sprite_addr[i] != 16'b0) &&
       (xx >= sprite_x[i]) &&
       (xx < (sprite_x[i] + 16)) &&
       (yy >= sprite_y[i]) &&
       (yy < (sprite_y[i] + 16))) begin

      // a transparent sprite pixel assigns nothing, so whatever is below
      // it (frame buffer or a lower-index sprite) stays visible
      if (
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] != sprite_transparent_color[i][0] ||
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] != sprite_transparent_color[i][1] ||
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] != sprite_transparent_color[i][2]
      ) begin
        r <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] == 1'b1;
        g <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] == 1'b1;
        b <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] == 1'b1;
      end
    end
  end
end
else begin
  // blanking -> no pixels
  r <= 1'b0;
  g <= 1'b0;
  b <= 1'b0;
end
end

The most interesting part is the "for loop". It is not a loop in the software sense: the synthesis tool unrolls it and replicates the hardware SPRITE_NUM times. That is the most important thing to understand about "loops" in Verilog. There is no sequential code that gets executed multiple times. Instead, the whole design is a circuit driven by the clock, and the "for loop" simply unrolls the code, so that all the unrolled copies work at the same time, in parallel, within the same clock cycle.

So, when we have this Verilog code:
  for (i = 0; i < SPRITE_NUM; i = i+1) begin
    if ((sprite_addr[i] != 16'b0) &&
       (xx >= sprite_x[i]) &&
       (xx < (sprite_x[i] + 16)) &&
       (yy >= sprite_y[i]) &&
       (yy < (sprite_y[i] + 16))) begin

It actually does this:
    if ((sprite_addr[0] != 16'b0) &&
       (xx >= sprite_x[0]) &&
       (xx < (sprite_x[0] + 16)) &&
       (yy >= sprite_y[0]) &&
       (yy < (sprite_y[0] + 16))) begin

...
    end
    if ((sprite_addr[1] != 16'b0) &&
       (xx >= sprite_x[1]) && 
       (xx < (sprite_x[1] + 16)) && 
       (yy >= sprite_y[1]) && 
       (yy < (sprite_y[1] + 16))) begin
...
    end
...
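To make the unrolling concrete, here is a small Python helper (hypothetical, for illustration only) that writes out the hit test once per sprite index, which is what the synthesis tool effectively does with the loop. It only generates text and says nothing about timing; the point is that the result is SPRITE_NUM independent comparators rather than one comparator used repeatedly.

```python
def unrolled_hit_tests(sprite_num: int) -> str:
    """Return the sprite hit test written out once per sprite index."""
    blocks = []
    for i in range(sprite_num):
        blocks.append(
            f"if ((sprite_addr[{i}] != 16'b0) &&\n"
            f"   (xx >= sprite_x[{i}]) &&\n"
            f"   (xx < (sprite_x[{i}] + 16)) &&\n"
            f"   (yy >= sprite_y[{i}]) &&\n"
            f"   (yy < (sprite_y[{i}] + 16))) begin\n"
            "  ...\n"
            "end"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(unrolled_hit_tests(2))
```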


The code with the "for loop" does the same thing for every sprite:
  1. If the sprite definition address is not zero, and the current x and y coordinates of the scanline are within the sprite's area, we drive the r, g and b outputs with the color of the current sprite pixel, or with the original frame buffer pixel color if the current sprite pixel is transparent (its color equals the sprite's transparent color).
  2. Otherwise, if the sprite is disabled or the current x and y coordinates are outside the sprite's area, we drive r, g and b with the frame buffer pixel.
  3. Outside the visible area (during blanking, when valid is low), r, g and b are set to zero.
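The rules above can be expressed as a small software model, which is handy for checking sprite data or an emulator against the hardware. This Python sketch is not the author's code; the `Sprite` class and `compose_pixel` function are hypothetical. It handles one visible pixel (blanking is handled outside it), treats a definition address of 0 as a disabled sprite, compares only the low three bits with the transparent color as the Verilog does, and, for overlapping sprites, assumes a higher-numbered sprite is drawn on top with its transparent pixels revealing whatever lies below.

```python
from dataclasses import dataclass

SPRITE_SIZE = 16


@dataclass
class Sprite:
    def_addr: int            # 0 means the sprite is disabled
    x: int
    y: int
    transparent: int         # 4-bit color; only the low 3 bits are compared
    pixels: list[list[int]]  # 16 rows of 16 pixel values (0..15)


def compose_pixel(xx: int, yy: int, frame_pixel: int, sprites: list[Sprite]) -> int:
    """Return the pixel value shown at visible position (xx, yy).

    Starts from the frame buffer pixel; every enabled sprite that covers
    (xx, yy) with a non-transparent pixel replaces it. Later sprites in the
    list win, so a higher index is drawn on top.
    """
    color = frame_pixel
    for s in sprites:
        if s.def_addr == 0:
            continue
        if s.x <= xx < s.x + SPRITE_SIZE and s.y <= yy < s.y + SPRITE_SIZE:
            p = s.pixels[yy - s.y][xx - s.x]
            if (p & 0x7) != (s.transparent & 0x7):
                color = p
    return color
```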

Conclusion

This implementation of sprites requires the VGA module to have its own internal memory (the per-sprite registers and the sprite_pixels array, 16 sprites x 128 bytes = 2 KB of pixel data), which is filled with sprite data from the main memory. Then, during scanline processing, sprite pixels are combined with frame buffer pixels so that sprite pixels are drawn over the frame buffer pixels, unless the current sprite pixel is transparent. In that case, the frame buffer pixel shows through the sprite.

The great thing about hardware sprites is that they cost the processor no drawing time. Fetching and compositing are done entirely in hardware; to show a sprite, the program only has to fill in its 8-byte structure with a non-zero definition address.