Adding hardware sprites
This is a follow-up to the FPGA computer post.
I have added hardware sprites to the graphics mode of my FPGA computer. It now supports up to 16 sprites,
each 16x16 pixels in size. Here is how it looks on the monitor:
In the emulator, it looks the same:
Each sprite is defined by an 8-byte structure:
- sprite definition data address (2 bytes)
- x coordinate (2 bytes)
- y coordinate (2 bytes)
- transparent color (2 bytes).
The structure for the first sprite starts at address 56 (decimal). Each subsequent sprite structure
starts 8 bytes after the previous one.
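The layout above boils down to a small address calculation. The Python sketch below only illustrates that arithmetic; the names SPRITE_TABLE_BASE, SPRITE_STRUCT_SIZE, FIELD_OFFSETS and field_address are made up for this example and do not come from the FPGA project.

```python
SPRITE_TABLE_BASE = 56    # byte address of the first sprite structure
SPRITE_STRUCT_SIZE = 8    # four 2-byte fields per sprite
SPRITE_COUNT = 16         # the hardware supports up to 16 sprites

# Byte offset of each field inside one sprite structure.
FIELD_OFFSETS = {
    "definition_address": 0,
    "x": 2,
    "y": 4,
    "transparent_color": 6,
}


def field_address(sprite_index, field):
    """Return the byte address of one field of one sprite structure."""
    if not 0 <= sprite_index < SPRITE_COUNT:
        raise ValueError("sprite index out of range")
    return SPRITE_TABLE_BASE + sprite_index * SPRITE_STRUCT_SIZE + FIELD_OFFSETS[field]


if __name__ == "__main__":
    # Print the four field addresses of the first sprite.
    for name in FIELD_OFFSETS:
        print(name, field_address(0, name))
```

For sprite 0 this yields the addresses 56, 58, 60 and 62 used by the assembly example in this post.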
Sprite definition data consists of 16 lines of 16 pixels each, with every pixel defined by 4 bits: xrgb.
One sprite line therefore takes 8 bytes (two pixels per byte), so a complete sprite definition needs
8 x 16 = 128 bytes.
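As a rough illustration of this packing, the following Python sketch packs one 16-pixel line into four 16-bit words. The helper name pack_line is hypothetical. The nibble order (leftmost pixel in the top nibble of the first word) is an assumption, taken from the way the Verilog later in this post indexes the 64-bit line.

```python
def pack_line(pixels):
    """Pack 16 four-bit pixels into four 16-bit words.

    Assumption: the leftmost pixel goes into the top nibble of word 0.
    """
    if len(pixels) != 16:
        raise ValueError("a sprite line has exactly 16 pixels")
    words = []
    for start in range(0, 16, 4):
        word = 0
        for pixel in pixels[start:start + 4]:
            if not 0 <= pixel <= 0xF:
                raise ValueError("each pixel is a 4-bit value")
            word = (word << 4) | pixel
        words.append(word)
    return words


BYTES_PER_LINE = 4 * 2                    # four 16-bit words
BYTES_PER_SPRITE = 16 * BYTES_PER_LINE    # 16 lines per sprite

if __name__ == "__main__":
    # Line 4 of the sprite data used in this post.
    line4 = [0, 0, 0, 0, 0, 0, 4, 0xF, 0xF, 4, 0, 0, 0, 0, 0, 0]
    print([hex(word) for word in pack_line(line4)], BYTES_PER_SPRITE)
```

Packing line 4 this way reproduces the words 0x0000, 0x004f, 0xf400, 0x0000 from the sprite_def table.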
; set up sprite 0 (its structure starts at address 56)
mov r0, sprite_def
mov r1, 56
st [r1], r0        ; sprite definition is at sprite_def address
mov r0, 25
st [r1 + 2], r0    ; x = 25 at addr 58
mov r0, 25
st [r1 + 4], r0    ; y = 25 at addr 60
mov r0, 0
st [r1 + 6], r0    ; transparent color is black (0) at addr 62
; sprite definition
sprite_def:
#d16 0x0000, 0x0000, 0x0000, 0x0000 ; 0
#d16 0x0000, 0x000f, 0xf000, 0x0000 ; 1
#d16 0x0000, 0x000f, 0xf000, 0x0000 ; 2
#d16 0x0000, 0x000f, 0xf000, 0x0000 ; 3
#d16 0x0000, 0x004f, 0xf400, 0x0000 ; 4
#d16 0x0000, 0x004f, 0xf400, 0x0000 ; 5
#d16 0x0000, 0x044f, 0xf440, 0x0000 ; 6
#d16 0x0000, 0x444f, 0xf444, 0x0000 ; 7
#d16 0x0004, 0x444f, 0xf444, 0x4000 ; 8
#d16 0x0044, 0x444f, 0xf444, 0x4400 ; 9
#d16 0x0400, 0x004f, 0xf400, 0x0040 ; 10
#d16 0x0000, 0x004f, 0xf400, 0x0000 ; 11
#d16 0x0000, 0x004f, 0xf400, 0x0000 ; 12
#d16 0x0000, 0x041f, 0xf140, 0x0000 ; 13
#d16 0x0000, 0x4111, 0x1114, 0x0000 ; 14
#d16 0x0004, 0x4444, 0x4444, 0x4000 ; 15
How does this work? First of all, I had to decide how to implement sprites.
I decided to fetch all sprite data during the vertical blanking interval (VBI). During the VBI, the video
subsystem starts fetching sprite data by reading the 8-byte sprite structure starting at address 56
(decimal). The address and data buses are 16 bits wide, so the computer is word-oriented: it reads two
bytes at a time, and the address actually put on the bus is 56 >> 1 == 28:
if ((x >= 640) && (y == 479) && (state == IN_LINE))
begin
    // when we start the vertical blanking,
    // we need to fetch in advance the first sprite data
    state <= READ_SPRITES;
    sprite_counter <= 4'b0;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
    addr <= 16'd28; // prepare to read sprite definition address
end
In the next clock cycle, the system is in the READ_SPRITES state. The first thing we do in this state is
latch the sprite definition address, which is now present on the data bus because we initiated a memory
read in the previous state.
Then we prepare the address bus for the next state, in which we will fetch the x coordinate of the sprite.
We do that by setting the address bus to (58 + (sprite_counter << 3)) >> 1, with sprite_counter iterating
from 0 to 15:
READ_SPRITES: begin
    sprite_addr[sprite_counter] <= data;
    state <= READ_SPRITE_X;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
    // prepare to read x coordinate of the sprite
    addr <= (16'd58 + (sprite_counter << 3)) >> 1;
end
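The byte-to-word address conversion used in these states can be checked with a few lines of Python. This is only a software model of the arithmetic; the function names are invented for the example, and the 16-bit mask is an assumption that mirrors the 16-bit address bus.

```python
def word_address(byte_address):
    """Convert a byte address to the word address put on the 16-bit bus."""
    return (byte_address & 0xFFFF) >> 1


def x_field_word_address(sprite_counter):
    """Software model of (16'd58 + (sprite_counter << 3)) >> 1."""
    return word_address(58 + (sprite_counter << 3))


if __name__ == "__main__":
    # Word address of the first sprite structure, then of each x field.
    print(word_address(56))
    print([x_field_word_address(n) for n in range(16)])
```

Since every sprite structure is 8 bytes long, consecutive sprites end up 4 words apart on the bus.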
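In the READ_SPRITE_X state, we latch the x coordinate of the sprite, which is now on the data bus, and
prepare to read the y coordinate. That state follows the same pattern as READ_SPRITES and is not listed
here; the next listing shows the READ_SPRITE_Y state that comes after it: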
READ_SPRITE_Y: begin
    sprite_y[sprite_counter] <= data;
    state <= READ_SPRITE_TRANSPARENT_COLOR;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
    // prepare to read transparent color of the sprite
    addr <= (16'd62 + (sprite_counter << 3)) >> 1;
end
In the READ_SPRITE_Y state, we latch the y coordinate of the sprite, which is now on the data bus, and
then prepare to read the sprite's transparent color in the next state:
READ_SPRITE_TRANSPARENT_COLOR: begin
    sprite_transparent_color[sprite_counter] <= data[3:0];
    state <= READ_SPRITE_DATA;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
    line_counter <= 16'b0;
    word_counter <= 4'b0;
    // read sprite definition bytes
    addr <= sprite_addr[sprite_counter] >> 1;
end
In the READ_SPRITE_TRANSPARENT_COLOR state, we latch the transparent color of the sprite and then put the
address of the sprite definition on the address bus so that we can fetch it in the next state:
READ_SPRITE_DATA: begin
    if (line_counter < 16) begin
        case (word_counter)
            0: sprite_pixels[sprite_counter][line_counter][63:48] <= data;
            1: sprite_pixels[sprite_counter][line_counter][47:32] <= data;
            2: sprite_pixels[sprite_counter][line_counter][31:16] <= data;
            3: sprite_pixels[sprite_counter][line_counter][15:0] <= data;
        endcase
        state <= READ_SPRITE_DATA;
        rd <= 1'b1;
        wr <= 1'b0;
        mem_read <= 1'b1;
        if (word_counter < 3) begin
            word_counter = word_counter + 1'b1;
        end
        else begin
            word_counter = 1'b0;
            line_counter = line_counter + 16'b1;
        end
        // read sprite definition bytes
        addr = (sprite_addr[sprite_counter] + ((word_counter + (line_counter << 2)) << 1)) >> 1;
    end
    else begin
        if (sprite_counter < SPRITE_NUM) begin
            sprite_counter = sprite_counter + 1'b1;
            state <= READ_SPRITES;
            rd <= 1'b1;
            wr <= 1'b0;
            mem_read <= 1'b1;
            // read next sprite definition address
            addr <= (16'd56 + (sprite_counter << 3)) >> 1;
        end
        else begin
            sprite_counter <= 4'b0;
            rd <= 1'b1;
            wr <= 1'b0;
            mem_read <= 1'b1;
            addr <= VIDEO_MEM_ADDR + 0;
            state <= V_BLANK;
        end
    end
end
In the READ_SPRITE_DATA state, we read the sprite definition from memory. We do this for each line of the
sprite (16 lines per sprite) and, within each line, for each of the four words, each holding four pixels
of the line.
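The fetch order in READ_SPRITE_DATA can be modelled in software to see which word addresses are visited and how the four words of a line end up in one 64-bit value. The Python sketch below is such a model, not the hardware itself; the function names are hypothetical, and it assumes the sprite definition starts at an even byte address.

```python
def sprite_word_addresses(sprite_addr):
    """Word addresses of the 64 words of one sprite, in fetch order.

    Mirrors (sprite_addr + ((word + (line << 2)) << 1)) >> 1 from the listing.
    Assumption: sprite_addr is an even byte address.
    """
    addresses = []
    for line in range(16):
        for word in range(4):
            byte_address = sprite_addr + ((word + (line << 2)) << 1)
            addresses.append(byte_address >> 1)
    return addresses


def assemble_line(words):
    """Combine four 16-bit words into a 64-bit line, word 0 in bits 63..48."""
    if len(words) != 4:
        raise ValueError("a sprite line consists of four words")
    value = 0
    for word in words:
        value = (value << 16) | (word & 0xFFFF)
    return value


if __name__ == "__main__":
    print(sprite_word_addresses(0x200)[:8])
    print(hex(assemble_line([0x0000, 0x004F, 0xF400, 0x0000])))
```

The model shows that a sprite occupies 64 consecutive words, so the whole definition is read as one sequential burst.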
When all definition data for the current sprite has been loaded, we do the same for the next sprite,
until the definition data of all sprites has been read. We then set the address bus to load the pixel
data for the (0, 0) position on the screen and move to the V_BLANK state:
V_BLANK: begin
    pixels <= data;
    state <= SCAN_IDLE;
    rd <= 1'bz;
    wr <= 1'bz;
    mem_read <= 1'b0;
end
In the V_BLANK state, we read the frame buffer pixels for the (0, 0) coordinate, then set the rd and wr
control signals to high impedance and move to the SCAN_IDLE state. We leave the SCAN_IDLE state when it is
time to start displaying pixels, beginning at the (0, 0) coordinate.
Displaying sprite data
During scanline processing, we need to display both the original pixels from the frame buffer and the
sprite data, and we need to make sure that the original pixels show through wherever a sprite pixel has
the transparent color.
This is done in the following code:
if (valid) begin
    for (i = 0; i < SPRITE_NUM; i = i+1) begin
        if ((sprite_addr[i] != 16'b0) &&
            (xx >= sprite_x[i]) &&
            (xx < (sprite_x[i] + 16)) &&
            (yy >= sprite_y[i]) &&
            (yy < (sprite_y[i] + 16))) begin
            sprite_found = 1'b1;
            if (sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] != sprite_transparent_color[i][0] ||
                sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] != sprite_transparent_color[i][1] ||
                sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] != sprite_transparent_color[i][2]) begin
                r <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] == 1'b1;
                g <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] == 1'b1;
                b <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] == 1'b1;
            end
            else begin
                r <= pixels[12 - ((xx & 3) << 2) + 0] == 1'b1;
                g <= pixels[12 - ((xx & 3) << 2) + 1] == 1'b1;
                b <= pixels[12 - ((xx & 3) << 2) + 2] == 1'b1;
            end
        end
    end
    if (!sprite_found) begin
        r <= pixels[12 - ((xx & 3) << 2) + 0] == 1'b1;
        g <= pixels[12 - ((xx & 3) << 2) + 1] == 1'b1;
        b <= pixels[12 - ((xx & 3) << 2) + 2] == 1'b1;
    end
    else begin
        sprite_found = 1'b0;
    end
end
else begin
    // blanking -> no pixels
    r <= 1'b0;
    g <= 1'b0;
    b <= 1'b0;
end
end
The most interesting thing is the "for loop". It is not a loop in the software sense: the synthesizer
replicates the enclosed Verilog code SPRITE_NUM times. That is the most important thing to understand
about "loops" in Verilog. There is no linear code that gets executed multiple times. Instead, everything
is a giant state machine driven by the clock, and the "for loop" just unrolls the code into multiple
copies, all of which "work" at the same time.
So, when we have this Verilog code:
for (i = 0; i < SPRITE_NUM; i = i+1) begin
    if ((sprite_addr[i] != 16'b0) &&
        (xx >= sprite_x[i]) &&
        (xx < (sprite_x[i] + 16)) &&
        (yy >= sprite_y[i]) &&
        (yy < (sprite_y[i] + 16))) begin
It actually does this:
if ((sprite_addr[0] != 16'b0) &&
    (xx >= sprite_x[0]) &&
    (xx < (sprite_x[0] + 16)) &&
    (yy >= sprite_y[0]) &&
    (yy < (sprite_y[0] + 16))) begin
    ...
end
if ((sprite_addr[1] != 16'b0) &&
    (xx >= sprite_x[1]) &&
    (xx < (sprite_x[1] + 16)) &&
    (yy >= sprite_y[1]) &&
    (yy < (sprite_y[1] + 16))) begin
    ...
end
...
The code with the "for loop" does the same thing for every sprite:
- if the sprite definition address is not zero and the current x and y coordinates of the scanline are
within the sprite's bounds, we put the current sprite pixel color on the output r, g and b signals, or
the original frame buffer pixel color if the current sprite pixel is transparent (its color equals the
sprite's transparent color).
- else, if the current x and y coordinates of the scanline are outside the sprite's bounds, we put the
frame buffer pixel data on the r, g and b output signals.
- else, we must be in the blanking interval, so we put zeros on the r, g and b output signals.
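The three cases above can be tried out in a small software model. The Python sketch below is an illustration only: the dictionary layout and the function names are invented for the example, the bit positions (+0 for r, +1 for g, +2 for b) are copied from the Verilog listing, and the "last matching sprite wins" behaviour is an assumption that follows the order of the assignments in the unrolled loop.

```python
def sprite_nibble(line_bits, dx):
    """4-bit pixel at column dx (0..15) of a 64-bit sprite line."""
    return (line_bits >> (60 - (dx << 2))) & 0xF


def frame_nibble(pixels_word, xx):
    """4-bit frame buffer pixel for column xx inside a 16-bit word."""
    return (pixels_word >> (12 - ((xx & 3) << 2))) & 0xF


def nibble_to_rgb(nibble):
    """Bit positions as in the listing: +0 is r, +1 is g, +2 is b."""
    return (nibble & 1, (nibble >> 1) & 1, (nibble >> 2) & 1)


def output_rgb(xx, yy, valid, sprites, pixels_word):
    """Model of the r, g, b outputs for one pixel clock."""
    if not valid:
        return (0, 0, 0)                      # blanking interval
    background = nibble_to_rgb(frame_nibble(pixels_word, xx))
    rgb = background                          # no sprite covers this pixel
    for sprite in sprites:                    # later assignments override
        inside = (sprite["addr"] != 0
                  and sprite["x"] <= xx < sprite["x"] + 16
                  and sprite["y"] <= yy < sprite["y"] + 16)
        if not inside:
            continue
        line_bits = sprite["lines"][yy - sprite["y"]]
        nibble = sprite_nibble(line_bits, xx - sprite["x"])
        if (nibble & 0x7) != (sprite["transparent"] & 0x7):
            rgb = nibble_to_rgb(nibble)       # opaque sprite pixel
        else:
            rgb = background                  # frame buffer shows through
    return rgb


# Hypothetical test sprite: every line equals line 4 of the sprite in this post.
rocket = {"addr": 0x200, "x": 25, "y": 25, "transparent": 0,
          "lines": [0x0000004FF4000000] * 16}

if __name__ == "__main__":
    print(output_rgb(32, 25, True, [rocket], 0x0000))
```

In this model, a sprite with a higher index that overlaps an earlier one replaces the earlier sprite's color with the frame buffer color wherever it is transparent. Whether the real hardware behaves the same way depends on how the listing is synthesized.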
Conclusion
This implementation of sprites requires the vga module to have its own internal memory, which is filled
with the sprite data from main memory. Then, during scanline processing, sprite pixels are combined with
frame buffer pixels so that sprite pixels are placed "over" the frame buffer pixels, unless the current
sprite pixel is transparent, in which case the frame buffer pixel "shows" through the sprite.
The great thing about hardware sprites is that they do not consume processor time at all. Everything is
done in hardware, and showing a sprite only requires setting its sprite definition address to a non-zero
value.