Floating-point implementation in GCC for my FPGA computer

This is a followup of my original post.

My FPGA computer supports floating-point instructions. GCC port doesn't. If we try to make the following code:

float d = 0.5;
float e = 0.2;
float f;

f = d - e;

The compiler will generate the following assembly code:

# small.c:10:   float d = 0.5;
mov.w   r0, 1056964608  # tmp43,
st.w    [r13 + (-12)], r0   # d, tmp43
# small.c:11:   float e = 0.2;
mov.w   r0, 1045220557  # tmp44,
st.w    [r13 + (-16)], r0   # e, tmp44
# small.c:14:   f = d - e;
ld.w    r0, [r13 + (-16)]   # tmp45, e
st.w    [sp + (4)], r0  #, tmp45
ld.w    r0, [r13 + (-12)]   # tmp46, d
st.w    [sp], r0    #, tmp46
call    __subsf3        #
mov.w   r1, r0  # tmp47,
mov.w   r0, r1  # tmp48, tmp47
st.w    [r13 + (-20)], r0   # f, tmp48

We can see that when the C code has floating-point subtraction, the generated assembly code places two operands on the stack and then calls the __subsf3 function. What is __subsf3? It is a software implementation of the floating point subtraction operation. You can find the list of all soft-float functions here:

https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

You can see that there are not only the arithmetic functions. There are also conversion functions, comparison functions, and other functions. For example, if we write this in C:

printf("%f\n", f);

we will get this:

ld.w   r0, [r13 + (-20)]   # tmp51, f
st.w    [sp], r0    #, tmp51
call    __extendsfdf2       #
...

The __extendsfdf2 function extends float to double, in order to pass it as an argument to the printf function. It seems that functions with variable number of arguments always receive double arguments, if floating point argument was expected. When I tried to get one of the arguments for printf as float, compiler complained.

Next, if we multiply float with int like this:

int x = 5;
f = d * x;

we will get this:

ld.w   r0, [r13 + (-8)]    # tmp61, x
st.w    [sp], r0    #, tmp61
call    __floatsisf     #
mov.w   r1, r0  # _3,
# small.c:20:   f = d * x; // 0.5 * 5 -> 2.5
mov.w   r0, r1  # tmp62, _3
st.w    [sp + (4)], r0  #, tmp62
ld.w    r0, [r13 + (-12)]   # tmp63, d
st.w    [sp], r0    #, tmp63
call    __mulsf3        #
mov.w   r1, r0  # tmp64,
mov.w   r0, r1  # tmp65, tmp64
st.w    [r13 + (-20)], r0   # f, tmp65

The __floatsisf function converts int to float in order to multiply d (float value) with the x (the int value). The result of the conversion is stored as a temporary variable tmp62 and then floating-point multiplication of d and tmp62 is done (__mulsf3 function) and the result is stored in the f variable.

Writing my own soft-float implementation

Now, the obvious step was to write the implementation of all the soft-float functions that compiler calls. Arithmetic functions were by far the most simple ones, since my FPGA computer has built-in floating point instructions (fadd, fsub, fmul, fdiv). For example, here is the _subsf3 function (floating point subtraction):

float __subsf3(float a, float b)
{
    asm(
        "ld.w r0, [r13 + (8)]\n \
        push r1\n \
        ld.w r1, [r13 + (12)]\n \
        fsub r0, r1\n \
        pop r1\n"
    );
}

As you can see, all I had to do was to get the arguments and then to call the built-in fsub r0, r1 instruction which does floating point subtraction.

However, the conversion and comparison functions were not that simple, since my FPGA computer doesn't have built-in instructions for those operations. For example, conversion from float to int goes like this:

int __fixsfsi(float a)
{
    union { fp_t f; rep_t u; } fb;
    fb.f = a;

    int e = ((fb.u & 0x7F800000) >> 23) - 127;
    if (e < 0)
        return 0;

    rep_t r = (fb.u & 0x007FFFFF) | 0x00800000;
    if (e > 23)
        r <<= (e - 23);
    else
        r >>= (23 - e);
    if (fb.u & 0x80000000)
        return -r;
    return r;
}

And, in the other direction (from int to float):

fp_t __floatsisf(int a)
{
    const int aWidth = sizeof a * 8;

    // Handle zero as a special case to protect clz
    if (a == 0)
        return fromRep(0);

    // All other cases begin by extracting the sign and absolute value of a
    rep_t sign = 0;
    if (a < 0) {
        sign = signBit;
        a = -a;
    }

    // Exponent of (fp_t)a is the width of abs(a).
    const int exponent = (aWidth - 1) - __clzsi2(a);
    rep_t result;

    // Shift a into the significand field, rounding if it is a right-shift
    if (exponent <= significandBits) {
        const int shift = significandBits - exponent;
        result = (rep_t)a << shift ^ implicitBit;
    } else {
        const int shift = exponent - significandBits;
        result = (rep_t)a >> shift ^ implicitBit;
        rep_t round = (rep_t)a << (typeWidth - shift);
        if (round > signBit) result++;
        if (round == signBit) result += result & 1;
    }

    // Insert the exponent
    result += (rep_t)(exponent + exponentBias) << significandBits;
    // Insert the sign bit and return
    return fromRep(result | sign);
}

Conclusion

As you can see, my modified GCC compiler does not generate native assembly code for floating-point operations. Instead, it generates soft-float function calls which I had to write on my own (or to borrow from the Internet).

Comments

Comments powered by Disqus