Saturday, February 8, 2020

Microcode Optimization: IMEM (Part 2)

G_MOVEWORD

This immediate command simply stores a word at a specific DMEM address, where it will be used as a parameter by other commands. For instance, such a command may be used to store the address of a segment, the number of lights, etc.


Here is the original code handling this command:

0x288    LBU    AT, 0xFFFB(K1)
0x28C    LHU    V0, 0xFFF9(K1)
0x290    LH    A1, 0x030E(AT)
0x294    ADD    A1, A1, V0
0x298    J    0x0A8
0x29C    SW    T8, 0x0000(A1)

Here is an example of a command:

0xBC000406 (loaded in register T9)
0x002D4CD0 (loaded in register T8)

It takes only 6 instructions, which is fairly reasonable. Let’s go through the code step by step:

0x288    LBU    AT, 0xFFFB(K1)
0x28C    LHU    V0, 0xFFF9(K1)

The first instruction loads a byte into register AT and the second a half word into register V0, both from the current command. In our example, AT = 0x06 (the moveword index) and V0 = 0x0004 (the offset).

0x290    LH    A1, 0x030E(AT)

The code then loads a half word from DMEM, indexed by AT, into register A1. This half word is itself a DMEM address: the base address of the area where the word is to be stored.

0x294    ADD    A1, A1, V0

The offset V0 is added to the base address A1.

0x29C    SW    T8, 0x0000(A1)

Finally, the word of the command is stored at the proper DMEM address. Note that the SW instruction only considers the lower 12 bits of the address (due to the size of the memory).
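The six instructions above can be modelled in C to make the data flow explicit. This is only a sketch: the dmem array and the helper names are assumptions made for illustration. The table location 0x030E and the opcode 0xBC come from the listing and the example command, and the real table contents are not shown in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 4 KB RSP data memory, stored big-endian. */
static uint8_t dmem[0x1000];

static uint16_t dmem_read16(uint32_t addr) {
    return (uint16_t)((dmem[addr & 0xFFF] << 8) | dmem[(addr + 1) & 0xFFF]);
}

static void dmem_write16(uint32_t addr, uint16_t value) {
    dmem[addr & 0xFFF] = (uint8_t)(value >> 8);
    dmem[(addr + 1) & 0xFFF] = (uint8_t)value;
}

static uint32_t dmem_read32(uint32_t addr) {
    return ((uint32_t)dmem_read16(addr) << 16) | dmem_read16(addr + 2);
}

static void dmem_write32(uint32_t addr, uint32_t value) {
    dmem_write16(addr, (uint16_t)(value >> 16));
    dmem_write16(addr + 2, (uint16_t)value);
}

/* Model of the original G_MOVEWORD handler; w0 and w1 are the command words. */
static void moveword_original(uint32_t w0, uint32_t w1) {
    assert((w0 >> 24) == 0xBC);               /* G_MOVEWORD opcode             */
    uint32_t at = w0 & 0xFF;                  /* LBU AT, 0xFFFB(K1): index     */
    uint32_t v0 = (w0 >> 8) & 0xFFFF;         /* LHU V0, 0xFFF9(K1): offset    */
    uint32_t a1 = dmem_read16(0x030E + at);   /* LH  A1, 0x030E(AT): base      */
    a1 += v0;                                 /* ADD A1, A1, V0                */
    dmem_write32(a1, w1);                     /* SW  T8, 0x0000(A1), 12 bits   */
}
```

With a table entry of 0x0160 at index 0x06 (an assumed value for this illustration), the example command 0xBC000406 / 0x002D4CD0 ends up writing the word at DMEM address 0x164.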

Let’s try to reduce the code needed to perform this simple action. The exact same trick used for G_TRI1 can be applied here: the appropriate DMEM address can be inserted directly in the command.

The moveword indices (AT) and offsets (V0) are defined in gbi.h, but the base DMEM addresses are not. It is therefore necessary to list these DMEM addresses so they can be included as well. They can be determined from the “mapping” of DMEM for Fast3D, found in a microcode source file called GDMEM.H.

Having listed all the base addresses related to G_MOVEWORD, we can simply add them to gbi.h and adapt the macros that use this immediate command (for instance gSPSegment).

/* MOVEWORD indices */

#define G_MW_MATRIX        0x3E0
#define G_MW_NUMLIGHT      0x12C
#define G_MW_CLIP          0x074
#define G_MW_SEGMENT       0x160
#define G_MW_FOG           0x330
#define G_MW_LIGHTCOL      0x1F0
#define G_MW_PERSPNORM     0x110


#define gSPSegment(pkt, segment, base)                                  \
{                                                                       \
    Gfx *_g = (Gfx *)(pkt);                                             \
                                                                        \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) |                         \
                   _SHIFTL((G_MW_SEGMENT + ((segment) * 4)), 0, 16);    \
    _g->words.w1 = (unsigned int)(base);                                \
}

#define gsSPSegment(segment, base)                                      \
{{                                                                      \
    (_SHIFTL(G_MOVEWORD, 24, 8) |                                       \
     _SHIFTL((G_MW_SEGMENT + ((segment) * 4)), 0, 16)),                 \
    (unsigned int)(base)                                                \
}}


In this example we of course have to multiply the segment number by 4 to align the DMEM address to a word :)

From there, the command could be structured as follows:

0xBC000AAA
0xBBBBBBBB

AAA is the DMEM address where the word should be stored and BBBBBBBB is the word to be stored.

The command can be reduced to only two simple instructions! :)

J    0x0A8
SW    T8, 0x0000 (T9)

AAA being 12 bits, the higher bits of T9 are ignored.
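A minimal C sketch of both sides of this new format may help: how the first word is built on the host, and what the single SW does with it on the RSP. The dmem_words array and the function names are hypothetical, and the model assumes word-aligned addresses; the opcode 0xBC and the G_MW_SEGMENT value are the ones given above.

```c
#include <assert.h>
#include <stdint.h>

#define G_MOVEWORD   0xBCu   /* opcode seen in the example command          */
#define G_MW_SEGMENT 0x160u  /* DMEM base of the segment table (list above) */

/* Hypothetical DMEM model, word-aligned accesses only. */
static uint32_t dmem_words[0x1000 / 4];

/* Host side: build the first word of the command (0xBC000AAA). */
static uint32_t moveword_w0(uint32_t dmem_addr) {
    assert(dmem_addr < 0x1000 && (dmem_addr & 3) == 0);
    return (G_MOVEWORD << 24) | (dmem_addr & 0xFFFF);
}

/* RSP side: model of "SW T8, 0x0000(T9)". Only the low 12 bits of T9
   select the DMEM location, so the opcode byte in T9 does no harm. */
static void moveword_new(uint32_t t9, uint32_t t8) {
    dmem_words[(t9 & 0xFFF) >> 2] = t8;
}
```

For segment 1, the address is 0x160 + 1 * 4 = 0x164, so the first word becomes 0xBC000164 and the second word is stored at 0x164.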

And voila! We have reduced the size of the code from 6 instructions to 2.

G_SETGEOMETRYMODE / G_CLEARGEOMETRYMODE

The code of G_SETGEOMETRYMODE is the following:

0x348    LW    V0, 0x0004(SP)
0x34C    OR    V0, V0, T8
0x350    J    0x0A8
0x354    SW    V0, 0x0004(SP)

Here is an example of a command:

0xB7000000 (loaded in T9)
0x00020000 (loaded in T8)

The code of G_CLEARGEOMETRYMODE is the following:

0x358    LW    V0, 0x0004(SP)
0x35C    ADDI    V1, R0, 0xFFFF
0x360    XOR    V1, V1, T8
0x364    AND    V0, V0, V1
0x368    J    0x0A8
0x36C    SW    V0, 0x0004(SP)

Here is an example of a command:

0xB6000000 (loaded in T9)
0x001F3204 (loaded in T8)

Together, the two commands take 10 instructions. Let’s go quickly through their code.

G_SETGEOMETRYMODE

0x348    LW    V0, 0x0004(SP)

The code loads the current geometry mode (fog, lighting, etc.) from a specific place in DMEM. Note that the SP register constantly holds a base DMEM address.

0x34C    OR    V0, V0, T8
0x350    J    0x0A8
0x354    SW    V0, 0x0004(SP)

The current geometry mode is then ORed with the bits set in the command before being stored back.

G_CLEARGEOMETRYMODE

0x35C    ADDI    V1, R0, 0xFFFF
0x360    XOR    V1, V1, T8
0x364    AND    V0, V0, V1
0x368    J    0x0A8
0x36C    SW    V0, 0x0004(SP)

After loading the current geometry mode, the code XORs the bits to be cleared, contained in the second word of the command (loaded in T8), with 0xFFFFFFFF (ADDI sign-extends the immediate 0xFFFF). In other words it simply inverts those bits, which a single NOR could do, and NOR exists in the RSP instruction set! Fun fact: this “mistake” was noticed, and in F3DEX the code uses NOR, as it should.
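The equivalence between the ADDI/XOR pair and a single NOR with R0 can be checked with a few lines of C. This is a sketch of the arithmetic only; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ADDI V1, R0, imm: the 16-bit immediate is sign-extended to 32 bits. */
static uint32_t addi_from_zero(int16_t imm) {
    return (uint32_t)(int32_t)imm;
}

/* Original sequence: ADDI + XOR + AND. */
static uint32_t clear_mode_original(uint32_t mode, uint32_t t8) {
    uint32_t v1 = addi_from_zero(-1);   /* 0xFFFF as a signed half word is -1 */
    v1 ^= t8;                           /* XOR V1, V1, T8  ->  ~T8            */
    return mode & v1;                   /* AND V0, V0, V1                     */
}

/* Same result with one instruction less: NOR V1, T8, R0 then AND. */
static uint32_t clear_mode_nor(uint32_t mode, uint32_t t8) {
    uint32_t v1 = ~(t8 | 0u);           /* NOR V1, T8, R0                     */
    return mode & v1;                   /* AND V0, V0, V1                     */
}

/* Checks that both variants agree for one pair of inputs. */
static void check_same(uint32_t mode, uint32_t t8) {
    assert(clear_mode_original(mode, t8) == clear_mode_nor(mode, t8));
}
```

With the example clear mask 0x001F3204 applied to a mode of 0xFFFFFFFF, both variants leave 0xFFE0CDFB.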

But even with the NOR, it should be possible to do much better. I arrived at the following solution:

0x348    LW    V0, 0x004(SP)
0x34C    J    0x35C
0x350    AND     T8, V0, T8
0x354    LW    V0, 0x004(SP)
0x358    OR    T8, V0, T8
0x35C    J    0x0A8
0x360    SW    T8, 0x000(T9)

It now looks like it takes 7 instructions, which is already 3 fewer. But you may notice something interesting: the last two instructions are those of G_MOVEWORD, meaning that the actual implementation takes only 5 instructions!

The structure of the command would be as follows:

0xCC000DDD
0xAAAAAAAA

CC would be the command header.

DDD would be the DMEM address where the current geometry mode is stored.

AAAAAAAA would be the bits to be set or cleared by the command.   

How would all of this work, then?

G_CLEARGEOMETRYMODE

0x348    LW    V0, 0x004(SP)
0x34C    J    0x35C
0x350    AND     T8, V0, T8

It is unnecessary to NOR the bits in the microcode, as this can be done in gbi.h. By the way, the DMEM address of the geometry mode also has to be included in the first word of the command in gbi.h.

The code then loads the current geometry mode, ANDs it with the already inverted bits in register T8 and jumps to 0x35C, which is the G_MOVEWORD code. As the DMEM address of the geometry mode has been included in the first word of the command, that code stores the result back in the right place.

G_SETGEOMETRYMODE

0x354    LW    V0, 0x004(SP)
0x358    OR    T8, V0, T8
0x35C    J    0x0A8
0x360    SW    T8, 0x000(T9)

The current geometry mode is loaded, ORed with the bits to be set, and execution then naturally falls through into the instructions of G_MOVEWORD.
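The two entry points and their shared tail can be modelled in C as follows. This is a sketch under assumptions: the dmem_words array is a stand-in for DMEM, SP is taken to be 0 so that 0x004(SP) is address 0x004, and GEOMODE_ADDR is a hypothetical name for the address carried in the first word of the command.

```c
#include <assert.h>
#include <stdint.h>

#define GEOMODE_ADDR 0x004u  /* hypothetical DMEM address of the geometry mode */

/* Hypothetical DMEM model, word-aligned accesses only. */
static uint32_t dmem_words[0x1000 / 4];

/* Shared tail at 0x35C (the G_MOVEWORD code): SW T8, 0x000(T9). */
static void moveword_tail(uint32_t t9, uint32_t t8) {
    assert((t9 & 3) == 0);
    dmem_words[(t9 & 0xFFF) >> 2] = t8;
}

/* Entry at 0x348: t8 already holds the inverted clear mask. */
static void clear_geometry_mode(uint32_t t9, uint32_t t8) {
    uint32_t v0 = dmem_words[GEOMODE_ADDR >> 2];  /* LW  V0, 0x004(SP)          */
    t8 = v0 & t8;                                 /* AND T8, V0, T8 (delay slot) */
    moveword_tail(t9, t8);                        /* J   0x35C                   */
}

/* Entry at 0x354: t8 holds the bits to set; falls through into the tail. */
static void set_geometry_mode(uint32_t t9, uint32_t t8) {
    uint32_t v0 = dmem_words[GEOMODE_ADDR >> 2];  /* LW  V0, 0x004(SP) */
    t8 = v0 | t8;                                 /* OR  T8, V0, T8    */
    moveword_tail(t9, t8);
}
```

The design point is that neither entry needs its own store: both leave the result in T8 and rely on T9 already containing the destination address.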

Now of course we can create a macro to emulate gSPGeometryMode from F3DEX2 by sending G_CLEARGEOMETRYMODE and G_SETGEOMETRYMODE consecutively (G_MW_GEOMODE being the DMEM address of the geometry mode):

#define gSPGeometryMode(pkt, c, s)                                      \
{                                                                       \
    Gfx *_g = (Gfx *)(pkt);                                             \
                                                                        \
    _g->words.w0 = _SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) |                \
                   _SHIFTL(G_MW_GEOMODE, 0, 16);                        \
    _g->words.w1 = (~((unsigned int)(c)));                              \
};                                                                      \
{                                                                       \
    Gfx *_g = (Gfx *)(pkt);                                             \
                                                                        \
    _g->words.w0 = _SHIFTL(G_SETGEOMETRYMODE, 24, 8) |                  \
                   _SHIFTL(G_MW_GEOMODE, 0, 16);                        \
    _g->words.w1 = (unsigned int)(s);                                   \
};

#define gsSPGeometryMode(c, s)                                          \
{{                                                                      \
    (_SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (~((unsigned int)(c)))                                              \
}},                                                                     \
{{                                                                      \
    (_SHIFTL(G_SETGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (unsigned int)(s)                                                   \
}}
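To see which words these macros produce, here is a small host-side sketch written as a function rather than a macro. The Cmd type is a simplified stand-in for Gfx, SHIFTL mirrors what _SHIFTL does, and the value given to G_MW_GEOMODE is purely hypothetical; the opcodes 0xB6 and 0xB7 are taken from the example commands above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t w0, w1; } Cmd;   /* simplified stand-in for Gfx */

/* Keep the low w bits of v and shift them left by s. */
#define SHIFTL(v, s, w) ((uint32_t)(((uint32_t)(v) & ((1u << (w)) - 1u)) << (s)))

#define G_CLEARGEOMETRYMODE 0xB6u
#define G_SETGEOMETRYMODE   0xB7u
#define G_MW_GEOMODE        0x004u         /* hypothetical DMEM address */

/* Writes the clear command, then the set command; returns the next free slot. */
static Cmd *geometry_mode(Cmd *pkt, uint32_t c, uint32_t s) {
    assert(pkt != 0);
    pkt->w0 = SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | SHIFTL(G_MW_GEOMODE, 0, 16);
    pkt->w1 = ~c;                          /* inversion done on the host */
    pkt++;
    pkt->w0 = SHIFTL(G_SETGEOMETRYMODE, 24, 8) | SHIFTL(G_MW_GEOMODE, 0, 16);
    pkt->w1 = s;
    return pkt + 1;
}
```

Note that the macro form evaluates pkt twice, so it expects an expression such as glistp++ that advances the display list pointer on each evaluation, as is customary with gbi.h macros.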


Bonus:

Let’s also implement the macro gSPModifyVertex, and do so more efficiently than F3DEX: without a single additional RSP instruction and without a dedicated immediate command, unlike what Nintendo did.

What does this macro do? It simply replaces a word in the data of a selected transformed vertex, which takes 40 bytes, as already explained in my previous article.

We already have the code to replace a word. We just need the DMEM address where the word has to be stored. This can easily be computed in gbi.h, as below:

# define gSPModifyVertex(pkt, vtx, where, val)                          \
{                                                                       \
    Gfx *_g = (Gfx *)(pkt);                                             \
    _g->words.w0 = (_SHIFTL(G_MOVEWORD, 24, 8) |                        \
        _SHIFTL(((INPUTBUFFER) + ((vtx) * 40) + (where)), 0, 16));      \
    _g->words.w1 = (unsigned int)(val);                                 \
}
# define gsSPModifyVertex(vtx, where, val)                              \
{{                                                                      \
    _SHIFTL(G_MOVEWORD, 24, 8) |                                        \
        _SHIFTL(((INPUTBUFFER) + ((vtx) * 40) + (where)), 0, 16),       \
    (unsigned int)(val)                                                 \
}}
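The address arithmetic done by these macros can be sketched as a plain C function. The value given to INPUTBUFFER below is hypothetical (the real one comes from the DMEM map), and the function names are made up for the illustration; the 40-byte vertex size and the opcode 0xBC are those stated in this article.

```c
#include <assert.h>
#include <stdint.h>

#define G_MOVEWORD  0xBCu
#define VTX_SIZE    40u      /* bytes per transformed vertex                */
#define INPUTBUFFER 0x400u   /* hypothetical value, see the DMEM map        */

/* DMEM address of the word at byte offset `where` inside vertex `vtx`. */
static uint32_t modify_vertex_addr(uint32_t vtx, uint32_t where) {
    assert(where < VTX_SIZE);
    return INPUTBUFFER + vtx * VTX_SIZE + where;
}

/* First word of the resulting G_MOVEWORD command. */
static uint32_t modify_vertex_w0(uint32_t vtx, uint32_t where) {
    return (G_MOVEWORD << 24) | (modify_vertex_addr(vtx, where) & 0xFFFF);
}
```

With this assumed base, vertex 2 at offset 0x10 gives 0x400 + 80 + 16 = 0x460, so the command starts with 0xBC000460.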


And voila!

Next time we will continue by optimizing further immediate commands. Stay tuned! :)

3 comments:

  1. Love these articles! Can't wait to see more. What is your end goal with creating an optimized microcode? Would it benefit things like ROM hacks, or new N64 home brew? Would it be possible to make it a drop in replacement in *all* games to help keep their FPS up?

    Replies
    1. That's an interesting question: making real games use this ucode so they draw things faster on original hardware.

  2. It cannot help for ROM hacks. Upon release, N64 homebrews could use this microcode.
