Sunday, 23 February 2020

Microcode Optimization: IMEM (Part3)

G_TEXTURE

This command simply sets the texture-related parameters (texture on/off, tile, level and scale).
The code that handles this immediate command is the following:

0x2A0    SW    T9, 0x0010(SP)
0x2A4    SW    T8, 0x0014(SP)
0x2A8    LH    V0, 0x0006(SP)
0x2AC    ANDI    V0, V0, 0xFFFD
0x2B0    ANDI    V1, T9, 0x0001
0x2B4    SLL    V1, V1, 0x1
0x2B8    OR    V0, V0, V1
0x2BC    J    0x0A8
0x2C0    SH    V0, 0x0006(SP)

Command example:

0xBB000101 (loaded in T9)
0xFFFFFFFF (loaded in T8)

Let’s quickly analyze the code:

0x2A0    SW    T9, 0x0010(SP)
0x2A4    SW    T8, 0x0014(SP)

The two words of the command are stored at a specific location in DMEM.
0x2A8    LH    V0, 0x0006(SP)

We load the lower halfword (16 bits) of the geometry mode into register V0. Here are the related geometry flags as per gbi.h:

#define G_ZBUFFER            0x00000001
#define G_SHADE              0x00000004
#define G_TEXTURE_ENABLE     0x00000002     /* Microcode use only */
#define G_SHADING_SMOOTH     0x00000200
#define G_CULL_FRONT         0x00001000
#define G_CULL_BACK          0x00002000
#define G_CULL_BOTH          0x00003000

0x2AC    ANDI    V0, V0, 0xFFFD

The code clears the G_TEXTURE_ENABLE flag (bit 1 becomes 0) in case it was set in the low byte of register V0.

0x2B0    ANDI    V1, T9, 0x0001

The code extracts the lowest bit of the first command word (the texture on/off flag) into register V1.

0x2B4    SLL    V1, V1, 0x1

V1 is shifted left by one bit, i.e. multiplied by 2, which moves the on/off bit to the G_TEXTURE_ENABLE position (0x02).

0x2B8    OR    V0, V0, V1
       
Register V0, containing the lower halfword of the geometry mode with G_TEXTURE_ENABLE cleared, is ORed with V1, containing the on/off bit of the first command word multiplied by 2.

0x2BC    J    0x0A8
0x2C0    SH    V0, 0x0006(SP)

Before leaving the command, the lower halfword of the geometry mode is stored back in DMEM (the SH sits in the delay slot of the jump).
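
The whole handler boils down to one small bit operation. Here is a minimal C sketch of it, assuming `geom_lo` stands for the halfword at 0x0006(SP) and `w0` for the first command word held in T9; the function names are made up for illustration and do not exist in the microcode.

```c
#include <assert.h>
#include <stdint.h>

#define G_TEXTURE_ENABLE 0x0002u  /* bit 1 of the geometry mode, as per gbi.h */

/* Hypothetical C model of the G_TEXTURE handler shown above. */
static uint16_t g_texture_update(uint16_t geom_lo, uint32_t w0)
{
    uint16_t v0 = (uint16_t)(geom_lo & ~G_TEXTURE_ENABLE); /* ANDI V0, V0, 0xFFFD */
    uint16_t v1 = (uint16_t)(w0 & 0x0001u);                /* ANDI V1, T9, 0x0001 */
    v1 = (uint16_t)(v1 << 1);                              /* SLL  V1, V1, 0x1    */
    return (uint16_t)(v0 | v1);                            /* OR   V0, V0, V1     */
}

static void g_texture_example(void)
{
    /* 0xBB000101: the "on" bit is set, so G_TEXTURE_ENABLE ends up set. */
    assert(g_texture_update(0x0005u, 0xBB000101u) == 0x0007u);
    /* 0xBB000100: the "on" bit is clear, so a previously set flag is removed. */
    assert(g_texture_update(0x0007u, 0xBB000100u) == 0x0005u);
}
```
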
Now some may ask: why is the texture flag stored in the low byte of the geometry mode in the first place?

Technically speaking, when three transformed vertices are turned into an actual RDP triangle command, this byte is used to build the header of that command.

Let’s check out gbi.h:

#define G_TRI_FILL                0xc8 /* fill triangle */
#define G_TRI_SHADE               0xcc /* shade triangle */
#define G_TRI_TXTR                0xca /* texture triangle */
#define G_TRI_SHADE_TXTR          0xce /* shade, texture triangle */
#define G_TRI_FILL_ZBUFF          0xc9 /* fill, zbuff triangle */
#define G_TRI_SHADE_ZBUFF         0xcd /* shade, zbuff triangle */
#define G_TRI_TXTR_ZBUFF          0xcb /* texture, zbuff triangle */
#define G_TRI_SHADE_TXTR_ZBUFF    0xcf /* shade, txtr, zbuff triangle */

For instance, ORing 0x02 (G_TEXTURE_ENABLE) with 0xCC (G_TRI_SHADE) gives 0xCE, which is G_TRI_SHADE_TXTR, the textured version of G_TRI_SHADE.
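
The same arithmetic works for every combination of the three low flags. The sketch below assumes the triangle setup code simply ORs those flags into G_TRI_FILL; that code is not shown in this post, so `tri_command` is only a hypothetical helper illustrating the bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the gbi.h excerpts above. */
#define G_ZBUFFER        0x01u
#define G_TEXTURE_ENABLE 0x02u
#define G_SHADE          0x04u
#define G_TRI_FILL       0xC8u

/* Hypothetical helper: derive the RDP triangle opcode from the low byte
 * of the geometry mode (assumption: a plain OR of the three low flags). */
static uint8_t tri_command(uint8_t geom_low)
{
    return (uint8_t)(G_TRI_FILL |
                     (geom_low & (G_ZBUFFER | G_TEXTURE_ENABLE | G_SHADE)));
}

static void tri_command_example(void)
{
    assert(tri_command(G_SHADE) == 0xCC);                    /* G_TRI_SHADE      */
    assert(tri_command(G_SHADE | G_TEXTURE_ENABLE) == 0xCE); /* G_TRI_SHADE_TXTR */
    assert(tri_command(G_SHADE | G_TEXTURE_ENABLE | G_ZBUFFER) == 0xCF);
}
```
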

Now why not simply use the geometry mode flag directly? My guess is that, for the sake of consistency, the intention was to keep all texture parameters in the G_TEXTURE command. Nevertheless, technically it does not make any sense! So from now on, let’s use the geometry mode flag for enabling/disabling textures. It would mean a tiny change for programmers.

Doing so makes all of the code below useless:

0x2A8    LH    V0, 0x0006(SP)
0x2AC    ANDI    V0, V0, 0xFFFD
0x2B0    ANDI    V1, T9, 0x0001
0x2B4    SLL    V1, V1, 0x1
0x2B8    OR    V0, V0, V1
0x2BC    J    0x0A8
0x2C0    SH    V0, 0x0006(SP)

What remains would be:

0x2A0    SW    T9, 0x0010(SP)
0x2A4    SW    T8, 0x0014(SP)

It simply stores two words in DMEM. We already have a command that does this: G_MOVEWORD.

We simply need the gSPTexture macro to send two G_MOVEWORD commands, followed by a G_CLEARGEOMETRYMODE and a G_SETGEOMETRYMODE to update the texture flag. This does of course mean creating a new moveword index, G_MW_TEXTURE.

#define G_MW_TEXTURE    0x120

#define gSPTexture(pkt, s, t, level, tile, on) \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16); \
    \
    _g->words.w1 = _SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3) | _SHIFTL(0x00,0,8); \
}; \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16); \
    \
    _g->words.w1 = _SHIFTL((s),16,16) | _SHIFTL((t),0,16); \
}; \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16); \
    \
    _g->words.w1 = (unsigned int)(0xFFFFFFFD); \
}; \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_SETGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16); \
    \
    _g->words.w1 = (unsigned int)((on)<<1); \
}

#define gsSPTexture(s, t, level, tile, on) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16)), \
    (_SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3) | _SHIFTL(0x00,0,8)) \
}}, \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16)), \
    (_SHIFTL((s),16,16) | _SHIFTL((t),0,16)) \
}}, \
{{ \
    (_SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (unsigned int)(0xFFFFFFFD) \
}}, \
{{ \
    (_SHIFTL(G_SETGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (unsigned int)((on)<<1) \
}}

What does that mean? The entire G_TEXTURE handler is useless and can be scrapped, meaning that we get rid of 9 RSP instructions.
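
To make the encoding concrete, here is a small sketch that computes the two G_MOVEWORD command words the macros above produce. It assumes G_MOVEWORD is 0xBC (the Fast3D opcode value, not stated in this post) and re-declares `_SHIFTL` the way gbi.h does; the `Words` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same idea as _SHIFTL in gbi.h: keep the low w bits of v, shift left by s. */
#define _SHIFTL(v, s, w) ((uint32_t)(((uint32_t)(v) & ((1u << (w)) - 1u)) << (s)))

#define G_MOVEWORD   0xBC   /* assumption: Fast3D opcode value      */
#define G_MW_TEXTURE 0x120  /* new moveword index proposed above    */

typedef struct { uint32_t w0, w1; } Words;  /* hypothetical stand-in for Gfx */

/* First moveword: texture level and tile. */
static Words texture_tile_words(unsigned level, unsigned tile)
{
    Words g;
    g.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(G_MW_TEXTURE, 0, 16);
    g.w1 = _SHIFTL(level, 11, 3) | _SHIFTL(tile, 8, 3);
    return g;
}

/* Second moveword: s and t scale, stored 4 bytes further. */
static Words texture_scale_words(unsigned s, unsigned t)
{
    Words g;
    g.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(G_MW_TEXTURE + 4, 0, 16);
    g.w1 = _SHIFTL(s, 16, 16) | _SHIFTL(t, 0, 16);
    return g;
}

static void texture_words_example(void)
{
    Words a = texture_tile_words(2, 1);
    Words b = texture_scale_words(0x8000, 0x8000);
    assert(a.w0 == 0xBC000120u && a.w1 == 0x00001100u);
    assert(b.w0 == 0xBC000124u && b.w1 == 0x80008000u);
}
```
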

Finally, we can create separate macros to update the first and the second word of gSPTexture independently.

#define gSPSetTextureTile(pkt, level, tile) \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16); \
    _g->words.w1 = _SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3) | _SHIFTL(0x00,0,8); \
}

#define gSPSetTextureScale(pkt, s, t) \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16); \
    \
    _g->words.w1 = _SHIFTL((s),16,16) | _SHIFTL((t),0,16); \
}

/* The unused "on" parameter has been dropped to match gSPSetTextureTile. */
#define gsSPSetTextureTile(level, tile) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16)), \
    (_SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3) | _SHIFTL(0x00,0,8)) \
}}

#define gsSPSetTextureScale(s, t) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16)), \
    (_SHIFTL((s),16,16) | _SHIFTL((t),0,16)) \
}}

Finally, note that the tile and the level of the texture need only 3 bits each, so together they fit in a single byte. We will have to come back to this point when we start optimizing DMEM.
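
As an illustration of that remark, the sketch below packs both 3-bit fields into one byte. The layout (level in bits 3 to 5, tile in bits 0 to 2) is an assumption chosen to mirror their relative position in the command word; it is not a layout this post commits to, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: level in bits 3..5, tile in bits 0..2. */
static uint8_t pack_level_tile(unsigned level, unsigned tile)
{
    return (uint8_t)(((level & 0x7u) << 3) | (tile & 0x7u));
}

static unsigned unpack_level(uint8_t packed) { return (packed >> 3) & 0x7u; }
static unsigned unpack_tile(uint8_t packed)  { return packed & 0x7u; }

static void pack_example(void)
{
    uint8_t b = pack_level_tile(2, 1);
    assert(b == 0x11);            /* (2 << 3) | 1 */
    assert(unpack_level(b) == 2);
    assert(unpack_tile(b) == 1);
}
```
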

RDRAM  <-> DMEM

The next step of our IMEM optimization concerns the way the code moves data from RDRAM to DMEM or from DMEM to RDRAM. To do so, the code sets a flag in an RSP register to indicate the direction in which the data is moved.

Let’s see how it works:

0x148    MTC0    S4, SP memory address
0x14C    BGTZ    S1, 0x015C
0x150    MTC0    S3, SP DRAM DMA address
0x154    JR      RA
0x158    MTC0    S2, SP read DMA length
0x15C    JR      RA
0x160    MTC0    S2, SP write DMA length

The above code is called as a subroutine from various parts of the code.

Register S4 is written to COP0 as the DMEM address the data is read from or written to.
Register S3 is written to COP0 as the RDRAM address the data is read from or written to.
Register S2 is written to COP0 as the transfer length: to the read DMA length register when S1 is not greater than 0 (RDRAM to DMEM), or to the write DMA length register when S1 is greater than 0 (DMEM to RDRAM).
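
For readers less used to delay slots, here is a rough C model of that subroutine. The `SpDmaRegs` struct is a hypothetical stand-in for the four COP0 registers, and the addresses in the example are arbitrary values picked for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the COP0 DMA registers used above. */
typedef struct {
    uint32_t mem_addr;   /* SP memory address                    */
    uint32_t dram_addr;  /* SP DRAM DMA address                  */
    uint32_t rd_len;     /* SP read DMA length  (RDRAM -> DMEM)  */
    uint32_t wr_len;     /* SP write DMA length (DMEM -> RDRAM)  */
} SpDmaRegs;

/* Model of the original subroutine: S1 selects the direction. */
static void dma_subroutine(SpDmaRegs *cop0, uint32_t s4, uint32_t s3,
                           uint32_t s2, int32_t s1)
{
    cop0->mem_addr  = s4;      /* MTC0 S4, SP memory address            */
    cop0->dram_addr = s3;      /* MTC0 S3, in the delay slot of BGTZ    */
    if (s1 > 0)                /* BGTZ S1, 0x015C                       */
        cop0->wr_len = s2;     /* MTC0 S2, SP write DMA length          */
    else
        cop0->rd_len = s2;     /* MTC0 S2, SP read DMA length           */
}

static void dma_subroutine_example(void)
{
    SpDmaRegs cop0 = {0, 0, 0, 0};

    dma_subroutine(&cop0, 0x07B0u, 0x00200000u, 0x3Fu, 0);  /* read  */
    assert(cop0.rd_len == 0x3Fu && cop0.wr_len == 0u);

    dma_subroutine(&cop0, 0x0400u, 0x00300000u, 0x0Fu, 1);  /* write */
    assert(cop0.wr_len == 0x0Fu && cop0.dram_addr == 0x00300000u);
}
```
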

Some may say it is efficient code. Not really…

The issue is that you have to set register S1 to 0 or 1, depending on the direction of the data. Doing so requires an instruction before calling the subroutine, so it is not really an efficient solution: the caller can just as well write the DMA length register for the right direction itself, immediately after returning from the subroutine.

The code could be changed simply to:

0x148    MTC0    S4, SP memory address
0x14C    JR      RA
0x150    MTC0    S3, SP DRAM DMA address

On top of saving 4 instructions and freeing a register, it prepares the ground for some deeper future changes in the code.
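
In C terms, the shortened subroutine only sets the two addresses, and each caller writes the length register that matches its direction. This is a sketch of that idea rather than the final code: `SpDmaRegs` is a hypothetical stand-in for the COP0 registers and the addresses are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the COP0 DMA registers. */
typedef struct {
    uint32_t mem_addr;   /* SP memory address                    */
    uint32_t dram_addr;  /* SP DRAM DMA address                  */
    uint32_t rd_len;     /* SP read DMA length  (RDRAM -> DMEM)  */
    uint32_t wr_len;     /* SP write DMA length (DMEM -> RDRAM)  */
} SpDmaRegs;

/* Model of the shortened subroutine: no direction flag, no S1. */
static void dma_set_addresses(SpDmaRegs *cop0, uint32_t s4, uint32_t s3)
{
    cop0->mem_addr  = s4;   /* MTC0 S4, SP memory address                 */
    cop0->dram_addr = s3;   /* MTC0 S3, in the delay slot of JR RA        */
}

static void dma_direct_example(void)
{
    SpDmaRegs cop0 = {0, 0, 0, 0};

    /* RDRAM -> DMEM: the caller writes the read length after the return. */
    dma_set_addresses(&cop0, 0x07B0u, 0x00200000u);
    cop0.rd_len = 0x3Fu;
    assert(cop0.mem_addr == 0x07B0u && cop0.rd_len == 0x3Fu && cop0.wr_len == 0u);

    /* DMEM -> RDRAM: the caller writes the write length instead. */
    dma_set_addresses(&cop0, 0x0400u, 0x00300000u);
    cop0.wr_len = 0x0Fu;
    assert(cop0.dram_addr == 0x00300000u && cop0.wr_len == 0x0Fu);
}
```
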

And voila!

We will next time start working on the matrix related immediate commands :)
 
