This command simply sets the texture-related parameters (texture on/off, tile, level and scale).
The code for this immediate command is the following:
0x2A0 SW T9, 0x0010(SP)
0x2A4 SW T8, 0x0014(SP)
0x2A8 LH V0, 0x0006(SP)
0x2AC ANDI V0, V0, 0xFFFD
0x2B0 ANDI V1, T9, 0x0001
0x2B4 SLL V1, V1, 0x1
0x2B8 OR V0, V0, V1
0x2BC J 0x0A8
0x2C0 SH V0, 0x0006(SP)
Command example:
0xBB000101 (loaded in T9)
0xFFFFFFFF (loaded in T8)
Let’s quickly analyze the code:
0x2A0 SW T9, 0x0010(SP)
0x2A4 SW T8, 0x0014(SP)
The two words of the command are stored in a specific place in DMEM.
0x2A8 LH V0, 0x0006(SP)
We load the lower halfword (the two low-order bytes) of the geometry mode into register V0. Here are the related geometry flags as per gbi.h:
#define G_ZBUFFER 0x00000001
#define G_SHADE 0x00000004
# define G_TEXTURE_ENABLE 0x00000002 /* Microcode use only */
# define G_SHADING_SMOOTH 0x00000200
# define G_CULL_FRONT 0x00001000
# define G_CULL_BACK 0x00002000
# define G_CULL_BOTH 0x00003000
0x2AC ANDI V0, V0, 0xFFFD
The code simply clears the G_TEXTURE_ENABLE flag (bit 1, value 0x2) that may be set in the lowest byte of register V0; the mask 0xFFFD leaves every other bit untouched.
0x2B0 ANDI V1, T9, 0x0001
The code extracts the lowest bit of the first command word (the texture on/off flag) into register V1.
0x2B4 SLL V1, V1, 0x1
V1 is shifted left by one bit (multiplied by 2), which moves the on/off flag to the G_TEXTURE_ENABLE position (0x2).
0x2B8 OR V0, V0, V1
Register V0, containing the lower halfword of the geometry mode with G_TEXTURE_ENABLE cleared, is ORed with V1, containing the on/off bit of the first command word multiplied by 2.
0x2BC J 0x0A8
0x2C0 SH V0, 0x0006(SP)
Before exiting the command, the lower halfword of the geometry mode is stored back in DMEM (the SH sits in the delay slot of the jump, so it still executes).
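The steps above can be restated in C as a minimal sketch. The function name texture_update_geom is hypothetical and exists only for illustration; it models the five instructions from 0x2A8 to 0x2C0 on plain integers and ignores DMEM entirely.

```c
#include <stdint.h>

#define G_TEXTURE_ENABLE 0x0002u  /* from gbi.h, microcode use only */

/* Hypothetical helper mirroring the handler at 0x2A8-0x2C0.
 * geom_lo: low halfword of the geometry mode (LH V0, 0x0006(SP))
 * w0:      first command word (register T9) */
static uint16_t texture_update_geom(uint16_t geom_lo, uint32_t w0)
{
    uint16_t v0 = geom_lo & (uint16_t)~G_TEXTURE_ENABLE; /* ANDI V0, V0, 0xFFFD */
    uint16_t v1 = (uint16_t)(w0 & 0x0001u);              /* ANDI V1, T9, 0x0001 */
    v1 = (uint16_t)(v1 << 1);                            /* SLL  V1, V1, 0x1    */
    return (uint16_t)(v0 | v1);                          /* OR   V0, V0, V1     */
}
```

With the example command 0xBB000101 the low bit of the first word is 1, so the returned halfword always has G_TEXTURE_ENABLE set, whatever its previous state.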
Now some may ask: why keep the texture flag in the lowest byte of the geometry mode?
Technically speaking, when 3 transformed vertices are turned into an actual RDP triangle command, this byte is used to construct the header of that command.
Let’s check out gbi.h:
#define G_TRI_FILL 0xc8 /* fill triangle:
#define G_TRI_SHADE 0xcc /* shade triangle:
#define G_TRI_TXTR 0xca /* texture triangle:
#define G_TRI_SHADE_TXTR 0xce /* shade, texture triangle:
#define G_TRI_FILL_ZBUFF 0xc9 /* fill, zbuff triangle:
#define G_TRI_SHADE_ZBUFF 0xcd /* shade, zbuff triangle:
#define G_TRI_TXTR_ZBUFF 0xcb /* texture, zbuff triangle:
#define G_TRI_SHADE_TXTR_ZBUFF 0xcf /* shade, txtr, zbuff trngl:
For instance, OR 0x02 (G_TEXTURE_ENABLE) with 0xCC (G_TRI_SHADE) and you get 0xCE, which is G_TRI_SHADE_TXTR, the textured version of G_TRI_SHADE.
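The same arithmetic works for all eight triangle opcodes, because the three low geometry mode flags line up with the three low bits of the opcode. Here is a small sketch of that relationship; tri_opcode is a hypothetical helper, and the assumption that the microcode builds the header exactly this way is mine, since the corresponding instructions are not shown here.

```c
#include <stdint.h>

/* Flag and opcode values as listed in gbi.h above. */
#define G_ZBUFFER        0x00000001u
#define G_TEXTURE_ENABLE 0x00000002u
#define G_SHADE          0x00000004u
#define G_TRI_FILL       0xC8u

/* Hypothetical helper: derive the RDP triangle opcode from the geometry mode
 * by ORing its three low flags into the base opcode G_TRI_FILL. */
static uint8_t tri_opcode(uint32_t geom_mode)
{
    uint32_t low3 = geom_mode & (G_ZBUFFER | G_TEXTURE_ENABLE | G_SHADE);
    return (uint8_t)(G_TRI_FILL | low3);
}
```

Flags outside the three low bits (G_CULL_BACK for instance) are masked out and do not affect the opcode.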
Now why not simply use the geometry mode flag directly? My guess is that, for consistency, the intention was to keep all texture parameters in the G_TEXTURE command. Technically, however, it does not make any sense! So from now on let’s use the geometry mode flag for enabling/disabling textures. It would mean only a tiny change for programmers.
Doing so makes all of the code below useless:
0x2A8 LH V0, 0x0006(SP)
0x2AC ANDI V0, V0, 0xFFFD
0x2B0 ANDI V1, T9, 0x0001
0x2B4 SLL V1, V1, 0x1
0x2B8 OR V0, V0, V1
0x2BC J 0x0A8
0x2C0 SH V0, 0x0006(SP)
What remains would be:
0x2A0 SW T9, 0x0010(SP)
0x2A4 SW T8, 0x0014(SP)
This simply stores two words in DMEM. We already have a command to do so: G_MOVEWORD.
We simply need the gSPTexture macro to emit two G_MOVEWORD commands (followed by a clear and a set of the geometry mode flag). This of course means creating a new moveword index, G_MW_TEXTURE.
#define G_MW_TEXTURE 0x120
#define gSPTexture(pkt, s, t, level, tile, on) \
{ \
    /* 1st texture word: level and tile */ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16); \
    _g->words.w1 = _SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3)| _SHIFTL(0x00,0,8); \
}; \
{ \
    /* 2nd texture word: s and t scale */ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16); \
    _g->words.w1 = _SHIFTL((s),16,16) | _SHIFTL((t),0,16); \
}; \
{ \
    /* clear G_TEXTURE_ENABLE in the geometry mode */ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16); \
    _g->words.w1 = (unsigned int)(0xFFFFFFFD); \
}; \
{ \
    /* set G_TEXTURE_ENABLE again if "on" is 1 */ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_SETGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16); \
    _g->words.w1 = (unsigned int)((on)<<1); \
};
#define gsSPTexture(s, t, level, tile, on) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16)), \
    (_SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3)| _SHIFTL(0x00,0,8)) \
}}, \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16)), \
    (_SHIFTL((s),16,16) | _SHIFTL((t),0,16)) \
}}, \
{{ \
    (_SHIFTL(G_CLEARGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (unsigned int)(0xFFFFFFFD) \
}}, \
{{ \
    (_SHIFTL(G_SETGEOMETRYMODE, 24, 8) | _SHIFTL(G_MW_GEOMODE, 0, 16)), \
    (unsigned int)((on)<<1) \
}}
What does that mean? The whole G_TEXTURE command is useless and can be scrapped, meaning that we get rid of 9 RSP instructions.
Finally, we can create separate macros to update the 1st and the 2nd word of gSPTexture independently.
#define gSPSetTextureTile(pkt, level, tile) \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16); \
    _g->words.w1 = _SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3)| _SHIFTL(0x00,0,8); \
}
#define gSPSetTextureScale(pkt, s, t) \
{ \
    Gfx *_g = (Gfx *)(pkt); \
    _g->words.w0 = _SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16); \
    _g->words.w1 = _SHIFTL((s),16,16) | _SHIFTL((t),0,16); \
}
/* the "on" parameter was unused here, so it is dropped to match gSPSetTextureTile */
#define gsSPSetTextureTile(level, tile) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL((G_MW_TEXTURE), 0, 16)), \
    (_SHIFTL(0x0000,16,16) | _SHIFTL((level),11,3) | _SHIFTL((tile),8,3)| _SHIFTL(0x00,0,8)) \
}}
#define gsSPSetTextureScale(s, t) \
{{ \
    (_SHIFTL(G_MOVEWORD, 24, 8) | _SHIFTL(((G_MW_TEXTURE) + 4), 0, 16)), \
    (_SHIFTL((s),16,16) | _SHIFTL((t),0,16)) \
}}
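To see what the second word of each G_MOVEWORD actually contains, here is a small stand-alone sketch of the packing. _SHIFTL is reproduced from gbi.h so that the snippet compiles on its own; the two function names are hypothetical and only mirror the w1 expressions used in the macros above.

```c
#include <stdint.h>

/* _SHIFTL as defined in gbi.h: keep the low w bits of v, then shift left by s. */
#define _SHIFTL(v, s, w) \
    ((unsigned int)(((unsigned int)(v) & ((0x01 << (w)) - 1)) << (s)))

/* Hypothetical helper: w1 of the first moveword (level in bits 11-13, tile in bits 8-10). */
static uint32_t texture_tile_word(unsigned level, unsigned tile)
{
    return _SHIFTL(0x0000, 16, 16) | _SHIFTL(level, 11, 3) |
           _SHIFTL(tile, 8, 3) | _SHIFTL(0x00, 0, 8);
}

/* Hypothetical helper: w1 of the second moveword (s in the high halfword, t in the low one). */
static uint32_t texture_scale_word(unsigned s, unsigned t)
{
    return _SHIFTL(s, 16, 16) | _SHIFTL(t, 0, 16);
}
```

Since level and tile are masked to 3 bits each, out-of-range values wrap silently rather than spilling into neighbouring fields.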
Finally, notice that storing the tile and the level of the texture requires only a single byte. When we start optimizing DMEM, this point will have to be checked out.
RDRAM <-> DMEM
The next step of our IMEM optimization concerns the way the code moves data from RDRAM to DMEM or from DMEM to RDRAM. To do so, the code sets a flag in an RSP register to indicate in which direction the data is moved.
Let’s see how it works:
0x148 MTC0 S4, SP memory address
0x14C BGTZ S1, 0x015C
0x150 MTC0 S3, SP DRAM DMA address
0x154 JR RA
0x158 MTC0 S2, SP read DMA length
0x15C JR RA
0x160 MTC0 S2, SP write DMA length
The above code is called as a subroutine from various parts of the code.
Register S4 is written to the COP0 SP memory address register: the DMEM address the data is read from or written to.
Register S3 is written to the COP0 SP DRAM DMA address register: the RDRAM address the data is read from or written to.
Register S2 is written to either the COP0 SP read DMA length or the SP write DMA length register, depending on whether or not S1 is greater than 0; it gives the length of the data to transfer.
Some may say it is efficient code. Not really…
The issue is that you have to set register S1 to 0 or 1, depending on the direction of the data. Doing so requires an instruction before calling the subroutine, so it is not really an efficient solution: the caller can just as well write the length to the read or write DMA length register itself, immediately after returning from the subroutine.
The code could simply be changed to:
0x148 MTC0 S4, SP memory address
0x14C JR RA
0x150 MTC0 S3, SP DRAM DMA address
On top of saving 4 instructions and freeing a register, it prepares the ground for some deeper future changes in the code.
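To make the equivalence concrete, here is a rough C model of both variants. The struct sp_dma_regs and the two function names are hypothetical; they only stand in for the four COP0 registers reached through MTC0, and the model ignores the fact that writing a length register is what actually starts the DMA on real hardware.

```c
#include <stdint.h>

/* Hypothetical model of the SP DMA registers written through MTC0. */
struct sp_dma_regs {
    uint32_t mem_addr;   /* SP memory address (DMEM side)      */
    uint32_t dram_addr;  /* SP DRAM DMA address (RDRAM side)   */
    uint32_t rd_len;     /* SP read DMA length:  RDRAM -> DMEM */
    uint32_t wr_len;     /* SP write DMA length: DMEM -> RDRAM */
};

/* Original routine (0x148-0x160): S1 > 0 selects the write length register. */
static void dma_setup_old(struct sp_dma_regs *r, uint32_t s4, uint32_t s3,
                          uint32_t s2, int32_t s1)
{
    r->mem_addr = s4;       /* MTC0 S4, SP memory address   */
    r->dram_addr = s3;      /* MTC0 S3, SP DRAM DMA address */
    if (s1 > 0)
        r->wr_len = s2;     /* MTC0 S2, SP write DMA length */
    else
        r->rd_len = s2;     /* MTC0 S2, SP read DMA length  */
}

/* Shortened routine (0x148-0x150): only the two addresses are set. */
static void dma_setup_new(struct sp_dma_regs *r, uint32_t s4, uint32_t s3)
{
    r->mem_addr = s4;
    r->dram_addr = s3;
}
```

With the shortened routine, the caller writes S2 to the read or the write length register itself right after the call, which costs the same single instruction that previously went into loading S1.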
And voila!
We will next time start working on the matrix related immediate commands :)