Wednesday, 30 December 2020

Microcode Optimization: IMEM (Part 7) - Updated

It has been quite a while since I last shared an update on the project. I must admit that for the last six months I could not find enough time and motivation to focus on it. I do not put any pressure or deadlines on myself when it comes to a hobby. Still, I am quite persistent, so rest assured that I will finish what I have started 😀

Let me summarize what I have achieved in recent weeks:

G_DL & G_ENDDL

I have significantly changed the code that manages display lists. Even though the G_DL command in the original Fast3D microcode is quite efficient in this respect, there were a few points which, at least from my point of view, could be improved.

First of all, three general-purpose registers of the RSP were exclusively dedicated to display lists in the original Fast3D microcode:

K0 ($26) for the RDRAM address of the next command to be executed in the currently running display list

K1 ($27) for the DMEM address of the next command to be executed in the currently running display list

GP ($28) used as a counter to ensure that the space reserved in DMEM for the running display list was not exceeded.

I came to the conclusion that K0 and GP were not really necessary. K1 alone is enough to determine whether the limit of the space allocated in DMEM for the display list has been reached. Likewise, from K1 it is possible to compute at any time the RDRAM address of the next command in the currently running display list, simply by using the base address of the segment used by that display list.

Though it took me quite a while to find the right way to do it, my new implementation does not use K0 and GP at all. As a side effect, however, the command that ends a display list (G_ENDDL) has been merged with G_DL.
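To make the idea concrete, here is a minimal sketch in C rather than RSP assembly. It only models how a single DMEM pointer (the role played by K1) can answer both questions: whether the buffer is exhausted, and where the next command lives in RDRAM. The buffer offset, the buffer size and the `chunk_rdram` field are assumptions made for the illustration and are not taken from the actual microcode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: these values are illustrative only. */
#define DL_BUF_START 0x0400u /* DMEM offset of the display-list buffer */
#define DL_BUF_SIZE  0x0140u /* bytes of DMEM reserved for buffered commands */
#define CMD_SIZE     8u      /* one Fast3D command = 2 words = 8 bytes */

typedef struct {
    uint32_t chunk_rdram; /* assumed: RDRAM address the buffered chunk came from */
    uint16_t k1;          /* DMEM pointer to the next command (role of K1) */
} DlState;

/* Replaces the GP counter: the pointer itself tells us when the buffer ends. */
static int dl_buffer_exhausted(const DlState *s) {
    return s->k1 >= DL_BUF_START + DL_BUF_SIZE;
}

/* Replaces K0: the RDRAM address is the chunk base plus the offset
 * of the DMEM pointer inside the buffer. */
static uint32_t dl_next_cmd_rdram(const DlState *s) {
    assert(s->k1 >= DL_BUF_START);
    return s->chunk_rdram + (uint32_t)(s->k1 - DL_BUF_START);
}
```

With a chunk loaded from 0x00201000 and the pointer three commands into the buffer, `dl_next_cmd_rdram` yields 0x00201018. The trade-off is a subtraction and an addition whenever the RDRAM address is needed, in exchange for two free registers.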

G_SETGEOMETRYMODE, G_CLEARGEOMETRYMODE

Having freed two registers, I decided to use one of them to hold the current geometry mode, as is done in the Factor 5 microcode used by Indiana Jones and the Infernal Machine. This means there is no need to load the geometry mode from DMEM or store it back. Doing so saves 3 IMEM instructions and one word in DMEM but, more importantly, gives every part of the microcode immediate access to the current geometry mode.
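The difference can be sketched in C as follows. This is a model, not the microcode itself: the `RspRegs` struct stands in for the dedicated register, the function names are hypothetical, and the flag values should be treated as assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values are assumptions for this sketch. */
#define G_ZBUFFER  0x00000001u
#define G_SHADE    0x00000004u
#define G_LIGHTING 0x00020000u

/* Mode kept in DMEM: every update is load, modify, store. */
static void set_mode_in_dmem(uint8_t *dmem, uint32_t off, uint32_t bits) {
    uint32_t mode;
    assert(off % 4u == 0u);                 /* word-aligned slot */
    memcpy(&mode, dmem + off, sizeof mode); /* load */
    mode |= bits;                           /* modify */
    memcpy(dmem + off, &mode, sizeof mode); /* store */
}

/* Mode kept in a dedicated register: a single OR or AND-NOT. */
typedef struct { uint32_t geometry_mode; } RspRegs;

static void set_geometry_mode(RspRegs *r, uint32_t bits) {
    r->geometry_mode |= bits;
}

static void clear_geometry_mode(RspRegs *r, uint32_t bits) {
    r->geometry_mode &= ~bits;
}

/* Any other routine can test a flag without touching DMEM. */
static int lighting_enabled(const RspRegs *r) {
    return (r->geometry_mode & G_LIGHTING) != 0;
}
```

In the register version the load and store disappear from both commands, and a routine such as `lighting_enabled` reduces to a single mask test.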

Commands of 4 words

In Fast3D, every command is composed of 2 words. I understand SGI/Nintendo's reasoning here: RDP commands are usually 2 words long and the DMA engine of the N64 is 64 bits wide. With a fixed size, managing memory is much easier than with a size that may change from command to command. As some commands may need 4 words, I circumvented this limitation by simply adding, at the end of the display list, a scratch space for the potential two additional words of a command. For the time being I use this feature for textured rectangles.

In the process I removed G_RDPHALF_1, G_RDPHALF_2 and G_RDPHALF_CONT, as they had become completely unnecessary.
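One way to picture the scratch space is the C sketch below. It is an interpretation, not the actual implementation: it assumes the scratch area sits right after the DMEM command buffer and is filled by the same transfer, so that a 4-word command starting in the last 2-word slot still finds its second half. The buffer size, `dl_refill` and `dl_next` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_WORDS     8u /* hypothetical: room for four 2-word commands */
#define SCRATCH_WORDS 2u /* tail for the second half of a 4-word command */

typedef struct {
    uint32_t words[BUF_WORDS + SCRATCH_WORDS];
    uint32_t pos; /* index of the first word of the next command */
} DlBuffer;

/* Stand-in for the DMA transfer: copies the buffer plus the scratch tail. */
static void dl_refill(DlBuffer *b, const uint32_t *rdram, uint32_t avail_words) {
    uint32_t n = BUF_WORDS + SCRATCH_WORDS;
    if (n > avail_words) n = avail_words;
    memset(b->words, 0, sizeof b->words);
    memcpy(b->words, rdram, n * sizeof(uint32_t));
    b->pos = 0;
}

/* Returns the current command and advances by its length (2 or 4 words).
 * Commands may only start inside the main area; the scratch words are
 * reachable only as the tail of a 4-word command. */
static const uint32_t *dl_next(DlBuffer *b, uint32_t cmd_words) {
    assert(cmd_words == 2u || cmd_words == 4u);
    assert(b->pos < BUF_WORDS);
    const uint32_t *cmd = &b->words[b->pos];
    b->pos += cmd_words;
    return cmd;
}
```

After three 2-word commands, a 4-word command starting at word 6 reads words 6 to 9, the last two coming from the scratch area. The buffer logic stays that of a fixed-size command stream.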

"New" commands

 
G_BRANCH_Z has been implemented in a completely different way from F3DEX2. In fact I renamed the command to G_COMPARE, as it is no longer only Z that you can compare: any information stored in DMEM can be compared with the data provided by the programmer. The comparison can be done on a byte, a half word or a word.

Additionally, unlike G_BRANCH_Z, which can only branch to another display list, G_COMPARE lets the programmer decide what the outcome of the command should be. This is done simply by skipping or not skipping the following commands in the display list, depending on the result of the G_COMPARE command.

For instance, you may deactivate point lighting when the vertices being processed are too far away from the camera.

I also created a new command that can be very useful together with G_COMPARE: G_SKIP.

G_SKIP is very simple: it skips a number of commands in the display list. For instance, you may want not only to deactivate point lighting but to skip the activation of lighting altogether, along with texturing, the RDP commands loading textures into TMEM, etc.

I cannot imagine all the possibilities these two commands open up, but I feel you can do a lot with them :)

G_CULLDL, also inspired by F3DEX2, works exactly the same way as G_COMPARE. It does not automatically close the display list; instead it skips or does not skip the following command. That command can indeed close the display list, but it can also branch to another one, etc.
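The interplay of the two commands can be modelled with a toy interpreter in C. Everything about the encoding here is an assumption made for the illustration: the `Cmd` fields, the choice of "greater or equal" as the test, and the rule that a failed comparison skips exactly one command. `OP_MARK` is a stand-in for real work such as enabling lighting or loading a texture.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_NOP, OP_COMPARE, OP_SKIP, OP_MARK } Op;

/* Hypothetical encoding:
 * OP_COMPARE: a = DMEM offset, b = size in bytes (1, 2 or 4), c = reference
 * OP_SKIP:    a = number of commands to skip
 * OP_MARK:    a = bit to set in the result mask */
typedef struct { Op op; uint32_t a, b, c; } Cmd;

/* Big-endian read of a byte, half word or word from DMEM. */
static uint32_t dmem_read(const uint8_t *dmem, uint32_t off, uint32_t size) {
    assert(size == 1u || size == 2u || size == 4u);
    uint32_t v = 0;
    for (uint32_t i = 0; i < size; i++) v = (v << 8) | dmem[off + i];
    return v;
}

/* Runs the list and returns a mask of the OP_MARK commands executed. */
static uint32_t run(const Cmd *dl, uint32_t n, const uint8_t *dmem) {
    uint32_t executed = 0, pc = 0;
    while (pc < n) {
        const Cmd *c = &dl[pc++];
        switch (c->op) {
        case OP_COMPARE: /* test fails: skip the following command */
            if (!(dmem_read(dmem, c->a, c->b) >= c->c)) pc += 1;
            break;
        case OP_SKIP:
            pc += c->a;
            break;
        case OP_MARK:
            executed |= 1u << c->a;
            break;
        default:
            break;
        }
    }
    return executed;
}
```

Take the list COMPARE (half word at DMEM offset 0 >= 1000), SKIP 2, MARK 0, MARK 1, MARK 2. With a stored depth of 2000 the comparison passes, G_SKIP runs, and only MARK 2 executes. With a depth of 500, G_SKIP itself is skipped and all three marks execute.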

All the above changes took a very limited number of instructions compared to my previous update. At this stage my customized Fast3D microcode has more features than the F3DEX microcode, and I still have numerous ideas that I would like to implement.

Next time I hope to be able to update the lighting part of the microcode, even though it is quite a challenge.

Finally I would like to wish you a happy new year 2021!!! 😋

2 comments:

  1. Always happy to read these updates. Happy New Years!

  2. Would you replace ucodes in commercial games with your custom ucode?
    I'm thinking about decompiled games like Super Mario 64 or Perfect Dark (when
    it is finished https://ryandwyer.gitlab.io/pdstatus/)

    In your custom code, can you change the fps sync from 30 to 20 to avoid slowdowns, when the game can only reach 28 fps for example?
