Sunday, 26 January 2020

The masterpiece graphic microcode behind the Nintendo 64 version of Indiana Jones and the Infernal Machine and Star Wars Episode I: Battle for Naboo

 

Introduction 

 

Factor 5 used to be one of the best game development studios of the late 90s and early 2000s. Its programmers, who came from the German hacking and demo scene, were very talented and, thanks to a collaboration with both Nintendo and LucasArts, were able to deliver stunning graphical achievements on the N64 and GameCube. (https://en.wikipedia.org/wiki/Factor_5)

After the success of Star Wars: Rogue Squadron in 1998, Factor 5 started to work on two new games, Indiana Jones and the Infernal Machine and Star Wars Episode I: Battle for Naboo, with the intention of pushing the Nintendo 64 to its limits. In order to do so, Factor 5 decided to develop a “new” graphics microcode (and to optimize its audio microcode, called Musyx). (https://www.ign.com/articles/2000/11/10/bringing-indy-to-n64) (https://www.ign.com/articles/2000/11/11/interview-battling-the-n64)

Both games were released at the very end of the commercial life of the N64, too late to win the favor of critics and of a public already focused on the next generation of consoles. As a result, the technical achievements of those games unfortunately went unnoticed.

GlideN64, like its predecessor Glide64, has mainly been developed as a High Level Emulation (HLE) plugin. A microcode is a library of high-level macro commands (high-level in comparison with processor opcodes). The Low Level Emulation (LLE) approach is to run these commands as the actual hardware would, instruction by instruction. HLE instead implements what the commands in that library do, as fast as possible, without running the actual code of each command.

As Sergey explained in his blog a few years ago (http://gliden64.blogspot.com/2014/11/a-word-for-hle.html), a few games could not be emulated graphically in HLE because the documentation necessary to do so was unavailable. So a few years ago Sergey and I started reverse engineering and implementing custom graphics microcodes.

When we started our journey to decipher the one used by Indiana Jones and the Infernal Machine and Star Wars Episode I: Battle for Naboo, we could only be impressed by how well written and optimized it was. Although we are unable to understand every detail of its intricacies, we do believe that its prowess deserves to be documented for posterity.

Please note that in order to understand the documentation below, you must be familiar with graphics programming (OpenGL), the N64 hardware and the Fast3D microcode. (https://en.wikipedia.org/wiki/Nintendo_64_technical_specifications) (https://en.wikipedia.org/wiki/Reality_Coprocessor)

Contrary to what Factor 5 may have suggested, the graphics microcode of both Indiana Jones and Battle for Naboo was NOT developed from scratch. It was obviously developed from the source code of the F3DEX microcode supplied by Nintendo in its SDK, which is just an optimized version of the Fast3D microcode used by Super Mario 64. Of course, most of the graphics commands were rewritten and many major additions were made (the microcode is about twice as large as the other ones), but its core is quite similar to that of the very first game of the console.

What mainly remains is the way input vertices are transformed into a specific output data structure stored in a buffer within the Data Memory (DMEM) of the RSP (Reality Signal Processor). Those structures may then be selected by a triangle command, formatted into a low-level triangle command and used by the Reality Display Processor (RDP). The code used for this is comparable to the code used in the F3DEX microcode, whereas it has no correspondence with the code that other types of microcode (Turbo3D, ZSort) use for the same operation.

Further evidence of this is that the command headers (e.g. 0xBC for the MoveWord command, 0xBF for the TRI1 triangle command, etc.) are the same as in the F3DEX microcode.

An interesting consequence is that none of the N64 microcodes were ever actually written from scratch by third-party developers, but only by SGI or Nintendo. However, it is true that only around 20% of the code is in some way related to F3DEX, meaning that the latter was used as a working environment in which to develop new commands filled with plenty of unique features.

I/ A few outstanding features compared to the F3DEX microcode

A. DMEM address is directly integrated into the graphic command

 

DMEM is a small 4 KB memory which the RSP microcode uses to store the data retrieved from RDRAM. The RSP uses this data to perform the computations specified by a command.

In the F3DEX microcode, the DMEM address is not part of the command itself; the command carries an index instead. The actual DMEM address selected by this index is stored in DMEM itself when the microcode boots, hidden from end users. In this so-called Indy/Naboo microcode, the DMEM address is set directly in the command. This unique feature helps to save some space in DMEM for other purposes.

Let’s take a simple example:
The 0xBC command is a MoveWord command, which simply stores a 32-bit word in DMEM. As you may be aware, the stored words are used later by other commands as parameters.

0xBC00014C
0x00000715

0xBC is the header of the command, 0x14C is the DMEM address and 0x00000715 is the word stored at this address. For comparison, in F3DEX:

0xBC000406
0x002D4CD0

0x06 is a mere flag which the microcode uses to look up a base DMEM address stored in DMEM. The flag varies depending on the type of word to be stored (here, a segment RDRAM address). The offset 0x0004 is added to this base DMEM address. In our example the base DMEM address is 0x160, so the target is 0x160 + 0x004 = 0x164, and the word 0x002D4CD0 is stored at DMEM address 0x164.
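The difference between the two encodings can be sketched in a few lines of Python. This is only an illustration built from the two examples above: the function names are ours, the 12-bit mask for the Indy/Naboo address is an assumption based on DMEM being 4 KB, and the base-address table contains nothing but the single 0x06 entry quoted in the text.

```python
# Hypothetical decoders for the two MoveWord (0xBC) encodings discussed above.
# Field positions are inferred from the examples in the text, not from source code.

# Base DMEM address looked up by index in F3DEX; only the 0x06 entry is known here.
F3DEX_BASE_BY_INDEX = {0x06: 0x160}


def decode_moveword_indy(w0, w1):
    """Indy/Naboo style: the DMEM address sits directly in the command."""
    if w0 >> 24 != 0xBC:
        raise ValueError("not a MoveWord command")
    dmem_addr = w0 & 0xFFF  # assumption: 12 bits, enough for a 4 KB DMEM
    return dmem_addr, w1


def decode_moveword_f3dex(w0, w1, base_by_index=F3DEX_BASE_BY_INDEX):
    """F3DEX style: the command carries an index plus an offset."""
    if w0 >> 24 != 0xBC:
        raise ValueError("not a MoveWord command")
    offset = (w0 >> 8) & 0xFFFF  # 0x0004 in the example
    index = w0 & 0xFF            # 0x06 in the example
    return base_by_index[index] + offset, w1


indy = decode_moveword_indy(0xBC00014C, 0x00000715)
f3dex = decode_moveword_f3dex(0xBC000406, 0x002D4CD0)
print(hex(indy[0]), hex(indy[1]))    # DMEM address and stored word (Indy/Naboo)
print(hex(f3dex[0]), hex(f3dex[1]))  # DMEM address and stored word (F3DEX)
```

The F3DEX decoder needs a lookup table that lives in DMEM on the real hardware, which is exactly the space the Indy/Naboo encoding avoids spending.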

B. Geometry mode is a mere RSP register

 

In the F3DEX microcode, the geometry mode is stored in DMEM, and each time it has to be checked or changed, it has to be loaded into an RSP register, modified and then stored back. In the Indy/Naboo microcode, an entire RSP register is simply dedicated to the geometry mode. This leaves one register less for other uses, but it makes the geometry mode easy to manage and also saves a bit of space in DMEM.

C. Playing with display lists

 

In the Indy/Naboo microcode, display lists are managed quite differently than in F3DEX.
  1. Execution of multiple display lists at the same hierarchical level
In the original F3DEX, display lists are relatively straightforward: as in OpenGL, they are managed hierarchically. You first call a display list, which can call a nested display list, which can in turn call another nested display list.
As with Star Wars: Rogue Squadron, a clever mechanism was developed to “jump” from one display list to another. Instead of starting with the first immediate command to be executed, each display list starts with an RDRAM address, which may be used at the end of the display list by the immediate command 0xB5. In that case, a display list is retrieved from the specified RDRAM address. This loaded display list also contains an RDRAM address at its start. So instead of nesting display lists, you move from one display list to another at the same level of the hierarchy.
  2. Conditional execution of a display list
This mechanism is incredible. A MoveWord command (0xBC00058C) may provide an RDRAM address. When an immediate command that may lead to drawing something (textured rectangles, triangles) is run, a nested display list is retrieved from that RDRAM address and executed BEFORE the primitive is actually drawn. This is the case for the immediate RDP commands 0xE4/0xE5, for non-rejected triangles and for particles. The mechanism links the actual drawing of an RDP primitive to a display list which sets the parameters of that drawing (e.g. Combine Mode, Other Modes, Texture Image, Tile, etc.). And it occurs ONLY when the primitive is indeed going to be drawn. So, for instance, if a triangle is outside the viewport and thus rejected, this display list for pixel pipeline setup will not be executed. In this way the N64 RDP is only used where necessary!
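The conditional execution described in the second point can be pictured with a small Python sketch. It is a toy model, not the microcode's logic: the function name, the event tuples and the RDRAM address 0x00123400 are all hypothetical, and visibility is passed in as a flag instead of being computed by clipping.

```python
# Toy model of "run the setup display list only if the primitive survives".
# All names and the address below are hypothetical illustrations.

def process_triangles(triangles, setup_dl_addr):
    """Return the ordered list of events for a batch of triangles.

    triangles     -- iterable of (name, visible) pairs
    setup_dl_addr -- RDRAM address previously provided by a MoveWord command
    """
    events = []
    for name, visible in triangles:
        if not visible:
            # Rejected primitive: the pixel pipeline setup is never touched.
            events.append(("rejected", name))
            continue
        # Surviving primitive: run the setup display list first, then draw.
        events.append(("run_setup_dl", setup_dl_addr))
        events.append(("draw", name))
    return events


events = process_triangles([("A", False), ("B", True)], 0x00123400)
for event in events:
    print(event)
```

Triangle A never triggers the setup display list, so no RDP state commands are issued for it; only triangle B pays for the pipeline setup.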

D. Clever memory segmentation

 

In F3DEX, the code manages the segmentation of memory through the MoveWord command. With this method, the number of segments is limited to 16 and the segment words have to be stored in DMEM. In the Indy/Naboo microcode, the segmentation of memory is managed differently, in a way that resembles the “jump” mechanism already explained for display lists. First, the immediate command 0x02 provides an RDRAM address. This is the RDRAM address of the first segment. The very first word at this RDRAM address is the RDRAM address of the following segment. The size of each segment is limited to 0x100 bytes, and when an immediate command requests data beyond the end of the segment, the remaining data is retrieved from the following segment, where again the first word provides the RDRAM address of the segment after it.
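As a rough illustration of this chaining, the Python sketch below reads data that spills from one segment into the next. It rests on assumptions we could not confirm: that the 4-byte link word counts towards the 0x100 bytes of a segment, and that offsets are expressed relative to the payload that follows the link word. The function name and the RDRAM layout in the demo are hypothetical.

```python
import struct

SEGMENT_SIZE = 0x100  # segment size given in the text
LINK_SIZE = 4         # first word of a segment: RDRAM address of the next one
PAYLOAD_SIZE = SEGMENT_SIZE - LINK_SIZE  # assumption: link word is part of the 0x100 bytes


def read_chained(rdram, first_segment, offset, length):
    """Read `length` payload bytes starting `offset` bytes into the segment chain."""
    seg = first_segment
    # Skip whole segments until the offset falls inside the current one.
    while offset >= PAYLOAD_SIZE:
        seg = struct.unpack_from(">I", rdram, seg)[0]  # N64 words are big-endian
        offset -= PAYLOAD_SIZE
    out = bytearray()
    while length > 0:
        chunk = min(length, PAYLOAD_SIZE - offset)
        start = seg + LINK_SIZE + offset
        out += rdram[start:start + chunk]
        length -= chunk
        offset = 0
        if length > 0:
            # Data continues in the following segment: follow the link word.
            seg = struct.unpack_from(">I", rdram, seg)[0]
    return bytes(out)


# Demo: two segments at hypothetical addresses 0x200 and 0x800.
rdram = bytearray(0x1000)
struct.pack_into(">I", rdram, 0x200, 0x800)   # segment 1 links to segment 2
rdram[0x204:0x300] = b"\xAA" * PAYLOAD_SIZE   # payload of segment 1
rdram[0x804:0x900] = b"\xBB" * PAYLOAD_SIZE   # payload of segment 2
data = read_chained(rdram, 0x200, 0xF8, 8)
print(data.hex())
```

Because each segment only stores the address of its successor, nothing about the chain needs to live in DMEM, and the number of segments is not capped at 16.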

E. Intricate command structure

 

In F3DEX, the size of a command is fixed at just two 32-bit words, always with the same layout. In Indy/Naboo, a command can be much larger. For instance, the biggest immediate command of the microcode, which is used to generate the terrain polygons from a height map, is composed of sixteen 32-bit words!

0x05000000
0x217A800F
0x045ED4E3
0x00000000
0x000001E4
0xD642FE9B
0x00000000
0x00000000
0x00000000
0x00000000
0xFF000000
0xFF000000
0x00000000
0x00000000
0x00000000
0xEF5B0017

The very same immediate command may vary in size according to a flag within the command. For instance, the TRI command 0xB4 is larger for textured polygons than for shaded polygons.

TEXTURED TRIANGLES

0xB4000600
0x06000628
0x0C000408
0x06500678
0x07DD062C
0x07DD062C
0x0787062C
0x07DD062C

SHADED TRIANGLES

0xB4000400
0x06000628
0x0C000408
0x06500678

The flag of a command may also change its behavior. For example, the TRI command 0xB4 with flag 0x06 is used for textured triangles with regular textures, but flag 0x0E turns on texture coordinate generation for reflective textures.

0xB4000E00
0x099809C0
0xACA0A4A8
0x09700948
0x7F0000B8
0x7F0000C0
0x7F0000B0
0x7F0000A8
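The three 0xB4 examples above suggest how the flag byte might be read. The Python sketch below is our own inference from those three values only (0x04, 0x06 and 0x0E): we assume that bit 0x02 marks a textured triangle and selects the 8-word form, and that bit 0x08 turns on reflective texture coordinate generation. The bit names are hypothetical, and other flag values may behave differently.

```python
# Hypothetical interpretation of the flag byte of the 0xB4 TRI command,
# inferred only from the three examples in the text.

TEXTURED_BIT = 0x02    # assumption: set in 0x06 and 0x0E, clear in 0x04
REFLECTIVE_BIT = 0x08  # assumption: set only in 0x0E


def tri_flag(w0):
    """Extract the flag byte (bits 8-15) from the first command word."""
    return (w0 >> 8) & 0xFF


def tri_word_count(w0):
    """Textured triangles carry 4 extra words of texture coordinates."""
    return 8 if tri_flag(w0) & TEXTURED_BIT else 4


def tri_is_reflective(w0):
    return bool(tri_flag(w0) & REFLECTIVE_BIT)


for w0 in (0xB4000400, 0xB4000600, 0xB4000E00):
    print(hex(w0), tri_word_count(w0), tri_is_reflective(w0))
```

A display list parser needs this kind of rule to know how far to advance after each command, since the command length is no longer a constant two words.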

Sometimes the same command may serve completely different purposes. For instance, command 0x05 can be used to generate texture rectangles, particles, terrain from a height map, or specific vertices. Apart from the fact that they all generate graphics, these immediate commands actually have nothing in common.

II/ Listing of the commands with high level explanation

 

Even though we were able to decipher the microcode, this does not mean that we fully understood all the parameters of its commands. Some, of course, are quite obvious, but others are used in very large and complex commands and remain obscure.

Command 0x01

 

The main purpose of this immediate command is to retrieve data from RDRAM and, potentially, to use that data to compute new data.
The structure of the command is as follows:

0x01ODDDLL
0xBBAAAAAA

0x01: header of the command
O: option of the command; for the same DDD, the option can lead to a different part of the code
DDD: DMEM address where the data will be stored. The code ALSO checks this address and, depending on its value, routes execution to a specific portion of code, which then routes again according to O.
LL: number of bytes to be retrieved from memory
BB: a single byte whose use depends on the route the command takes after retrieving the data (e.g. the number of lights)
AAAAAA: when non-zero, data is retrieved from this RDRAM address; when it is 0x000000, data is retrieved from a segment.

Therefore the very same command 0x01 can be used for numerous scenarios, such as:

When O is 0, the command behaves like the F3DEX MoveMem command, meaning that it simply retrieves data from RDRAM and stores it in DMEM.

When O is 1, the command behaves like the F3DEX VTX command, meaning that vertices are retrieved from RDRAM into DMEM, transformed and then stored in a buffer.

When O is 2, lighting calculations are performed. The command retrieves data from RDRAM and computes colors for vertices. The colors are stored in a buffer from which the triangle commands retrieve them later. The lighting system is very complex, with many options.

When O is 3, the command sets the number of lights and potentially retrieves the light structure.

When O is 5, the command retrieves vertices from RDRAM into DMEM in a specific way, transforms them and then stores them in a buffer. This option is used for 2D graphics drawn with triangles.
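The field layout of command 0x01 can be unpacked with a few shifts and masks. The Python sketch below follows the 0x01ODDDLL / 0xBBAAAAAA pattern given above; the type name, the field names and the example words are hypothetical, and the role table only paraphrases the options listed in this section.

```python
from typing import NamedTuple


class Cmd01(NamedTuple):
    option: int      # O: one nibble
    dmem_addr: int   # DDD: 12 bits
    length: int      # LL: number of bytes to retrieve
    extra: int       # BB: route-dependent byte (e.g. number of lights)
    rdram_addr: int  # AAAAAA: 0 means "read from a segment"


# Short descriptions of the options documented in the text.
OPTION_ROLE = {
    0: "copy data to DMEM (MoveMem-like)",
    1: "load and transform vertices (VTX-like)",
    2: "lighting calculations",
    3: "set number of lights / load light structure",
    5: "load and transform vertices for 2D graphics",
}


def decode_cmd01(w0, w1):
    """Split the two words of command 0x01 into its documented fields."""
    if w0 >> 24 != 0x01:
        raise ValueError("not a 0x01 command")
    return Cmd01(
        option=(w0 >> 20) & 0xF,
        dmem_addr=(w0 >> 8) & 0xFFF,
        length=w0 & 0xFF,
        extra=(w1 >> 24) & 0xFF,
        rdram_addr=w1 & 0xFFFFFF,
    )


# Hypothetical command words, made up for this example.
cmd = decode_cmd01(0x01140010, 0x03123456)
print(cmd, "->", OPTION_ROLE.get(cmd.option, "undocumented"))
print("reads from segment:", cmd.rdram_addr == 0)
```

Note that this sketch only dispatches on O; as explained above, the real microcode also routes on the DMEM address itself before looking at the option.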

Command 0x02

 

It sets the RDRAM address of the first segment usable by command 0x01. As already explained, the RDRAM address of the next segment is the first word stored at the RDRAM address of the previous segment.

Command 0x05

 

This command generates primitives in various ways (from a mere texture rectangle to particles to the triangles used for terrain, etc.). The last byte of the second word determines the mode of the command.

0x18 (2 words)

0x05XXXXXX
0xXXXXXX18

This sub-command transforms the selected vertices by the Modelview matrix (0x010E403F) and stores them in a vertex buffer.

0x27 (6 words)

0x05XXXXXX
0xXXXXXX27

The sub-command generates vertices and some flags used by another 0x05 sub-command.

0x24 (10 words)

0x05XXXXXX
0xXXXXXX24

The sub-command generates particles (actually small texture rectangles) from the vertices/flags obtained by previous 0x05 sub-command.

0x15 (8 words)

0x05XXXXXX
0xXXXXXX15

The sub-command generates a mere texture rectangle using a vertex as center of such a rectangle. It is mainly used for explosions.

0x4F (2 words)

0x05XXXXXX
0xXXXXXX4F

This sub-command is a mere flag used by sub-command 0x0F to switch between its short version (4 words) and its long version (16 words).

0x0F (4 or 16 words)

0x05XXXXXX
0xXXXXXX0F

This sub-command is used to set parameters for sub-commands 0x09 and 0x0C. There is a short and a long version, selected by the above sub-command.

0x09 and 0x0C (6 words)

0x05XXXXXX
0xXXXXXX0C

0x05XXXXXX
0xXXXXXX09

Both sub-commands generate the triangles for the ground in Star Wars Episode I: Battle for Naboo (up to 32 per sub-command). The ground seems to be generated from a colored height map.
As you can see, command 0x05 is a set of 8 sub-commands!
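The sub-command sizes listed above can be gathered into a small lookup table, which is what a display list parser would need in order to step over a 0x05 command. In the Python sketch below, the word counts come from this section, while the function names are ours and the `long_form` parameter stands in for the state toggled by sub-command 0x4F, whose exact behavior we do not model.

```python
# Word counts of the 0x05 sub-commands, as listed in the text.
# 0x0F is handled separately because it has a short and a long version.
SUBCOMMAND_WORDS = {
    0x18: 2,   # transform vertices by the Modelview matrix
    0x27: 6,   # generate vertices and flags
    0x24: 10,  # generate particles
    0x15: 8,   # single texture rectangle centered on a vertex
    0x4F: 2,   # flag selecting the short or long version of 0x0F
    0x09: 6,   # ground triangles
    0x0C: 6,   # ground triangles
}


def subcommand_mode(w1):
    """The mode is the last byte of the second word."""
    return w1 & 0xFF


def subcommand_length(w0, w1, long_form=False):
    """Return the size in 32-bit words of a 0x05 command.

    long_form is a stand-in for the state set by sub-command 0x4F (assumption).
    """
    if w0 >> 24 != 0x05:
        raise ValueError("not a 0x05 command")
    mode = subcommand_mode(w1)
    if mode == 0x0F:
        return 16 if long_form else 4
    return SUBCOMMAND_WORDS[mode]


# The 16-word terrain command shown earlier starts with 0x05000000, 0x217A800F.
terrain_mode = subcommand_mode(0x217A800F)
terrain_words = subcommand_length(0x05000000, 0x217A800F, long_form=True)
print(hex(terrain_mode), terrain_words)
```

The first two words of the 16-word terrain example shown earlier in this post decode to mode 0x0F, which is consistent with it being the long version of that sub-command.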

Command 0x06

 

This command is the same as the gSPDisplayList command in F3DEX.

Command 0x07

 

This command is the same as the gSPBranchList command in F3DEX.

Command 0xB4 (4 or 8 words)

 

It is similar to the TRI2 command in F3DEX, except that the texture coordinates and the indices into the colors buffer are provided as parameters of the command.

Command 0xB5

 

This command triggers a “jump” to another display list previously set as the first word of the current display list.
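The “jump” can be illustrated by walking a chain of display lists in a fake RDRAM. The Python sketch below is deliberately simplified: it assumes every command is two words long (which, as shown earlier, is not true of this microcode), it only recognises 0xB5 and 0xB8, and the addresses and function name are hypothetical.

```python
import struct


def walk_display_lists(rdram, start, max_lists=16):
    """Follow 0xB5 jumps from `start` and return the display lists visited."""
    visited = []
    addr = start
    while len(visited) < max_lists:
        visited.append(addr)
        # The first word of a display list is the address of the next one.
        next_dl = struct.unpack_from(">I", rdram, addr)[0]
        pc = addr + 4
        while True:
            opcode = rdram[pc]
            if opcode == 0xB5:   # jump to the list named in the header word
                addr = next_dl
                break
            if opcode == 0xB8:   # end of display list
                return visited
            pc += 8              # simplification: fixed 2-word commands
    return visited


# Demo with two display lists at hypothetical addresses 0x100 and 0x200.
rdram = bytearray(0x400)
struct.pack_into(">I", rdram, 0x100, 0x200)           # header: next list at 0x200
struct.pack_into(">II", rdram, 0x104, 0xBB000000, 0)  # some ordinary command
struct.pack_into(">II", rdram, 0x10C, 0xB5000000, 0)  # jump
struct.pack_into(">I", rdram, 0x200, 0x00000000)      # header: no further list
struct.pack_into(">II", rdram, 0x204, 0xB8000000, 0)  # end of display list
visited = walk_display_lists(rdram, 0x100)
print([hex(a) for a in visited])
```

No return address is pushed anywhere in this walk, which is the point of the mechanism: the lists are chained at the same level instead of being nested.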

Command 0xB6

 

This command is the same as the gSPClearGeometryMode command in F3DEX.

Command 0xB7

 

This command is the same as the gSPSetGeometryMode command in F3DEX.

Command 0xB8

 

This command ends a display list.

Command 0xB9

 

This command is similar to the gSPSetOtherMode command in F3DEX (lower half).

Command 0xBA

 

This command is similar to the gSPSetOtherMode command in F3DEX (higher half).

Command 0xBB

 

This command is similar to the gSPTexture command in F3DEX.

Command 0xBC

 

This command is similar to the MoveWord command in F3DEX.

Command 0xBD

 

Only used in Battle for Naboo. It sets the OtherModes for the triangles used by the commands generating the ground.

Command 0xBE

 

Only used in Battle for Naboo. It sets the OtherModes (along with immediate command 0xBD) and the texture coordinates for each triangle making up the ground. In conjunction with the parameters set by the 0x05 sub-commands 0x09 and 0x0C, it can also generate such triangles (up to 32 of them).

Command 0xBF (4 or 8 words)

 

It is similar to the TRI1 command in F3DEX, except that the texture coordinates and the indices into the colors buffer are provided as parameters of the command.

Conclusion:

With such an advanced microcode, it is clear that few N64 games can compete graphically with Indiana Jones and the Infernal Machine and Star Wars Episode I: Battle for Naboo.
As pictures speak louder than words, here are some stunning examples of this outstanding graphical feat.

The lighting is absolutely splendid in Indiana Jones and the Infernal Machine.
For instance, at the start of the level named Volcano, 12 lights are used at the same time!



In another level called Palawan Temple, splendid point lights are displayed.



Particles are used in many places without any slowdown in both Indiana Jones and the Infernal Machine and Star Wars Episode I: Battle for Naboo. One example among many is the snow.




The terrain in Star Wars Episode I: Battle for Naboo is huge and without any fog.



Some great explosion effects are often shown in Star Wars Episode I: Battle for Naboo.



Splendid reflection mapping effects are sometimes used.



As you can see, both games are simply amazing for a Nintendo 64. Bearing in mind that the games also use the Musyx audio microcode, the console most likely showed with them the best of what it could offer. It also proves that skilled programmers can push a machine far beyond what its limitations would lead anyone to expect!
