EDIT: It looks like this post might have originally gone up without content! Evidently I didn’t publish it with text before its publish date, or maybe my login cookie expired in the meantime. Hopefully it’s up now!
Kaze Emanuar, an expert on Super Mario 64’s code who I’ve linked multiple times before, tends to bang on this drum, but they’ve now done a 20-minute video that treats the issue in detail. They tell us that the Nintendo 64 is a rendering monster, and that Nintendo’s use of it isn’t really optimal, especially in the game that is the subject of their fixation.
The problem, they say, isn’t triangle count, but cache misses. The N64, we’re told, can really motor (“vroom vroom” is the phrase they use), but fetching code and data from main memory bogs the system down, because the processor has to wait while it all arrives over the bus. If that information is already in the cache, then access is much faster, enough so that it directly affects the frame rate.
According to their data, unrolled loops, a traditional optimization measure, are actually bad here, because all those extra instructions have to be fetched from memory before they can run. It’s better to use the loop instructions to run through the same code repeatedly, because a small loop can run completely from the processor’s on-chip cache. Nintendo’s culling system actually hurts performance in most areas, because the extra data needed to implement it results in more cache misses. And that culling system only considers data that’s out of sight horizontally, which is such a big problem in the vertically oriented level Tick Tock Clock that there’s a kludge in the engine to reduce draw distance on that one level to make up for it.
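For anyone who hasn't run into loop unrolling before, here's a minimal sketch of the trade-off in C. This is my own toy illustration, not code from the video or from Super Mario 64; the function names (sum_rolled, sum_unrolled4, check_sums_agree) are made up for the example. Both versions compute the same sum. The unrolled one executes fewer branches but takes up more instruction bytes, and those bytes are what have to fit in the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: a few instructions, executed n times.
   The loop body is tiny, so it is easy to keep in an instruction cache. */
int sum_rolled(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hand-unrolled by four: fewer loop branches per element,
   but roughly four times as much code in the loop body. */
int sum_unrolled4(const int *v, size_t n) {
    int total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    /* Handle the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hypothetical self-check: both versions must agree on the result. */
void check_sums_agree(void) {
    const int v[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(sum_rolled(v, 9) == 45);
    assert(sum_unrolled4(v, 9) == sum_rolled(v, 9));
}
```

Which one is actually faster depends on the machine and on what the compiler does with it (an optimizing compiler may unroll or re-roll loops on its own), so treat this as a picture of the idea rather than a benchmark.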
I know! I link a lot of technical stuff here. It’s of interest to my diseased brain! But it’s got to be interesting to some of you, right? Well, for those readers to whom it is of interest, here it is: