Here's a video I took a week ago, showing the implementation of the purple rocks and the bridge.

The rocks actually have a slight visual glitch in this video, which took approximately 4 hours of searching to solve. In the end, the culprit was an unconverted art asset that was lurking in the build. It was the wrong size, and as a result it was trashing some of the transparency (mask) data needed for the rock when loaded into memory. So an element I wasn't even using or displaying resulted in a few hours of debugging and searching! Hah!

I wanted to implement these new scenery elements to begin investigating rendering optimisations related to them... rather than because rocks are the most exciting thing I could be doing! Namely, I'm keen to bake some of the static scenery into the playfield. This means the scenery would only get drawn once, as it scrolls onto screen, in a similar way to the tile data scrolling on screen. There are a number of static elements in Green Hill Zone that would benefit from this: spikes, rocks and parts of the bridge.
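To make the "draw once as it scrolls on" idea concrete, here is a minimal sketch in C. It is not the actual engine code: the `StaticScenery` and `Playfield` types, the `PLAYFIELD_W` size and the `scroll_right_to` function are all hypothetical names, a single byte per column stands in for real bitplane data, and it assumes the scenery list is sorted by position and that the camera only scrolls right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLAYFIELD_W 64  /* hypothetical: width of the wrapping playfield, in columns */

typedef struct {
    int     world_x;   /* column where the element sits in the level (>= 0) */
    uint8_t graphic;   /* hypothetical id of the art to stamp */
} StaticScenery;

typedef struct {
    uint8_t columns[PLAYFIELD_W]; /* stand-in for real bitplane data */
    int     right_edge;           /* first world column not yet scrolled on */
    size_t  next_item;            /* index of the first scenery item not yet baked */
} Playfield;

/* Advance the right edge of the playfield and bake any static scenery that
 * has just scrolled on. Each item is stamped exactly once; afterwards it
 * costs nothing per frame. Returns how many items were baked by this call.
 * Assumes `items` is sorted by ascending world_x. */
size_t scroll_right_to(Playfield *pf, const StaticScenery *items,
                       size_t count, int new_right_edge)
{
    size_t baked = 0;

    /* This sketch only handles rightward scrolling. */
    assert(new_right_edge >= pf->right_edge);

    while (pf->next_item < count &&
           items[pf->next_item].world_x < new_right_edge) {
        const StaticScenery *s = &items[pf->next_item];
        assert(s->world_x >= 0);
        /* The playfield wraps internally, so world columns map onto it modulo its width. */
        pf->columns[s->world_x % PLAYFIELD_W] = s->graphic;
        pf->next_item++;
        baked++;
    }

    pf->right_edge = new_right_edge;
    return baked;
}
```

Calling `scroll_right_to` every frame with the new right edge bakes each rock or spike the first time its column appears and does nothing for it afterwards. The sketch deliberately ignores double buffering, clipping and partial updates.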

It's a simple concept, and you'd think it would be easy given this approach is already implemented for tiles, but it is anything but... because of a combination of double buffering, partial screen updates for speed, clipping the sprites, and the way in which the playfield wraps internally. Despite having 4 copies of the display in memory (2 per buffer), many of the optimisations rely on a precise sequence of rendering events, and each buffer behaving in a very particular way.

I was hoping to get this optimisation finished by the end of the month, but it's proving more traumatic than I first expected - I almost had it working, before realising I'd violated a very subtle edge case which caused it to... erm, break quite badly. Mostly because it's an optimisation attempting to integrate into a series of existing optimisations! And if it gets too complex and costly... well, it stops being an optimisation!

In terms of overall performance, the baseline speed is higher than before. I'm often hitting full frame rate with a modest number of BOBs. The footage above is from WinUAE and not locked to A1200 cycle-accurate speed. On real hardware it's slower than the video at obvious points. I'll grab some genuine hardware footage for next week's YouTube video.

One cool change was that I upgraded the sprite caching engine to be 'bucket based'. This is really just a fancy way of saying that it's around 4x quicker to search for a cached sprite than it used to be, because it has to search a smaller space to find the cached sprite. I also went through the existing rendering code and optimised it significantly. Most of these optimisations are meaningful, but aren't at an architectural level: a series of small improvements making a difference as opposed to a revolutionary change. This includes: optimising stack usage, register usage, converting costly calculations into lookup tables and more.
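For anyone curious what 'bucket based' looks like in practice, here is a minimal sketch in C. It is an illustration rather than the engine's real code: the key fields (`sprite_id`, `shift`), the bucket count, the slot count and every function name are assumptions made for the example. The point is simply that a lookup scans one small bucket instead of the whole cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS  4  /* hypothetical: power of two, so a mask picks the bucket */
#define BUCKET_SLOTS 8  /* hypothetical capacity of each bucket */

typedef struct {
    uint16_t sprite_id; /* hypothetical key: which piece of art */
    uint8_t  shift;     /* hypothetical key: pre-shifted pixel offset */
    uint8_t  in_use;
    void    *data;      /* the cached, ready-to-draw image */
} CacheEntry;

typedef struct {
    CacheEntry slots[NUM_BUCKETS][BUCKET_SLOTS];
} SpriteCache;

void cache_init(SpriteCache *c)
{
    memset(c, 0, sizeof *c);
}

/* Cheap hash: the key alone decides which bucket an entry lives in. */
static unsigned bucket_for(uint16_t sprite_id, uint8_t shift)
{
    return (unsigned)(sprite_id ^ shift) & (NUM_BUCKETS - 1u);
}

/* Returns the cached image, or NULL on a miss. Only one bucket is scanned,
 * so at most BUCKET_SLOTS entries are compared rather than all of them. */
void *cache_find(const SpriteCache *c, uint16_t sprite_id, uint8_t shift)
{
    const CacheEntry *bucket = c->slots[bucket_for(sprite_id, shift)];
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        if (bucket[i].in_use &&
            bucket[i].sprite_id == sprite_id &&
            bucket[i].shift == shift) {
            return bucket[i].data;
        }
    }
    return NULL;
}

/* Stores an image in the first free slot of its bucket.
 * Returns 1 on success, 0 if that bucket is full (no eviction in this sketch). */
int cache_insert(SpriteCache *c, uint16_t sprite_id, uint8_t shift, void *data)
{
    CacheEntry *bucket = c->slots[bucket_for(sprite_id, shift)];
    assert(data != NULL); /* NULL is reserved to signal a miss */
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        if (!bucket[i].in_use) {
            bucket[i].sprite_id = sprite_id;
            bucket[i].shift = shift;
            bucket[i].in_use = 1;
            bucket[i].data = data;
            return 1;
        }
    }
    return 0;
}
```

With four buckets and keys that spread evenly, each search touches roughly a quarter of the entries a flat list would. That is one plausible way to arrive at a figure like 4x, though the real bucket count and hashing scheme may well differ.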

The downside is that these improvements are somewhat negated as I add further level elements. Aspects like the bridge cause slowdown as each element can move individually as you cross it, and as such each log is rendered independently. So it's one step forward and one step back as is often the case with this work! Ultimately, it's rendering that is the bottleneck, less so the code.

I do feel I'm approaching the end phase of generic major rendering optimisations. There aren't many left I can think of. And the ones that are left are proving very complex and prone to problems... So once this phase concludes, we'll have a better idea of how close to the original we can get on the A1200, and where we'll need to make sacrifices. Either way, anything that bounces out of the vanilla A1200 version can land in a FastRAM style build for those who want it!

My plan, for the next video, is to run through the speed improvements I've made so far in detail - and talk about what is still to come! Thanks for reading.