System Fault Devlog 6

One trap I’d like to avoid falling into for this devlog is that of regurgitating the pithy advice you find over and over again in these sorts of posts. For better or worse, that isn’t happening here.

A couple of weeks ago, I added the first major new mechanic. Until now, System Fault rooms were all large open spaces of the same width and height. Now we have server rooms.

A server room contains destroyable obstacles that block visibility. The only shape currently implemented is a hollow square, though more irregular shapes are coming soon. If you leave server rooms intact, they can be used strategically. Shoot out a tile or two, then hide behind intact tiles and pick off robots in nearby areas.

If enough of a server room’s tiles are destroyed, the entire server room explodes. In addition to wiping away all the visibility/motion-blocking tiles, server room destruction summons all robots from a large area around the server room to avenge its destruction. I’ll likely add an additional mechanic giving each robot permanent accuracy/speed penalties for each server room destruction on a given level. Leave them intact as a hiding place, or blow them up and hide from or fight the oncoming swarm to make all remaining robots on a level easier to defeat.
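Roughly, the explode-and-summon rule above could be sketched like this. This is a minimal illustration, not the game's actual code: the Robot and ServerRoom classes, the 50% threshold, and the summon radius are all assumptions made up for the example.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning values; the real game's numbers are not stated.
DESTRUCTION_THRESHOLD = 0.5   # fraction of tiles destroyed before the room explodes
SUMMON_RADIUS = 30.0          # robots within this distance are summoned


@dataclass
class Robot:
    x: float
    y: float
    alerted: bool = False


@dataclass
class ServerRoom:
    x: float
    y: float
    total_tiles: int
    destroyed_tiles: int = 0
    exploded: bool = False


def destroy_tile(room, robots):
    """Record one destroyed tile. Returns the robots summoned if the room explodes."""
    if room.exploded:
        return []
    room.destroyed_tiles += 1
    if room.destroyed_tiles / room.total_tiles < DESTRUCTION_THRESHOLD:
        return []  # room is damaged but still standing
    room.exploded = True
    # Summon every robot within the radius of the room's center.
    summoned = [
        r for r in robots
        if math.hypot(r.x - room.x, r.y - room.y) <= SUMMON_RADIUS
    ]
    for r in summoned:
        r.alerted = True
    return summoned
```

The early return on an already exploded room matters: without it, stray bullets hitting leftover tiles could summon the swarm again.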

I had lots of fun implementing this, until the performance problems began. My long static paths through wide open spaces suddenly had cramped corridors and destroyable environments to contend with. Some robots couldn’t even calculate a path due to spawning in the middle of enclosed server rooms. I implemented raycasting for sound sources and occlusion, and suddenly those rays were bouncing through fragmented spaces. Destroying server rooms could easily cause 20 robots to initiate complex multi-frame calculations at once, and suddenly a fun mechanic was bogged down in real-world complexity.

I set about trying to determine why things were slowing down. Again, I have no accessible profiling tools that break things down by system, so my role was less developer and more detective. I started by disabling robot AI. There are a few semi-complex calculations robots run to determine if they’ve seen a bullet, where they first spotted it, and how to reasonably safely approach the spot without walking gleefully into the player’s gunsights. Disabling the AI did seem to make many performance issues go away. Long story shorter, I dug into the AI library I was using and determined that it might be behaving in a way that caused “thinkers” to hang, thus contributing to the performance hit. I won’t call out the specific library because that’s not the point of this. Ultimately I rewrote my AI from scratch and got rid of a bit of unnecessary complexity, and while that helped performance some, it didn’t resolve everything.

Initially I was going to hold my nose and write a post about not reaching for more complex tools before they proved necessary. For better or worse, that isn’t happening here.

Next I looked into pathfinding. If I couldn’t calculate a path, or couldn’t traverse one I’d already created, I just ditched it and let the system recalculate it. This worked fine in wide open spaces, but not so well when a robot could get stuck in a server room. I like the idea that server rooms could just be built around a robot, and that you might be surprised by a hidden bot when cracking one open. So I left that mechanic in place, and spent time determining how a data-oriented ECS could communicate that no path was possible back to a caller and let it determine what to do. Ultimately I settled on applying a NoPath tag, aborting the pathfinding operation, and punting the reaction back to the initiating system. I also revamped the mechanism by which asynchronous tasks were handled. Ultimately, those modifications helped some as well. But again, the performance problems lingered.
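Here is a rough sketch of the NoPath idea, assuming a simple grid and a breadth-first search standing in for whatever pathfinder the game really uses. Entities are plain dictionaries here, and every name other than the NoPath tag is hypothetical.

```python
from collections import deque

NO_PATH = "NoPath"  # tag name taken from the post; everything else is illustrative


def find_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid.

    grid[y][x] is True for a blocked cell. Returns a list of (x, y) cells
    from start to goal, or None if the goal cannot be reached.
    """
    came_from = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        x, y = current
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= ny < len(grid) and 0 <= nx < len(grid[0])
            if in_bounds and not grid[ny][nx] and (nx, ny) not in came_from:
                came_from[(nx, ny)] = current
                queue.append((nx, ny))
    return None  # search exhausted: abort instead of retrying forever


def pathfinding_system(entity, grid, start, goal):
    """Store a path on the entity, or tag it NoPath and let the caller react."""
    path = find_path(grid, start, goal)
    if path is None:
        entity["tags"].add(NO_PATH)
        entity.pop("path", None)
    else:
        entity["tags"].discard(NO_PATH)
        entity["path"] = path
```

The pathfinder never decides what a trapped robot should do. It only reports the failure through the tag, so the initiating system can choose to idle, wait for the room to be cracked open, or try again later.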

This post was going to be some silly, wise-sounding platitude about premature optimization that I hadn’t quite figured out yet. For better or worse, that ain’t why we’re here.

I went down a number of other rabbit holes. The AI paths were tuned better than a concert violin. The raycasts in the environmental audio systems were tweaked. Can you tell that I’m running out of words for “I threw everything I had at this problem and changed everything I could think of”?

A few days ago I noticed something strange. If I disabled basically everything in the game until it was nothing but a series of empty rooms through which you could run, performance worked well until the end of the level was nearby. Then it tanked. Bullets slowed to a crawl. The player barely moved. Even informational commands stopped responding.

Longer story (slightly) shorter, there is a collider in the last room of each level that plays a sound, with logic attached that determines if leveling up is possible. The system in which this collider and sound are created is designed to run once. It didn’t. Instead, it ran a hundred or more times per second, every second. By the time the end of the level was reached, hundreds of thousands of these colliders existed. More importantly, hundreds of thousands of sounds existed. I have a system which attaches a sound to a specific coordinate or object, then causes the sound to play when that object becomes visible. So when the level exit became visible, the game gleefully tried to create and play a hundred thousand or more sounds at once.
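For illustration, here is one way a run-once system can guard itself. This is a sketch under assumptions, not the engine's API or my actual fix: the class name, the world dictionary, and the sound name are invented for the example.

```python
class LevelExitSetupSystem:
    """Creates the level-exit collider and its sound exactly once."""

    def __init__(self):
        self.has_run = False  # guard flag checked on every update

    def update(self, world):
        if self.has_run:
            return  # already set up; do nothing on later frames
        world["colliders"].append({"kind": "level_exit"})
        world["sounds"].append({"name": "exit_hum", "attached_to": "level_exit"})
        self.has_run = True


def simulate(frames):
    """Run the system for a number of frames and return the resulting world."""
    world = {"colliders": [], "sounds": []}
    system = LevelExitSetupSystem()
    for _ in range(frames):
        system.update(world)
    return world
```

Without the guard, a system ticking a hundred times per second would leave 6,000 colliders and 6,000 sounds behind after a single minute. With it, the counts stay at one no matter how often update is called.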

I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced.

Needless to say, I had not accounted for suddenly spawning and playing hundreds of thousands of sounds in my design. And even without having to actually play the sound, creating unnecessary physics objects a hundred times a second caused all sorts of issues across the board.
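A cheap, text-only check can flag this kind of runaway growth. The sketch below assumes you can get a count of objects per type as a dictionary; the function name and the growth limit are hypothetical, and nothing here comes from a real engine.

```python
def check_counts(previous, current, growth_limit=100):
    """Compare two snapshots of object counts keyed by type name.

    Returns one warning string for every type whose count grew by more
    than growth_limit between the snapshots.
    """
    warnings = []
    for name, count in current.items():
        grown = count - previous.get(name, 0)
        if grown > growth_limit:
            warnings.append(f"{name}: +{grown} (now {count})")
    return warnings
```

Calling this once a second and sending the warnings to a log or a screen reader would report something like "sounds: +4999 (now 5000)" long before the level exit came into view.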

Performance is now great. Cracking a server room summons hordes of robots eager to take you down, and picking them off is fast and lots of fun.

And this would have all been incredibly obvious if the profiling tools available to sighted developers using this particular engine were accessible to me. I don’t want to point fingers or assign blame. I’m a niche in a niche, and I know that. If there’s one platitude I could reduce this post to, it’s that accessibility is important, even if it isn’t immediately apparent why, say, a blind person might care about tools normally used for making graphical games. I find myself making that case, or ones similar enough to it, again and again. It’s exhausting.

I also implemented mines as powerups, as well as a slot and inventory system for implementing more items.

Now that performance is (finally!) acceptable, I can’t imagine why I wouldn’t have a new gameplay recording to share next week. This week I ran out of steam racking my brain over these performance issues, but hopefully next week I’ll record what it sounds like to load up on powerups, strategically plant a few mines, and take down a server room. Stay tuned!