Archives of Genesis8 Amstrad Page




Bug hunting on Shinobi for Amstrad CPC by Richard Aplin

-

Richard Aplin is the coder of the Amstrad CPC port (1989) of the arcade game Shinobi (1987). While programming it, he searched for two weeks a rare bug of Shinobi, which he is writing about it on Twitter. With his agreement, here is all the text and screens present on Twitter (screens coming a bit later : more work to do).

Back in the days of 8-bit micros, in this case the Amstrad CPC (a popular - in Europe - Z80-based home computer). I worked for a games company, I was doing a conversion [i.e. rewrite] of Sega's "Shinobi" arcade machine onto this machine.

There were all sorts of geeky, tweaky technical tricks to make it run fast and look nice (coded in assembly language), and all went well, game turned out pretty nice, ran fast and played pretty well, until one day, when nearly done, play testers reported that it occasionally - crashed on one level. Boom! Reset. It was really hard to reproduce.

Nobody could come up with anything they actually _did_ to make it crash (or not); was about one in ~20 times(IIRC), when fighting a boss.

I had nothing fancy like a logic analyzer or in-circuit-emulator or other hardware debugging tools, just a regular retail computer. So it just crashed, once in a while, when playing one specific part of the game.

Sure, just some coding bug, like any other, not uncommon. But WHERE and WHY and HOW!? ..and why so hard to reproduce? You could play for hours and no problem, but just when it seemed like it was a ghost or maybe fixed for no known reason.... BOOM reset.

This went ON and ON and I was utterly mystified. I just could _not_ induce this bug to happen more often [the key to find/fixing it], no matter what I tried, I could not find even a way to reproduce reliably, let alone the root cause.

I started doing stuff like checksumming the RAM every frame, looking for some sort of random corruption, putting all sorts of checks in there that slowed it down to a crawl, and still nothing. The bug seemingly came and went as it pleased, never in quite the same place.

Until one day I got lucky. I caught it in the act! One single byte in the middle of my program code got trashed - and this time I caught it _before_ the whole thing blew up. But how?! What on earth was causing this? [OH for a hardware logic analyzer !

Finally, _finally_, after probably two weeks of solid bug-hunting and hair-tearing I found it.

So back in those days, it was customary for the game's music to be written by someone else, and provided as a binary blob of code plus data (e.g. 4Kbytes) that you would just call once a frame, and it took care of controlling the sound chip and playing whatever music tracks.

And it turned out that the music player (I didn't have source code) had a bug in it. Not a big bug. A teeeeeny little bug. It didn't audibly affect the music at all, but one _single_ note, on one channel, in one of the tunes on one of the levels, used a wrong data byte.

And normally, when that single duff musical note played, nothing bad happened, it was a fairly harmless bug, however it caused music player to read wrong byte of RAM; not just off by one byte, but off by tens of KBytes... in fact, it ended up reading a byte of the display ram.

If I recall correctly, it ended up - at that instant - reading a single byte of the display right around where this green circle is, in the upper left corner.

And when that single bad musical note in the tune played - IF the top bit (and only the top bit) of that pixel was a 1, it would then take that as a memory address ELSEWHERE in RAM and increment that location which corrupted a single byte inside of my program code, leading to a subsequent - but not quite immediate; a couple of seconds later, when a baddie was decided to shoot at you - crash.

This was a 2D scrolling game of course, so you were constantly jumping around and doing stuff fighting ninjas- the crash only happened if ONE pixel of the display was a certain color at the instant that one single note of ONE background tune played, and it wasn't in my code.

I vividly remember finally discovering the root cause (disassembling and patching the music player, and finally catching it 'in the act', and figuring out the long chain of events..) .. To this day I have never had a bug as hard as that one.. was SO rewarding to find.

I've had a bunch of equally obscure bugs in the decades since then, but the tools got so much better - protected memory, logic analyzers, CPUs that don't just explode on contact with bad data - nothing has ever been as difficult to track down as that one. Was a great lesson which is that if you persevere for long enough you will win in the end against a computer - bugs can't hide forever. Ever since then I've known it's just a matter of time, and patience. Nowadays I relish a good bug, I rub my hands and chuckle - the game is ON! ;-).


For more news, Go to home page