My lizard is the Lizard of DOS
The last few days I've taken a bit of a diversion from NES development, and ported Lizard to another old familiar platform: DOS.
I'll get into some details about how this was accomplished below, but if you've got an old PC lying around, and feel like playing Lizard on it, you can try it out right now! If you backed this game, you can use the download key you were given to download the DOS version now as well. If you have an itch.io account and claimed that download, the download link should be conveniently at the top of the game's page there.
To run this, I think you probably need at least a 486 to have a decent framerate. It should technically run on a 386 with 4MB of RAM, but maybe not very well. Vintage Sound Blaster audio devices are supported (remember to SET BLASTER), and so is the Gravis Gamepad or similar 4-button joystick.
Note that a lot of modern PCs are very capable of running some version of DOS, especially FreeDOS, so you can probably even run it on your regular machine, but most likely without sound. While Sound Blaster was the ubiquitous computer audio system of the 90s, you're not likely to find one in a recently built computer.
Why did you do this?
This is a fair question. Please indulge me with a little bit of personal history...
My family got an Atari ST in 1986. I learned to program in BASIC on it, at first guided by some wonderful children's programming books by Usborne, and later by typing in code listings from magazines. That's where my interest in making computer games began. Later we got an "IBM Compatible" 286 machine, which came with QBASIC. I continued making little games with that as well, but I wanted to make bigger and better games, and I really didn't know how to progress with QBASIC. Simple stuff like putting sprites on the screen was limited and sluggish, and I knew the machine was capable of doing a lot more.
I'd heard that C and Assembly were what people used to make "real" software and was eager to learn, but I didn't know where to look. I tried my local library; they had one book on assembly, but it was for a computer from the 1970s, and completely incomprehensible to me. Around 1996 I was given a book called "Turbo C++ Programming in 12 Easy Lessons" that came with a free version of Borland's famous Turbo C++ compiler, and I immediately read it back to front. I got the basics of C++ from this, but the book had almost nothing to say about graphics and sound, or other game topics. Mostly its examples were text programs about business oriented topics like organizing an employee database.
In 1997 my family got a Pentium computer, and the internet, and suddenly I had access to so much information that I'd wanted to know very badly. I found wonderful articles like David Brackeen's VGA programming guide, and the PC Game Programmer's Encyclopedia. I found an open source DOS compiler called DJGPP. I also found IRC chat rooms and could talk to other people who knew about this stuff! In particular I found a community based around an RPG engine called VERGE.
This was the time in my life where I really started to learn how to make games.
So... I may have mentioned in a previous update that I never owned an NES until 2011. I'm planning to write an article about Lizard's design later, which will be titled "My lizard is not the Lizard of Nostalgia", but in the case of the DOS port, no, I definitely did do this for nostalgia. Not nostalgia for DOS games, but for working with DOS itself, and for that particular teenaged period of explosive learning.
At the same time, since I had this past experience to draw on, I had a pretty good idea of the size of the task, and I thought it would be something I could do with only a few days' work. Often these assumptions turn out to be incorrect, but this time it really did go pretty smoothly.
How did you do this?
Lizard was designed to be an NES game, which means that a lot of decisions made were to make effective use of the NES hardware. The PC version, on the other hand, is basically simulating that hardware, and translating that into a more generic form of output.
The video is just software rendered, and the image produced is sent to the PC's GPU. Sound is also software rendered, and again this just gets sent to the PC's sound hardware like any other recorded sound. All the important work is already being done by this software simulation of the NES.
So... porting the PC version to another platform, like DOS, ideally should only require replacing the relatively small layer of code that delivers sound/video/input to and from the computer. For the PC version I actually used an open source library designed for this purpose called Simple DirectMedia Layer.
To get it running in DOS, I needed to replace those basic services. Specifically: Video, Sound, Input, and a Timer. Back when I was learning DOS programming, there was another open source library called Allegro that I figured might do the job...
At first I thought I should use the same DOS compiler that I learned on: DJGPP. This was a port of the well known GCC compiler for DOS. I tried using it in DOSBox as well as FreeDOS, but unfortunately it was unstable for me on both, crashing when trying to compile various things.
Instead, I found the Open Watcom 1.9 C++ compiler. The Watcom C compiler had been around back then too, but unlike DJGPP its development carried on much longer, and it has a very usable Windows version that can still compile DOS executables. I found this much easier to work with, and had no stability problems.
If you're familiar with DOS games, you may have heard of the standard VGA Mode 13h. This has 320x200 pixels, and can display 256 colours at once. This would be great for a game that's designed for that resolution, but the NES picture is slightly taller with 256x240 pixels. Since Lizard often has important information at the top and bottom of the screen, it would be too much to cut off 40 lines to fit here.
However, Mode 13h is just a starting point. 13h is actually hexadecimal for the number 19, which you give to one of the PC BIOS' built-in routines that sets up the screen mode for you. If you're satisfied with the ready-made modes, setting up the VGA hardware is as simple as picking a number, but really "Mode 13h" is just a recipe. There are a lot of ingredients you can customize if you want to operate the VGA hardware registers directly!
One of these customizations became known as Mode X. Mode X extends the resolution vertically to 320x240 pixels, which is perfect for this! I need some pillarboxing to fill the horizontal space, but all the pixels fit on the screen now. This mode also has a square pixel aspect ratio like most modern video devices.
As a consequence of having to draw more lines, the refresh rate of the monitor has to be slowed down to compensate. In this case, it works out in my favour, though. The standard Mode 13h refresh rate is 70 Hz, but Mode X reduces it to 60 Hz, which coincidentally is the same framerate as the NTSC NES. This means I can reproduce the correct game speed with much smoother animation.
There's another negative consequence here, though. The graphics memory page in DOS is only 64 kilobytes long. Mode 13h fits nicely under the limit: 320 x 200 x 1 byte per pixel = 64,000. Mode X does not: 320 x 240 = 76,800. How is that going to fit?
The answer to this is that the VGA device itself has 256 KB of internal RAM, but it only maps 64 KB at a time into the DOS memory space. To get around this, the 4 pixels in every group of 4 are each mapped to their own "plane", and each plane sits on a different 64 KB page. The reasons for this weird "planar" organization largely have to do with backward compatibility with older graphics devices. (A more straightforward linear buffer of pixels is sometimes called "chunky", the opposite flavour to "planar".)
Mode 13h actually hides the planes by allowing you to access all four planes at once as if they contained the same value. It wastes 3/4 of the available memory just to make it convenient, but it really was very convenient. Mode X instead gives you access to all of that VGA graphics memory as it's really laid out, but you have to carefully step 4 pixels at a time.
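To make the "step 4 pixels at a time" layout concrete, here is a minimal sketch in portable C. It is not Lizard's code: the four arrays only stand in for the VGA planes, and every name in it is made up for illustration. On real hardware the plane would be chosen by programming the VGA sequencer's Map Mask register before writing, which this simulation leaves out.

```c
#include <assert.h>

#define SCREEN_W     320
#define SCREEN_H     240
#define PLANES       4
#define PLANE_STRIDE (SCREEN_W / PLANES)        /* 80 bytes per scanline, per plane */
#define PLANE_BYTES  (PLANE_STRIDE * SCREEN_H)  /* 19,200 bytes per plane */

/* Stand-in for VGA memory (an assumption for this sketch): four separate
   planes, each small enough to fit the 64 KB window on its own. */
static unsigned char vga_planes[PLANES][PLANE_BYTES];

/* Horizontally adjacent pixels rotate through planes 0, 1, 2, 3, 0, ... */
static int pixel_plane(int x)
{
    return x & 3;
}

/* Within a plane, each byte holds every 4th pixel of the scanline. */
static long pixel_offset(int x, int y)
{
    return (long)y * PLANE_STRIDE + (x >> 2);
}

static void put_pixel(int x, int y, unsigned char colour)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    vga_planes[pixel_plane(x)][pixel_offset(x, y)] = colour;
}

static unsigned char get_pixel(int x, int y)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    return vga_planes[pixel_plane(x)][pixel_offset(x, y)];
}
```

With this layout each plane only needs 19,200 bytes for a full 320x240 screen, so all 76,800 pixels fit even though no single plane comes anywhere near the 64 KB limit.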
As an illustration, here's what Lizard looked like when I was working on converting Lizard's rendering to the planar format. The original image gets cut into 4 parts, then spliced back together column by column.
I have to thank Michael Abrash's amazing Black Book for teaching me about Mode X. I couldn't afford it when it came out, and I ended up reading it at a bookstore one chapter at a time. It was eventually made available for free (see the link above), and despite having a lot of outdated technical stuff, like Mode X, I found his approach to programming and problem solving profoundly insightful. I highly recommend reading the first few chapters.
So the Allegro library had support for Mode X video, a variety of sound devices, keyboard and joystick input, and a suitable timer for regulating the framerate, but this convenience came at a price. A library like this has to take a generic approach to most things, and that makes it hard to take only what you need from it. There were a few things I was unhappy with, but the worst problem was that it increased the executable file size by about 600 KB, making it a bit too big to comfortably fit on a floppy disk.
Even though I didn't end up using Allegro, it did work very well for quickly building a prototype DOS version, and even while replacing it, I appreciated it as an open source example to compare my implementation against. It helps a lot to be able to see how someone else did it, in case you may have missed something important.
When testing in DOSBox, this first implementation with Allegro required a speed setting of about "100,000 cycles" to run smoothly, roughly equivalent to a 100 MHz Pentium machine. I hoped I could do better than that, though. Replacing it with something equivalent but more minimal improved speed a little, but the real advantage of that was that I could now easily get inside every part of the implementation, and make much more intimate optimizations.
The first step in optimizing was writing some code that inserted timers to tell me how long various parts of the system were taking on each frame. This immediately showed me where the most potential for optimization was. This is really rule number 1 for optimization: measure everything, don't assume. Theories about what makes code slow or fast are very often wrong.
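A per-section frame timer of this kind can be very small. The sketch below is an assumption about how one might look, not the code used in Lizard: the section names are invented, and the standard `clock()` function is only a portable stand-in for whatever finer-grained timer a DOS build would really use.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical sections of a frame; the real breakdown may differ. */
enum {
    SECTION_BACKGROUND,
    SECTION_SPRITES,
    SECTION_SOUND,
    SECTION_BLIT,
    SECTION_COUNT
};

static clock_t       section_start[SECTION_COUNT];
static unsigned long section_ticks[SECTION_COUNT]; /* accumulated time */
static unsigned long section_calls[SECTION_COUNT]; /* number of samples */

/* Add one measured interval to a section's running total. */
static void profile_record(int section, unsigned long ticks)
{
    assert(section >= 0 && section < SECTION_COUNT);
    section_ticks[section] += ticks;
    section_calls[section] += 1;
}

/* Call just before the code being measured. */
static void profile_begin(int section)
{
    assert(section >= 0 && section < SECTION_COUNT);
    section_start[section] = clock();
}

/* Call just after the code being measured. */
static void profile_end(int section)
{
    clock_t now = clock();
    assert(section >= 0 && section < SECTION_COUNT);
    profile_record(section, (unsigned long)(now - section_start[section]));
}

/* Index of the section with the largest accumulated time so far. */
static int profile_slowest(void)
{
    int s, slowest = 0;
    for (s = 1; s < SECTION_COUNT; ++s) {
        if (section_ticks[s] > section_ticks[slowest]) {
            slowest = s;
        }
    }
    return slowest;
}
```

Wrapping each stage of the frame in `profile_begin` / `profile_end` and checking `profile_slowest()` after a few hundred frames is enough to show which stage deserves attention first.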
In game development a lot of time is spent talking about graphics and rendering performance, and there's a good reason for this. There are a _lot_ of pixels on the screen, and each one has to be individually accounted for. 256x240 pixels rendered at 60 FPS amounts to nearly 4 million pixels per second (3,686,400, to be exact). There really isn't anything else in Lizard's code that runs nearly that often. It's also why game systems usually have a dedicated GPU to render graphics separately and more efficiently than a general purpose CPU can. In this case almost everything the NES offloaded to its GPU I have to simulate in software.
So... the biggest time-waster was the part of the code that draws the background. The NES simulator has to draw the background layer as a collection of 8x8 tiles. The code that draws each tile could probably be optimized quite a bit, but I had a better idea.
In my previous article about NES scrolling, I mentioned that the NES has 2 screens worth of memory to store its background. More specifically for Lizard I used a horizontal arrangement that is two screens wide, and most importantly, it does not really make changes to that background layer very often. So, instead of drawing every tile on every frame, I made a cache for the background that only updates tiles on the frame that they get changed.
With this in place, the tile drawing hardly ever took up significant frame time-- mostly just on the frame when a new room is loaded-- so I didn't actually have to try to optimize the tile rendering code at all. The background drawing code was reduced to just copying lines out of the cached background. (memcpy is particularly efficient and effective here.)
So the background rendering performance problem was simply obliterated by caching, and only updating it when it was (infrequently) changed. There's also a sprite layer on top of the background, but since the NES isn't capable of drawing a lot of sprite tiles, they really weren't a performance issue here.
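The caching idea can be sketched in a few dozen lines of C. This is an illustration under assumptions rather than Lizard's renderer: `tile_pixel` is a hypothetical stand-in for real tile decoding, the two-screen background is modelled as a 64x30 grid of 8x8 tiles, and all names are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define TILE    8
#define MAP_W   64                 /* two 32-tile screens side by side */
#define MAP_H   30
#define CACHE_W (MAP_W * TILE)     /* 512 pixels */
#define CACHE_H (MAP_H * TILE)     /* 240 pixels */
#define VIEW_W  256                /* visible width of the NES picture */

static unsigned char tile_map[MAP_H][MAP_W];
static unsigned char tile_dirty[MAP_H][MAP_W];
static unsigned char bg_cache[CACHE_H][CACHE_W];
static unsigned long tiles_drawn;  /* counts the expensive work actually done */

/* Hypothetical stand-in for decoding one pixel of a tile's graphics. */
static unsigned char tile_pixel(unsigned char tile, int px, int py)
{
    (void)px;
    (void)py;
    return tile;
}

/* Changing a tile only marks it dirty; nothing is drawn yet. */
static void set_tile(int tx, int ty, unsigned char tile)
{
    assert(tx >= 0 && tx < MAP_W && ty >= 0 && ty < MAP_H);
    if (tile_map[ty][tx] != tile) {
        tile_map[ty][tx] = tile;
        tile_dirty[ty][tx] = 1;
    }
}

/* Once per frame: redraw only the tiles that changed since last time. */
static void update_cache(void)
{
    int tx, ty, px, py;
    for (ty = 0; ty < MAP_H; ++ty) {
        for (tx = 0; tx < MAP_W; ++tx) {
            if (!tile_dirty[ty][tx]) continue;
            for (py = 0; py < TILE; ++py)
                for (px = 0; px < TILE; ++px)
                    bg_cache[ty * TILE + py][tx * TILE + px] =
                        tile_pixel(tile_map[ty][tx], px, py);
            tile_dirty[ty][tx] = 0;
            tiles_drawn++;
        }
    }
}

/* Copy one visible scanline out of the cache, wrapping at the right edge. */
static void copy_scanline(unsigned char *dst, int y, int scroll_x)
{
    int start, first;
    assert(y >= 0 && y < CACHE_H && scroll_x >= 0);
    start = scroll_x % CACHE_W;
    first = CACHE_W - start;
    if (first > VIEW_W) first = VIEW_W;
    memcpy(dst, &bg_cache[y][start], (size_t)first);
    if (first < VIEW_W)
        memcpy(dst + first, &bg_cache[y][0], (size_t)(VIEW_W - first));
}
```

On a frame where nothing in the background changed, `update_cache` draws no tiles at all, and the per-frame cost is just one or two `memcpy` calls per scanline.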
The next biggest burden was actually just copying from my software-rendered NES screen buffer directly into VGA memory. I mentioned above how Mode 13h was wasteful of memory but made up for it with convenience; with Mode 13h I could just copy a big block of memory directly from my drawing buffer into the VGA device, but Mode X requires me to split everything up into four planes. This splitting was taking quite a bit of time... I'm not sure if I could have hand-optimized the splitting routine by rewriting it in assembly, but I thought since I already had a caching system in place, why not just split the cache itself into four planes?
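For reference, the per-frame splitting step described here amounts to de-interleaving every scanline. The following is a plain C sketch of that operation with invented names, not the routine from Lizard:

```c
#include <assert.h>
#include <stddef.h>

#define PLANES 4

/* Split one "chunky" scanline (one byte per pixel, left to right) into four
   planes. planes[p] receives pixels p, p+4, p+8, ... and must have room for
   width / 4 bytes. The width must be a multiple of 4. */
static void split_scanline(const unsigned char *src, size_t width,
                           unsigned char *planes[PLANES])
{
    size_t i, p;
    assert(width % PLANES == 0);
    for (p = 0; p < PLANES; ++p) {
        const unsigned char *in = src + p;
        unsigned char *out = planes[p];
        for (i = 0; i < width / PLANES; ++i) {
            out[i] = in[i * PLANES];  /* a strided read for every output byte */
        }
    }
}
```

Every byte of the frame goes through this strided loop on every frame, which is a plausible reason for it to cost so much more than a single block copy.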
The video above shows a partially completed planar renderer. I had implemented the sprite layer only. The non-planar background is seen scrolling at 4x speed and overlapping itself.
So I rewrote the background tile renderer to split its output into four parts. The splitting is probably about as inefficient as it was for the final copy, but now it's only done when the background changes! By making the tile rendering code slower, I made the copy to the graphics card much faster.
There was a little bit of tricky logic for scrolling among the 4 planes, and every scanline transfer now becomes 4 block copies instead of 1, but copying a straight block of memory from one place to another is still very efficient! (A Lizard's best friend: memcpy.)
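One way the scrolling logic can work out is sketched below. This is a reconstruction under assumptions, not the code from the DOS port: it handles a single scanline, uses plain arrays in place of VGA memory, ignores the pillarbox offset, and all names are hypothetical. The key observation is that scrolling by a number of pixels that is not a multiple of 4 rotates which cached plane feeds which screen plane.

```c
#include <assert.h>
#include <string.h>

#define PLANES       4
#define CACHE_W      512                   /* two NES screens wide */
#define VIEW_W       256
#define CACHE_STRIDE (CACHE_W / PLANES)    /* 128 bytes per plane per line */
#define VIEW_STRIDE  (VIEW_W / PLANES)     /* 64 bytes per plane per line */

/* One scanline of the background cache, already stored as four planes. */
static unsigned char cache_line[PLANES][CACHE_STRIDE];
/* One scanline of the (simulated) screen, also as four planes. */
static unsigned char screen_line[PLANES][VIEW_STRIDE];

static void cache_put(int x, unsigned char colour)
{
    assert(x >= 0 && x < CACHE_W);
    cache_line[x & 3][x >> 2] = colour;
}

static unsigned char screen_get(int sx)
{
    assert(sx >= 0 && sx < VIEW_W);
    return screen_line[sx & 3][sx >> 2];
}

/* Fill the screen line so that screen pixel sx shows cache column
   (scroll_x + sx) mod CACHE_W, using one or two memcpy calls per plane. */
static void copy_planar_scanline(int scroll_x)
{
    int sp;
    assert(scroll_x >= 0);
    for (sp = 0; sp < PLANES; ++sp) {
        int column    = (scroll_x + sp) % CACHE_W; /* first column for this plane */
        int src_plane = column & 3;                /* the scroll rotates the planes */
        int start     = column >> 2;
        int first     = CACHE_STRIDE - start;      /* bytes before the wrap point */
        if (first > VIEW_STRIDE) first = VIEW_STRIDE;
        memcpy(screen_line[sp], &cache_line[src_plane][start], (size_t)first);
        if (first < VIEW_STRIDE)
            memcpy(screen_line[sp] + first, cache_line[src_plane],
                   (size_t)(VIEW_STRIDE - first));
    }
}
```

Because the cache width is a multiple of 4, the source plane for each screen plane stays the same across the whole scanline, so the inner work never has to touch individual pixels.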
After a few more tweaks, like some simplifications to the sound rendering, and using the VGA hardware palettes to mimic the NES' own hardware palettes, I could get 60 FPS in DOSBox down to about 30,000 cycles. 30 FPS down to about 20,000. With audio off I can get 30 FPS down to about 10,000 cycles. DOSBox's performance notes suggest that's roughly equivalent to a fast 386. That seemed pretty good to me, and I was out of "big fish" to optimize, so it's where I stopped.
You might compare against games like Commander Keen, which could run very well even on a 286 machine, and with a lot less RAM, but to make the comparison fair you have to remember that a game like that was designed around the hardware it was going to run on. I think I could make a Lizard-like game that would run comparably, but a lot of things would have to change about its design.
Room layout would have to be guided by how EGA graphics memory functions. The soundtrack should be completely rewritten for the efficient Adlib OPL2 synthesizer instead of the CPU-intensive task of synthesizing NES audio in software. The world should be laid out with choke points that make loading data in and out from disk acceptable. A lot would have been done differently if this was supposed to take the form of an older DOS game.
So... Lizard now runs in DOS, and it's even small enough to fit on a single 1.44 MB floppy disk. I don't know if this will matter to most of you. To be honest, if Lizard could run in NESticle I wouldn't even have bothered, but I thought it would be fun and relatively easy, and maybe a few more people would be able to play Lizard than before.
In other news, thanks to the help of a few translators, Lizard is now available in 4 languages:
- Français / French
- Português do Brasil / Brazilian Portuguese
- Deutsch / German
There are a few more languages on the way. I actually made an overhaul of the translation system to allow for a much bigger alphabet to be used, primarily with the goal of supporting Japanese. (Coming soon!)
Also, if you'd like to see Lizard played extraordinarily well, this Saturday a runner named Smartball will be doing a live speedrun for the Best of NES Marathon. I might be there for commentary. The marathon goes all weekend, but the Lizard run is at 12:25 PM on Saturday.
Otherwise, I've still got 2 or 3 more updates planned before this Kickstarter goes silent. There are a couple of development topics with Lizard that I still want to cover. More to come.