A short guide to help demystify some of the 'magic' involved in getting Atari ST games running on the Atari Jaguar.
The ST and Jaguar share two things in common. Firstly, both run a 68000 CPU. Secondly, they are both capable of running in 16 colour 4 bit per pixel graphics mode. So, given that, you might imagine that it would not be too difficult to make software from the ST run on the Jaguar. However, in practice, things are not quite that simple.
Let's take a quick look at why.
| ||Atari ST||Atari Jaguar|
|RAM Address Range||$0-$400000||$0-$200000|
|Colour Mapping||RGB 3/3/3 [512 colours]||RBG 5/5/6 [65536 colours]|
|GEMDOS||functions via TRAP #1||none|
|BIOS||functions via TRAP #13||none|
|XBIOS||functions via TRAP #14||none|
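To make the colour mapping row concrete, here is a minimal Python sketch of turning one ST palette word into a Jaguar 16-bit colour. It assumes the ST word is laid out as $0RGB with 3 bits per gun, and that the Jaguar word packs 5 bits of red, 5 of blue and 6 of green from the top bit down, as the 'RBG 5/5/6' entry suggests. The name `st_to_jaguar_colour` and the bit-replication scaling are illustrative choices, not taken from any of the games or tools mentioned here.

```python
def st_to_jaguar_colour(st_word):
    """Convert an ST palette word ($0RGB, 3 bits per gun) to a 16-bit RBG 5/5/6 value."""
    r3 = (st_word >> 8) & 7
    g3 = (st_word >> 4) & 7
    b3 = st_word & 7
    # Scale by bit replication so that 0 stays 0 and 7 becomes full intensity.
    r5 = (r3 << 2) | (r3 >> 1)
    b5 = (b3 << 2) | (b3 >> 1)
    g6 = (g3 << 3) | g3
    # Assumed layout: RRRRR BBBBB GGGGGG (red in the top bits, green in the bottom).
    return (r5 << 11) | (b5 << 6) | g6
```

A 16-entry ST palette could then be converted in one go with something like `[st_to_jaguar_colour(w) for w in st_palette]`, where `st_palette` is a hypothetical list of the game's palette words.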
As you can see, there is quite a bit of difference. On top of the above, the ST also has the MFP (68901) chip, which handles the system timers. Timers are available on the Jaguar, but they are handled completely differently. However, even given all of the above, the major problem has not yet been mentioned, which is:
| ||Atari ST||Atari Jaguar|
|Video RAM||320x200x4 planar||320x200x4bpp chunky|
The ST video memory is organised as 4 bitplanes in chunks of 16 pixels (2 bytes) per plane - four planes, interleaved. Jaguar video memory is linear (chunky), with one byte representing 2 pixels.
Take a look at the information below. To produce a single pixel of colour 15 at the far left of a 16-pixel chunk on the ST, you would need the following four 16-bit values.
%1000000000000000 <- Bitplane 0
%1000000000000000 <- Bitplane 1
%1000000000000000 <- Bitplane 2
%1000000000000000 <- Bitplane 3
To produce the same result on the Jaguar, it would look like this:
%1111000000000000 <- 4 pixels, 4 bits each
Now it may look as though the ST needs 4x the data, but remember, those 4 lines of numbers are generating SIXTEEN pixels, whereas that single line for the Jaguar will only generate FOUR pixels. Both formats use 4 bits per pixel; only the arrangement differs. There are a number of ways of dealing with this, including:
- Recode all of the graphics routines to Jaguar specifications
- If the display is written in 'chunks' (i.e. MOVEP writes for 8 pixels), re-organise the data format for Jaguar output (I did this for Beebris)
- Take the screen output as rendered for the ST and process it into a Jaguar format in real time
For these redirected games I have used the third method. Initially this used some screen conversion code from GroovyBee, but I have since switched to a routine originally written by Orion_ and optimised by SCPCD.
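To show what the real-time conversion has to achieve, here is a small reference model in Python. It is not the GroovyBee or Orion_/SCPCD routine (those run on the Jaguar itself); it is only a sketch of the bit shuffling, assuming the ST stores each 16-pixel chunk as four consecutive big-endian plane words and that the chunky output holds the left-hand pixel in the high nibble of each byte. The function names are made up for this example.

```python
import struct

def planar_chunk_to_chunky(p0, p1, p2, p3):
    """Convert one 16-pixel ST chunk (four 16-bit plane words) to 8 chunky bytes."""
    out = bytearray(8)
    for x in range(16):
        bit = 15 - x  # leftmost pixel lives in the top bit of each plane word
        colour = (((p0 >> bit) & 1)
                  | (((p1 >> bit) & 1) << 1)
                  | (((p2 >> bit) & 1) << 2)
                  | (((p3 >> bit) & 1) << 3))
        # Two pixels per byte; assumed order is left pixel in the high nibble.
        shift = 4 if x % 2 == 0 else 0
        out[x // 2] |= colour << shift
    return bytes(out)

def convert_screen(st_screen):
    """Convert a whole ST low-res screen (32000 bytes) to 4bpp chunky."""
    out = bytearray()
    for offset in range(0, len(st_screen), 8):
        planes = struct.unpack_from(">4H", st_screen, offset)
        out += planar_chunk_to_chunky(*planes)
    return bytes(out)
```

Feeding the four bitplane words from the colour 15 example into `planar_chunk_to_chunky` gives a first byte of `$F0` followed by seven zero bytes. The output is the same size as the input, which is why the 32k ST screen becomes 32k of Jaguar screen.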
In any case, before we even get to needing that code there is much more groundwork to be done. The code needs to actually execute without crashing. If it doesn't do that, then there is no point rendering anything! So, where to start?
For these initial tests I have tried to keep things as simple as possible, selecting games that load entirely in one go (avoiding multi-load and disk activity). I picked games that didn't use complex raster effects and palette splits, displaying only the basic 16 colours at any time. They don't use overscan or demo techniques, and are not STe enhanced (Blitter, hardware scrolling, etc).
OK, so we've selected our game. Let's load it into Steem Debugger, and stop the emulator at the very first instruction that it executes after it has loaded/depacked and is ready to run. We can now go into the memory window and 'dump' the ST's memory into a file.
Back on the Jaguar, we load this memory dump to the same place in memory (remember, the lower 2MB is mapped the same) but we can't run it yet. The code will be expecting the host machine to be an ST, with all the accompanying hardware that goes with it. Running this on the Jaguar will just produce a system crash. We need to patch this up!
On the Jaguar side we need to define and display a screen buffer (4bpp 320x200) and set up a vertical blank interrupt. This is so we can synchronise the code with the system.
In the ST code, three of the first things we need to do are:
- find the Vertical Blank interrupt code
- find the code that sets the base address for the screen memory
- find the vsync routine
We 'know' that the VB interrupt on the ST is vectored via an address stored in $70.w, so we need to find out where this is being set. We then need to change this code so that when execution reaches it, it sets a flag, so that our Jaguar handler knows it's time to run the ST VB code each frame. However, the VB code on the ST executes as an interrupt (exception) and terminates with an RTE instruction. If we call this from the Jaguar with a JSR, the RTE will try to pop a status register as well as a return address, which the JSR never pushed, and it will crash. We need to change the RTE to an RTS so we can call this via a JSR from our emulated VBlank interrupt.
And here comes another problem. We don't have the source code for these games. How do we make changes inline in the code when we can't re-assemble it? The answer is op-code patching. In this case it is simple. RTE is stored in memory as $4e73. We need an RTS, which is $4e75. So by changing that one byte, we can change the instruction. But how do we let the Jaguar 'handler' know when the ST is at the point that this vector needs to be set?
We overwrite the instruction that sets the vector (in this case, move.l #$someaddress,$70) with a JMP instruction pointing to a patch routine in our handler. The handler will set the flags and then JMP back into the code at the instruction *AFTER* the one we replaced. JMP instructions are 6 bytes long ($4ef9 xxxx xxxx) so we might have to execute some other instructions following the branch ourselves before jumping back to make everything align correctly. We also have to save and restore any registers we might use during the patch handler. In any case, we have now patched the VB set routine.
            lea     $eab4,a0            ; $eab4 = address of code that sets up the interrupts in the ST game
            move.w  #$4ef9,(a0)+        ; $4ef9 = opcode for JMP
            pea     setupHW(pc)         ; our patch code address...
            move.l  (a7)+,(a0)          ; ...put into the JMP instruction

    ;; later in the Jaguar source

    setupHW:
            move.w  #$4e75,$eba0        ; RTE at end of VBI(ST) to RTS (Jag)
            move.l  #$eb2a,do_gamevbl   ; address of VBL routine (ST)
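The same two patches can be tried out on a memory dump before committing them to the Jaguar source. The Python sketch below models memory as a `bytearray` and is only an illustration: the helper names are invented, the handler address `0x100000` is a placeholder, and the real patching described here happens at run time on the Jaguar rather than on the host.

```python
RTE = 0x4E73           # opcode the ST VBL routine ends with
RTS = 0x4E75           # opcode we want there instead
JMP_ABS_LONG = 0x4EF9  # JMP xxx.L, followed by a 32-bit address

def read_word(mem, addr):
    return int.from_bytes(mem[addr:addr + 2], "big")

def write_word(mem, addr, value):
    mem[addr:addr + 2] = value.to_bytes(2, "big")

def patch_rte_to_rts(mem, addr):
    """Swap the RTE at addr for an RTS, refusing to patch anything else."""
    if read_word(mem, addr) != RTE:
        raise ValueError("no RTE at $%x" % addr)
    write_word(mem, addr, RTS)

def patch_jmp(mem, addr, target):
    """Overwrite 6 bytes at addr with JMP target; return the address after the JMP."""
    write_word(mem, addr, JMP_ABS_LONG)
    mem[addr + 2:addr + 6] = target.to_bytes(4, "big")
    return addr + 6

# Example with the addresses from the listing; the dump contents are faked here.
dump = bytearray(0x10000)
write_word(dump, 0xEBA0, RTE)
patch_rte_to_rts(dump, 0xEBA0)
after_jmp = patch_jmp(dump, 0xEAB4, 0x100000)  # 0x100000: hypothetical handler address
```

Checking for the expected opcode before overwriting it is a cheap guard against patching the wrong address. The value returned by `patch_jmp` marks where the JMP ends; if the instruction being replaced is longer than 6 bytes, the handler has to jump back to the start of the next whole instruction instead.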
Now we need to find the video screen address routine. We patch this similarly so that, instead of writing to the video address pointers, the screen address is written to a variable. Our Jaguar handler can then read that variable to find out where the screen data is in memory.
And now we find the vsync code. This needs to point to a handler that, instead of waiting for vsync, will convert the entire 32k ST display into 32k of Jaguar display (using the GPU). This takes roughly 50% of the Jaguar bus time per frame, so the time taken is close to what waiting 1 frame would be anyway.
So, given the above, we can now run the ST code and something should appear on the screen... right?
Wrong! The chances of the Jaguar making it to that point in the code without crashing are slim to none. Any one of a great number of things could trip it up - access to a h/w register, waiting for a system timer, etc., etc. We need more patching! (or is it potching?)
We can ease the pain by setting all of the exception vectors on the Jaguar to point to an RTE. This means if something goes wrong, or does something it shouldn't, the Jaguar will blindly ignore it and carry on regardless - handy!
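As a rough picture of what that vector setup amounts to, here is a Python model of the 68000 vector table, where each vector is a 32-bit address stored at vector number x 4. The stub address `0x800` is a placeholder, the function name is invented, and skipping vectors 0 and 1 (the reset stack pointer and program counter) is an assumption made for this sketch.

```python
RTE = 0x4E73

def install_rte_vectors(mem, stub_addr, first=2, last=255):
    """Write an RTE at stub_addr and point vectors first..last at it."""
    mem[stub_addr:stub_addr + 2] = RTE.to_bytes(2, "big")
    for vector in range(first, last + 1):
        # Each vector is a 32-bit big-endian address at vector * 4.
        mem[vector * 4:vector * 4 + 4] = stub_addr.to_bytes(4, "big")

memory = bytearray(0x1000)
install_rte_vectors(memory, 0x800)  # 0x800: hypothetical home for the stub
```

After this runs, the long word at $70 (the VBL vector mentioned earlier) holds the stub address like every other vector, so any vectors that need real handlers have to be set afterwards.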
But what happens when the ST game calls GEMDOS, BIOS or XBIOS? The answer is, of course, CRASH! So, we have to take the trap vectors for TRAP #1 (GEMDOS), TRAP #13 (BIOS) and TRAP #14 (XBIOS) and point them to our own routines, which will intercept these calls, work out which function they are trying to use and emulate the function's results on the Jaguar.
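The shape of such an intercept routine can be sketched in Python. On the 68000, TRAP #n uses vector 32 + n, and ST system calls pass the function number as the first word pushed on the stack, with the result returned in D0. Everything else below is illustrative: the `state` dictionary, the handler name and the choice to emulate only three XBIOS calls (Physbase = 2, Logbase = 3, Vsync = 37) are assumptions for this example, not the author's handler.

```python
def trap_vector_address(n):
    """Address of the vector used by TRAP #n (vector number 32 + n)."""
    return (32 + n) * 4

XBIOS_PHYSBASE = 2
XBIOS_LOGBASE = 3
XBIOS_VSYNC = 37

def handle_xbios(args, state):
    """Emulate one XBIOS call.

    args  - the 16-bit words the game pushed, function number first
    state - hypothetical dictionary standing in for the Jaguar handler's variables
    Returns the value the game would expect to find in D0.
    """
    function = args[0]
    if function in (XBIOS_PHYSBASE, XBIOS_LOGBASE):
        return state["screen_base"]
    if function == XBIOS_VSYNC:
        state["frames_waited"] += 1
        return 0
    # Anything not emulated yet is logged so it can be dealt with later.
    state["unhandled"].append(function)
    return 0

state = {"screen_base": 0x78000, "frames_waited": 0, "unhandled": []}
screen = handle_xbios([XBIOS_PHYSBASE], state)
handle_xbios([XBIOS_VSYNC], state)
handle_xbios([99], state)  # 99: arbitrary number standing in for an unemulated call
```

Logging unhandled function numbers rather than failing gives a list of what the game actually uses, which fits the one-step-at-a-time approach described here.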
Meanwhile, back in Steem's Boiler Room, we have set breakpoints on every single hardware mapped address we can think of and have run the game. Every time the emulator stops at a hardware access we have to patch the code to do the Jaguar equivalent (or ignore it - NOP is your friend!). Repeat ad nauseam until the game no longer breakpoints in the emulator. Congratulations! You have now fixed up all the hardware accesses.
Once this is done, we might start getting things on the screen. But even at this point it is unlikely to be running correctly and most probably will have no audio or joystick/keyboard input. This is yet more patching of a similar nature. As you can see, this is not emulation where it wouldn't matter what code we passed to it - everything is done hands on, one step at a time.