The structure of the dynarec itself is an important factor in the performance of the emulator. To follow some of the changes we’ve made to the dynarec, you have to understand how it’s structured and how it works. You can divide my dynarec into a few distinct pieces: the translator, the trampoline, the code cache, and some run-time helper functions.
The translator is given an address at which it will translate a chunk of MIPS code into PowerPC. It uses a total of 3 passes to accomplish that. Pass 0 reads in one instruction at a time until it hits an unconditional jump, a jump register, or an exception return, which signifies the end of the function it’s trying to recompile. Its main purpose is to identify any branch instructions and determine where they branch to; it does this to ensure that no branch lands in the middle of a register mapping. Pass 1 actually does the translation by converting each MIPS instruction to a sequence of PowerPC instructions. Branches are left unfilled because we don’t yet know how many PowerPC instructions will sit between any two source instructions. Pass 2 then fills in the branch destinations now that every instruction’s position is known. The translator uses both volatile and nonvolatile PPC registers in its generated code. Nonvolatile registers hold values that stay constant while recompiled code runs, such as the address in memory where the N64 register values are stored, the address of the N64 memory, and a few other useful emulator variables. Volatile registers temporarily hold N64 registers for the generated instructions to operate on. N64 registers are mapped to these hardware registers as needed, and written back to memory once they have been changed and are no longer needed.
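To make the three passes concrete, here is a minimal sketch in C. It is not the emulator's code: the toy instruction kinds, the per-instruction word counts, the made-up `OUT_BRANCH` encoding, and every name (`SrcInstr`, `Block`, `translate`, and so on) are assumptions for illustration only. It also assumes every branch targets an instruction inside the same function. What it does show is why the branch displacement can only be filled in after pass 1: the output position of each source instruction is not known until everything before it has been emitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy source instructions standing in for decoded MIPS ops. */
typedef enum { SRC_ALU, SRC_LOAD, SRC_BRANCH, SRC_END_OF_FUNC } SrcKind;

typedef struct {
    SrcKind kind;
    int     target;  /* SRC_BRANCH only: index of the destination source instruction */
} SrcInstr;

#define MAX_SRC    64
#define MAX_OUT    256
#define OUT_BRANCH 0xB0000000u  /* made-up opcode; low 16 bits = displacement in output words */
#define OUT_OTHER  0x10000000u  /* made-up filler for every non-branch output word */

typedef struct {
    uint32_t out[MAX_OUT];        /* generated code */
    int      out_len;
    int      out_pos[MAX_SRC];    /* output index where each source instruction starts */
    int      is_target[MAX_SRC];  /* pass 0 result: 1 if some branch lands here */
    int      src_len;             /* number of source instructions in the function */
} Block;

/* Made-up expansion sizes: how many output words each source instruction needs. */
static int words_for(SrcKind k) {
    switch (k) {
    case SRC_LOAD:        return 3;
    case SRC_END_OF_FUNC: return 2;
    default:              return 1;
    }
}

/* Pass 0: find the end of the function and mark every branch target. */
static int pass0_scan(const SrcInstr *src, int max, Block *b) {
    int n = 0;
    memset(b->is_target, 0, sizeof b->is_target);
    while (n < max && n < MAX_SRC) {
        if (src[n++].kind == SRC_END_OF_FUNC) break;
    }
    for (int i = 0; i < n; i++) {
        if (src[i].kind != SRC_BRANCH) continue;
        if (src[i].target < 0 || src[i].target >= n) return -1;  /* toy: in-function only */
        b->is_target[src[i].target] = 1;
    }
    b->src_len = n;
    return n;
}

/* Pass 1: emit code, recording where each source instruction starts.
 * Branches get a placeholder word with an empty displacement. */
static int pass1_emit(const SrcInstr *src, Block *b) {
    b->out_len = 0;
    for (int i = 0; i < b->src_len; i++) {
        int w = words_for(src[i].kind);
        if (b->out_len + w > MAX_OUT) return -1;
        b->out_pos[i] = b->out_len;
        for (int j = 0; j < w; j++)
            b->out[b->out_len++] = (src[i].kind == SRC_BRANCH) ? OUT_BRANCH : OUT_OTHER;
    }
    return 0;
}

/* Pass 2: every position is now known, so patch the branch displacements. */
static void pass2_patch(const SrcInstr *src, Block *b) {
    for (int i = 0; i < b->src_len; i++) {
        if (src[i].kind != SRC_BRANCH) continue;
        assert(src[i].target >= 0 && src[i].target < b->src_len);
        int disp = b->out_pos[src[i].target] - b->out_pos[i];
        b->out[b->out_pos[i]] = OUT_BRANCH | ((uint32_t)disp & 0xFFFFu);
    }
}

/* Runs all three passes; returns the number of output words, or -1 on error. */
int translate(const SrcInstr *src, int max, Block *b) {
    if (pass0_scan(src, max, b) < 0) return -1;
    if (pass1_emit(src, b) != 0) return -1;
    pass2_patch(src, b);
    return b->out_len;
}
```

In this sketch a forward branch over a three-word load ends up with a displacement of 4 output words even though it only skips 2 source instructions, which is exactly the information pass 1 has to produce before pass 2 can run.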
The code that’s generated by the translator goes into the code cache. On a PC with no real memory limit this isn’t necessary. On the Wii, however, memory is quite constrained. In total, we have access to a little under 88MB of memory. Using the larger MEM2, which is 64MB, is somewhat slower than using the 24MB of MEM1, so we have to keep the generated code in MEM1 for it to run as fast as possible. On top of that, the cache has to share MEM1 with all of the emulator code and static structures.
I have a few functions which the recompiled code will call in order to reduce the amount of code generated for complex instructions: for example, executing interpreted instructions, updating Count, and taking floating-point unavailable exceptions. These are just ordinary C functions which will only be invoked by the recompiled code. They allow for a reasonable trade-off: faster than interpreting, with relatively little code generated for just the function call.
The trampoline, or dispatcher, is at the heart of the dynarec. The trampoline is responsible for determining whether the code at a given N64 address has been recompiled, recompiling it if it hasn’t, and then calling the recompiled code. When the code that the trampoline invoked needs to branch to another block of code, it returns to the trampoline with the N64 address of the code it wants to run, and the process begins again: the trampoline looks up the new address, possibly recompiles, and then calls the desired recompiled code. Branches within a function don’t need to return to the trampoline, but because any function can be freed from the code cache at any time, every branch outside of the function must return to the trampoline to be dispatched.
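The dispatch loop described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the emulator's implementation: the direct-mapped `cache` table, the `EXIT_ADDR` sentinel, the two stand-in blocks, and `toy_recompile` are all hypothetical, and real recompiled code would be generated PowerPC rather than C functions. The point is the shape of the loop: look up the address, recompile on a miss, call the block, and use its return value as the next address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A "recompiled block" in this toy: runs, then returns the next N64 address. */
typedef uint32_t (*BlockFn)(void);

#define CACHE_SLOTS 16
#define EXIT_ADDR   0u  /* hypothetical sentinel that stops the loop */

typedef struct {
    uint32_t addr;  /* N64 address this slot was compiled for */
    BlockFn  code;  /* NULL while the slot is empty */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];  /* zero-initialized, so every slot starts empty */
static int recompile_count;            /* how many times the translator was invoked */
static int steps;                      /* how many blocks have been executed */

/* Two stand-in blocks that bounce between each other a few times. */
static uint32_t block_a(void) { steps++; return 0x80000010u; }
static uint32_t block_b(void) { steps++; return steps < 5 ? 0x80000000u : EXIT_ADDR; }

/* Stand-in for the translator: a real one would generate code for addr. */
static BlockFn toy_recompile(uint32_t addr) {
    recompile_count++;
    return addr == 0x80000000u ? block_a : block_b;
}

/* The trampoline: dispatch until a block asks to exit. */
void trampoline(uint32_t addr) {
    while (addr != EXIT_ADDR) {
        CacheEntry *e = &cache[(addr >> 2) % CACHE_SLOTS];
        if (e->code == NULL || e->addr != addr) {
            /* Miss: the slot is empty or holds code for another address,
             * so recompile and overwrite whatever was there. */
            e->addr = addr;
            e->code = toy_recompile(addr);
            assert(e->code != NULL);
        }
        /* Run the block; it hands back the address it wants to run next. */
        addr = e->code();
    }
}
```

Calling `trampoline(0x80000000u)` here executes six blocks but invokes the translator only twice, once per address. Because a slot can be overwritten on any miss, a block can never safely jump straight into another block's code, which is the same reason the text gives for routing every branch outside a function back through the trampoline.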