We’re nearing the WiiSX beta 2 release, so we decided that it’s time to show a preview video of the current state of the emulator and tell you about our progress. Last summer, we released a beta of WiiSX so that we could put it on the back burner for a while and focus on Wii64. We weren’t too happy with WiiSX beta 1 because it lacked a proper gui and suffered from stuttering audio and other bugs. So, we’ve focused the past few weeks on adding a gui to WiiSX and improving the sound, pad plugin, and other various parts of the emulator. While there will still be plenty of work left for the future, we feel that this release will greatly improve the emulator’s playability and polish. Look forward to the release, but for now please enjoy the preview.
Downloads are available for Wii64, Cube64, and the source.
We’ve put a lot of hard work into this, and we hope you guys will have as much fun playing as we have developing it. If you’re feeling generous, we’re always accepting donations to keep this site running and to purchase any hardware we need for testing. You can donate here.
A few notes about this release
Wii64 users (Wii build):
The emulator can be controlled with any combination of GC controllers, Classic Controllers, and Wiimotes with Nunchuks (sorry, Wiimotes must have Nunchuks). See the included readme for details on the controls. You can load ROMs (uncompressed only) and saves from a FAT-formatted SD card or USB drive (details on the required folders are in the readme).
Cube64 users (Gamecube build):
Due to memory limitations, there is no Expansion Pak support, and the build uses a TLB cache. 2xSaI and FB Textures may break things, and of course ROMs will run slower.
We don’t want to discourage other people from forking and working on the emulator; however, to avoid confusion, any unofficial builds must not be called “Wii64” or “Cube64” nor use the Wii64 or Cube64 logo. For now, we’re just releasing a snapshot of the source used to build Beta 1, but we’re planning on updating the public repository with each commit we’ve made to our private repository so that everyone can see the progression of the code. This process will begin soon. Public SVN is up to date with beta 1 code.
Finally, there is a support forum located on TehSkeen, so if you are having trouble with Beta 1, please seek help there before submitting issues to the Google code tracker. You can also join in and discuss the project with other users there.
The structure of the dynarec itself is an important factor in the performance of the emulator. To follow the changes we’ve made to the dynarec, you have to understand how it’s structured and how it works. You can divide my dynarec into a few distinct pieces: the translator, the trampoline, the code cache, and some run-time helper functions.
The translator is given an address at which it will translate a chunk of MIPS code into PowerPC. It uses a total of 3 passes to accomplish that. Pass 0 reads in one instruction at a time until it hits an unconditional jump, a jump register, or an exception return, which signifies the end of the function it’s trying to recompile. Its main purpose is to identify any branch instructions and determine where they are branching to; it does this to ensure that no branches will branch into a register mapping. Pass 1 actually does the translation by converting each MIPS instruction to a sequence of PowerPC instructions. Branches are left unfilled because we don’t yet know how many PowerPC instructions will be between any given source instructions. Pass 2 then fills out the branch destinations now that every instruction’s position is known. The translator uses volatile and nonvolatile PPC registers in its generated code. Nonvolatile registers are used to store constants like the memory address to store the register values into, the address of the N64 memory, and a few other useful emulator variables. Volatile registers are used to temporarily store N64 registers for the generated instructions to operate on. These are mapped to hardware registers as needed, and stored to memory when changed and no longer needed.
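To make the last two passes concrete, here is a minimal sketch in C. It is a toy model rather than the Wii64 source: the SrcInsn and Block types, the pass1 and pass2 names, and the convention that a branch's displacement lives in the last word it emits are all assumptions made for illustration. Pass 0 (scanning for the end of the function and collecting branch targets) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: each source instruction expands to a variable
   number of output words, and a branch names the source index it targets. */
enum { MAX_SRC = 64, MAX_OUT = 256 };

typedef struct {
    int is_branch;  /* nonzero if this source instruction is a branch   */
    int target;     /* source index branched to (branches only)         */
    int out_len;    /* number of output words this instruction expands to */
} SrcInsn;

typedef struct {
    int32_t out[MAX_OUT];     /* generated words; 0 is a placeholder        */
    size_t  out_pos[MAX_SRC]; /* output index where each source insn starts */
    size_t  out_count;
} Block;

/* Pass 1: emit code and record where each source instruction lands.
   Branch displacements are left as placeholders. */
static void pass1(const SrcInsn *src, size_t n, Block *b) {
    assert(n <= MAX_SRC);
    b->out_count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(src[i].out_len > 0);
        assert(b->out_count + (size_t)src[i].out_len <= MAX_OUT);
        b->out_pos[i] = b->out_count;
        for (int k = 0; k < src[i].out_len; k++)
            b->out[b->out_count++] = 0;
    }
}

/* Pass 2: every position is now known, so patch each branch with the
   relative distance (in output words) to its destination. */
static void pass2(const SrcInsn *src, size_t n, Block *b) {
    for (size_t i = 0; i < n; i++) {
        if (!src[i].is_branch)
            continue;
        assert(src[i].target >= 0 && (size_t)src[i].target < n);
        size_t branch_at = b->out_pos[i] + (size_t)src[i].out_len - 1;
        b->out[branch_at] =
            (int32_t)b->out_pos[src[i].target] - (int32_t)branch_at;
    }
}
```

The point of splitting the work this way is that a forward branch cannot be resolved while its destination has not been emitted yet; recording out_pos during pass 1 turns the fix-up into a simple table lookup.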
The code that’s generated by the translator goes into the code cache. On a PC with no real memory limit this isn’t necessary. However, on the Wii, memory is quite constrained. In total, we have access to a little under 88MB of memory. However, using the larger MEM2, which is 64MB, is somewhat slower than using the 24MB of MEM1, so we have to limit the code to fit in MEM1 for it to run as fast as possible. Not to mention that the cache has to share MEM1 with all of the emulator code and static structures.
I have a few functions which the recompiled code will call in order to reduce the amount of code generated for complex instructions. Examples include handling interpreted instructions, updating Count, and taking floating-point unavailable exceptions. These are just ordinary C functions which will only be invoked by the recompiled code. These functions allow for a reasonable trade-off: faster than interpreting and relatively small code generated for just the function call.
The trampoline, or dispatcher, is at the heart of the dynarec. The trampoline is responsible for determining if code at a given N64 address is recompiled, and if it’s not, recompiling it, and then calling the recompiled code. When the code that the trampoline invoked needs to branch to another block of code, it returns to the trampoline with the N64 address of the code it wants to run, and the process begins again: the trampoline looks up the new address, possibly recompiles, and then calls the desired recompiled code. Branches within a function don’t need to return to the trampoline, but because any function can be freed from the code cache at any time, every branch outside of the function must return to the trampoline to be dispatched.
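The dispatch loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Wii64 trampoline: the direct-mapped code_cache, the HALT_ADDR sentinel, and the demo_* functions (which stand in for the translator and for recompiled blocks) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALT_ADDR 0xFFFFFFFFu  /* hypothetical sentinel that ends the demo */
enum { CACHE_SLOTS = 16 };

/* A recompiled block returns the N64 address it wants to run next. */
typedef uint32_t (*BlockFn)(void);
typedef BlockFn (*Translator)(uint32_t addr);

typedef struct { uint32_t addr; BlockFn code; } CacheEntry;
static CacheEntry code_cache[CACHE_SLOTS];  /* zero-initialized: all empty */
static unsigned translations;               /* how often we "recompiled"   */

/* Stand-ins for recompiled code and for the translator. */
static uint32_t demo_block_a(void) { return 0x80000004u; }
static uint32_t demo_block_b(void) { return HALT_ADDR; }
static BlockFn demo_translate(uint32_t addr) {
    translations++;
    return addr == 0x80000000u ? demo_block_a : demo_block_b;
}

/* Look up the block for addr, recompile on a miss, call it, and repeat
   with whatever address the block hands back. Returns blocks dispatched. */
static unsigned trampoline(uint32_t addr, Translator translate) {
    unsigned dispatched = 0;
    assert(translate != NULL);
    while (addr != HALT_ADDR) {
        CacheEntry *e = &code_cache[(addr >> 2) % CACHE_SLOTS];
        if (e->code == NULL || e->addr != addr) {
            /* Miss: whatever lived in this slot is evicted, which is why
               blocks must not jump straight into each other. */
            e->addr = addr;
            e->code = translate(addr);
        }
        addr = e->code();
        dispatched++;
    }
    return dispatched;
}
```

Calling trampoline(0x80000000u, demo_translate) twice dispatches two blocks each time, but only the first run invokes the translator; the second run is served entirely from the cache.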
In the past few months, we’ve made significant progress on the Wii64 dynarec. Most of the bug fixes are pretty minor fixes like correcting off-by-one or other various memory errors; however, there are several substantial changes to both the infrastructure and features of the dynarec.
On the N64, there is a register called Count which keeps track of how many cycles the system has been running. This is primarily used to determine when interrupts can be taken. In Mupen64, Count is estimated as 2 cycles per instruction executed. Some emulators actually increment Count differently depending on which instruction ran (because on the hardware, some instructions will take longer to execute). The fact that Mupen was doing really well with the Count estimate led me to believe that getting an exact Count was unnecessary, and I initially tried playing some tricks to estimate without explicitly keeping track of Count. However, I quickly discovered that even deviating from the way Mupen counts will quickly result in crashes and freezes. Several major fixes have involved correcting edge-cases which caused Count to be somewhat off.
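As a rough illustration of the 2-cycles-per-instruction estimate, here is a small C sketch. The Cpu struct, the next_interrupt field, and the advance_count function are hypothetical names chosen for this example; they are not taken from Mupen64 or Wii64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CYCLES_PER_INSN = 2 };  /* the flat estimate described above */

typedef struct {
    uint32_t count;           /* emulated Count register                 */
    uint32_t next_interrupt;  /* Count value at which an interrupt is due */
} Cpu;

/* Charge a flat cost for every instruction executed, then report whether
   an interrupt is now due. The signed difference keeps the comparison
   correct when Count wraps around 32 bits (assumes two's complement). */
static int advance_count(Cpu *cpu, uint32_t insns_executed) {
    assert(cpu != NULL);
    cpu->count += insns_executed * CYCLES_PER_INSN;
    return (int32_t)(cpu->count - cpu->next_interrupt) >= 0;
}
```

Because every instruction is charged exactly once, skipping or double-counting even one instruction (for example in a branch delay slot) shifts every later interrupt, which is the kind of edge case that is easy to get wrong.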
Initially only 32-bit integer instructions were supported in the dynarec (they comprise most of the ISA, and I just wanted to get something working before I tried anything too complicated). Once I got the dynarec running with just those basic instructions, it was still fairly slow because a lot of instructions were still being interpreted (thus negating any performance benefits of the dynarec). Getting the floating-point and 64-bit instructions (which aren’t used as often as the name N64 would lead you to believe) supported in the dynarec was important for improving the dynarec performance beyond that of the pure interpreter.
With the exception of the way floating-point comparisons and conversions are done in MIPS vs PPC and MIPS’s sqrt, floating-point was fairly straightforward to implement in the dynarec as most instructions had a 1-1 mapping. Even the comparisons were relatively simple, although they do not take advantage of what I feel is a richer FP comparison on the PPC. However, since the Wii does not have a floating-point square root instruction, it was difficult to support the MIPS sqrt instruction in only a few instructions. We did manage to get it working with what seems to be good-enough precision using the PPC frsqrte (floating reciprocal sqrt estimate), Newton-Raphson refinement, and an fmul. The only floating-point instructions left to support are conversions to and from 64-bit integers, which are nearly impossible to generate code for because there is no hardware support on the Wii and the process is rather complex.
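The estimate-refine-multiply idea can be shown in portable C. Note the substitution: plain C has no access to frsqrte, so this sketch uses a well-known integer bit trick as a stand-in for the hardware estimate. The function names and the choice of three refinement steps are assumptions for illustration, not what the dynarec emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for frsqrte: a crude first guess at 1/sqrt(x) from an integer
   bit trick. This is NOT the PPC instruction, only a substitute that
   plays the same role in the sketch. */
static float rsqrt_estimate(float x) {
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* sqrt(x) computed as x * (1/sqrt(x)). */
static float sqrt_via_rsqrt(float x) {
    assert(x >= 0.0f);
    float y = rsqrt_estimate(x);
    for (int i = 0; i < 3; i++)             /* step count is an assumption   */
        y = y * (1.5f - 0.5f * x * y * y);  /* Newton-Raphson for 1/sqrt(x)  */
    return x * y;                           /* the final multiply            */
}
```

Each Newton-Raphson step roughly squares the relative error, so a poor initial estimate becomes usable after only a few multiplies and subtracts, none of which need a divide or a square root.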
64-bit instructions were a similar story: most of the instructions had a straightforward translation from MIPS to PPC (even though the PPC in the Wii is 32-bit), but there were a few which were difficult to emulate. The addition, subtraction, and logical instructions were very simple: you just need to use two PPC registers to store a 64-bit value, and there are instructions which will keep track of and use the carry bit so that a 64-bit add/sub can be performed in two 32-bit add/sub. The 64-bit shifts were relatively complicated because you have to shift both 32-bit words separately, and then determine what would have spilled from one into the other and OR it into that word, but it can be done in around 10 instructions in PPC. Like with FP, there were a few 64-bit instructions that we couldn’t reasonably generate code for: the 64-bit multiply and divide are too complicated for generating code using only 32-bit operations.
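Here is how the two-register approach looks when written out in C, as a sketch. The U64Pair type and the add64 and shl64 names are made up for this example; on the PPC the carry would come from the hardware's carry bit (I believe via the addc/adde pair) rather than from the comparison used here.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in two 32-bit halves, like two PPC registers. */
typedef struct { uint32_t hi, lo; } U64Pair;

/* 64-bit add as two 32-bit adds: the low add produces a carry, and the
   high add consumes it. */
static U64Pair add64(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;  /* unsigned wraparound means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* 64-bit left shift: shift both words, then OR the bits that spilled out
   of the low word into the high word. */
static U64Pair shl64(U64Pair a, unsigned n) {
    U64Pair r;
    assert(n < 64);
    if (n == 0) {
        r = a;                      /* avoids an undefined 32-bit shift by 32 */
    } else if (n < 32) {
        r.hi = (a.hi << n) | (a.lo >> (32 - n));
        r.lo = a.lo << n;
    } else {
        r.hi = a.lo << (n - 32);    /* everything moved into the high word */
        r.lo = 0;
    }
    return r;
}
```

The special cases in shl64 exist because shifting a 32-bit value by 32 or more is undefined in C; generated machine code has to deal with the same boundary, which is part of why the shifts cost more instructions than the adds.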
However, even with most of the ISA implemented, there was still significant room for improvement in performance. I have since made some other significant improvements which I will be detailing in more posts to come soon.
First off, the April 1st Tiizer video is actual gameplay using a recent dev build of Wii64. As you can tell, tehpola has made tremendous progress in debugging and optimizing the dynamic recompiling core. However, there are still a handful of showstopping bugs that we need to work through before we can make a public release. Also, you should be aware that, for a variety of reasons, not all of your favorite games will run on the initial release. We are not planning on supporting the Expansion Pak initially because of memory limitations. After further optimizations, tweaks, and profiling to reduce our memory consumption, we hope to add Expansion Pak support. We may not initially support games that execute code directly from the cart or that use virtual memory (e.g. Goldeneye) because this requires more investigation and significant code changes in the dynarec to implement. Also, some graphics microcodes aren’t supported in glN64, so a few games such as Conker’s BFD won’t work just yet. But sit tight, and we’ll continue to work on more features for Wii64 after the initial release.
A complete re-code of the Wii64 gui is underway, so you’ll be able to enjoy using the Wiimote for navigation and also some sleek new graphics. We’ll have a new look for the initial release, but we also plan on adding more features to the gui over time for your enjoyment.
If you have watched any of the recent gameplay videos, then you know that the accuracy of the glN64 port has increased substantially since the Wii64 Tiizer release we made for the Homebrew Channel. Because GX is not 1:1 with OpenGL, there was a lot of investigation and tweaking required for me to get the behavior on GC/Wii close to what glN64 looks like on PC. There are still a variety of bugs for different games, so don’t expect everything to look perfect yet. Emu_kidid is a great tester, and he is maintaining an internal graphical issue list to work on. I hope to add a couple more features to glN64 prior to release, including glN64’s primitive framebuffer texture support as well as 2xSaI scaling for textures. The plan is, of course, to continue hunting down bugs and adding features after the upcoming release.
As for the other graphics plugins, glN64_GX is much faster than both soft_gfx and GX_gfx, so we may only release a build with glN64_GX. The only drawback is that currently glN64_GX won’t render graphics for demos that only directly manipulate the framebuffer with the CPU. However, when I have time I’ll add a feature into glN64_GX that will allow it to render the N64’s framebuffer rather than rendering primitives passed through the N64’s graphics pipeline. Then, you can just flip an option in the menu when you are running homebrew N64 games and demos that write directly to the framebuffer. Also, I have already done some work on porting Rice’s video plugin to Wii64. Rice supports more microcodes than glN64, including the one that Conker’s BFD uses, and it should be faster than glN64. We have a vision of supporting custom texture packs in Wii64, so we will implement that feature as well. We hope that you, our users, will contribute your creative talents in developing texture packs to share with the Wii64 community. We can’t say when custom texture pack support will be finished, but expect it sometime in the future.
Some of you have been asking for an update on WiiSX. We are planning on working on a release of WiiSX after the upcoming Wii64 release. The reason we have not done a release yet is because there were some serious bugs in SVN last fall, and we also wanted to focus on completing Wii64. We have since resolved some WiiSX issues, internally, and so once Wii64 is out the door, we feel that we can also follow up with a WiiSX release relatively soon afterwards.
Finally, we’d like to ask once more that, if you enjoy using Wii64 when it’s out, you consider donating to the project. Right now, most of the donations we receive go toward hosting costs. However, there are also some small accessories like component cables and classic controllers that we are considering purchasing with donation funds to aid in development.
As it’s been a while since our last binary release, we wanted to clarify why it’s been so long and what we’re waiting on for our next release. Early on in development we were making relatively big changes which significantly improved the emulator; however, we’ve gotten to the point where a lot of the big things have been done, and only need perfecting (with the exception of the dynarec). Thus, we haven’t felt the need to make several binary releases, as users who aren’t interested in compiling the source themselves are mostly uninterested in the kinds of changes that have been made. We do indeed have a milestone for our next release planned: a working, stable dynarec. Most of the work that has gone into the emulator since our last release has focused on the dynarec, and since we still don’t have a completely working dynarec, there haven’t been many noticeable changes. So we’re holding out for a dynarec which supports at least most games without crashing before we make our next release. After getting it running initially, there will likely be more room for optimization if there are still any performance issues. In that case, we will likely have frequent releases once again as there will be noticeable improvements with each optimization that is made. As always, please be patient. We’re working hard to make the next release something worth the wait.
On an unrelated technical note, we have managed to free up 1.75MB in RAM by consolidating the various memory LUTs (look-up tables) into a single LUT for all memory operations. In Mupen64, there are 8 different memory LUTs which are used to determine how to handle memory accesses at different addresses. These 8 are split up by read/write byte/half-word/word/double. Instead of having 8 large LUTs, I created one LUT for all memory operations which points to smaller LUTs which handle the different memory operations in the specified segment. Memory operations only require an additional load for the second level LUT so there is no performance impact by this change. We are still looking into other ways to further reduce our memory usage to make sure that we have plenty of room in memory for recompiled code produced by the dynarec.
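The two-level layout can be sketched in C as below. All names here (MemOp, SegmentHandlers, segment_lut, mem_access, and the demo_* handlers) are hypothetical, and the 64K-entry table indexed by the top 16 address bits is an assumption. Under that assumption the numbers line up with the saving quoted above: eight tables of 65,536 four-byte pointers take 8 x 256KB = 2MB, a single such table takes 256KB, and the difference is 1.75MB (ignoring the small per-segment tables).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight access kinds: read/write x byte/half-word/word/double. */
enum MemOp { READ_BYTE, READ_HALF, READ_WORD, READ_DWORD,
             WRITE_BYTE, WRITE_HALF, WRITE_WORD, WRITE_DWORD,
             MEM_OP_COUNT };

typedef uint64_t (*MemHandler)(uint32_t addr, uint64_t value);

/* Second level: one small table per kind of memory, shared by every
   64KB segment that behaves the same way. */
typedef struct { MemHandler op[MEM_OP_COUNT]; } SegmentHandlers;

/* First level: one large table indexed by the top 16 address bits. */
static const SegmentHandlers *segment_lut[0x10000];

/* Demo backing store and handlers, for illustration only. */
static uint8_t demo_ram[0x10000];
static uint64_t demo_read_byte(uint32_t addr, uint64_t value) {
    (void)value;
    return demo_ram[addr & 0xFFFF];
}
static uint64_t demo_write_byte(uint32_t addr, uint64_t value) {
    demo_ram[addr & 0xFFFF] = (uint8_t)value;
    return 0;
}
static const SegmentHandlers demo_ram_segment = {
    .op = { [READ_BYTE] = demo_read_byte, [WRITE_BYTE] = demo_write_byte }
};

/* One load finds the segment, a second load finds the handler. */
static uint64_t mem_access(uint32_t addr, enum MemOp op, uint64_t value) {
    assert(op < MEM_OP_COUNT);
    const SegmentHandlers *seg = segment_lut[addr >> 16];
    if (seg == NULL || seg->op[op] == NULL)
        return 0;  /* unmapped or unhandled access in this sketch */
    return seg->op[op](addr, value);
}
```

To use it, point a segment at a handler table (for example segment_lut[0x8000] = &demo_ram_segment;) and route every access through mem_access. Only the first-level table scales with the address space; the second-level tables are a few dozen bytes each.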