Technote PT 39 | February 1995 |
Code translation will benefit emulator performance for "typical" 680x0 code. Speed increases will naturally vary for different pieces of code. Programs which contain many small loops will benefit most. For example, Speedometer benchmarks are typically two to four times faster when code translation is enabled. Performance increases for typical application or OS code will probably not be this substantial.
Uses of the above calls should be kept to an absolute minimum. Wherever necessary, attempts should be made to use FlushCodeCacheRange. All other notification methods do not specify an address range and therefore require the DR Emulator to flush the entire translation cache. This operation is both time consuming and wasteful. Code which calls any of the above cache flushing mechanisms in a tight loop may perform very poorly under the DR Emulator.
As documented, calls to BlockMove will continue to provide cache coherency for blocks above a certain minimum size (12 bytes). Any developer wishing to use BlockMove for anything other than moving 680x0 instructions should use BlockMoveData instead to avoid the cache flushing overhead. (Note that BlockMove does not provide cache coherency for PowerPC instructions. Software which copies PowerPC code needs to explicity call MakeDataExecutable to guarantee cache coherency on split-cache processors such as the 603 and 604.)
Due to the design of the Mixed Mode Manager, many developers have released software which statically allocates routine descriptors. These data structures, central to the Mixed Mode mechanism, begin with a 680x0 A-Trap ($AAFE). Because developers have created these routine descriptors on the fly without flushing the 680x0 cache, the DR Emulator has a special check for this instruction. This guarantees that existing software works, and also allows developers to continue to allocate routine descriptors without worrying about cache flushing.
Interrupts
The interpretive emulator checks for pending external interrupts on each 680x0 instruction boundary. Because the DR Emulator translates entire blocks of code at a time, translated code only checks for interrupts at the end of each block. This means that the average interrupt latency may be slightly longer under the DR Emulator. However, the difference should be negligible. Debugging & Trace Mode
In addition to interrupts, the interpretive emulator also detects trace exceptions at 680x0 instruction boundaries. Providing trace mode support within translated code is difficult, so the interpretive emulator is used to handle all execution while running under trace mode. (Note that trace mode is used by MacsBug and other machine-level debuggers to set break points in ROM.)
When a bus error occurs on a real 680x0 machine, the bus error exception frame contains the PC of the instruction which caused the fault. On the original interpretive emulator, the reported PC was not guaranteed to be on the exact instruction, but somewhere within the instruction that caused the fault. On the DR Emulator, the PC may point to a location before the faulting instruction. This may make debugging slightly more challenging when dynamic recompilation is active.
Developers have been warned not to rely on timing of processor instructions, as exact instruction timings may not be consistent across processors or emulators. However, various pieces of software still use "DBRA loops" as a basic timing mechanism. Code translation introduces timing anomalies. The first time a piece of code is executed, the code translation process requires extra time. Subsequent executions are relatively fast.
The DR Emulator attempts to detect and deal with timing-dependent code. Whenever the 680x0 interrupt mask is raised to a non-zero level, code translation is temporarily suspended until the mask drops to zero. This means that any timing-dependent code running at interrupt time or running at a time when the interrupt mask has been raised will run with the interpretive emulator with relatively consistent timing.
Further Reference: