Technote OS 04 | May 1992 |
System 7.0.1 introduced a new version of the Standard Apple Numerics Environment (SANE) package referred to as OmegaSANE ([[Omega]]SANE). While it improves the performance of SANE, it is not without compatibility risks, which are detailed below. New binary <-> decimal conversions have been included that are correctly rounded across the entire range of double extended. This will result in incompatibility with previous conversion algorithms for a variety of input values; the results of the [[Omega]]SANE conversions are uniformly more accurate, and will be compatible with future releases of SANE.
The [[Omega]]SANE release provides a number of performance enhancements while maintaining conformance with SANE. The following enhancements have been implemented:
* Correctly rounded binary <-> decimal conversions
* Faster transcendental functions
* Backpatching of SANE traps for faster package entry
Tables showing the performance of [[Omega]]SANE are given below. A key aspect of the performance gain is the "backpatching" of SANE traps. This mechanism will be discussed below under "Pitfalls."
The version of [[Omega]]SANE supplied with System 7.0.1 installs only on machines equipped with a Macintosh IIci ROM or later and also equipped with an FPU. For example, it installs on a Macintosh IIsi running System 7.0.1 and equipped with an FPU, but not on one without an FPU. On the Macintosh Quadras and the PowerBook 140/170 it is in ROM, although it doesn't load on the PowerBook 140 unless it has an optional FPU. On machines where the FPU is optional, addition of an FPU may require re-installation of System 7.0.1 before the FPU version of [[Omega]]SANE will load.
Table 1 presents configuration information concerning the versions of SANE which may be installed by System 7.0.1. Information in this table is subject to change. The FPU column states whether the machine has an FPU or if it is optional. The column labeled "Correctly Rounded Bin <-> Dec" shows whether improved versions of the binary <-> decimal conversion routines (discussed below) are available, and which version (V.1 or V.2) is supplied. The "[[Omega]]SANE FPU Version" column states whether [[Omega]]SANE will load on a given configuration.
Note: The information in this table will undoubtedly change in the future. Developers should not assume that any particular machine/system software combination does or does not have [[Omega]]SANE.
Table 1 SANE Configurations
Macintosh CPU FPU Correctly [[Omega]]SANE Rounded FPU Version Bin<->Dec Mac Plus No No No Mac SE No No No Mac Classic No No No Mac Classic II Optional V.2 in ROM w/ 7.0.1 & FPU+ Mac LC Optional V.1 in ROM w/ 7.0.1 & FPU+ Mac IIsi Optional V.1 in ROM w/ 7.0.1 & FPU+ Mac SE/30 Yes No No Mac II Yes No No Mac IIx Yes No No Mac IIcx Yes No No Mac IIci Yes V.2 with w/ 7.0.1 7.0.1 Mac IIfx Yes V.2 with w/ 7.0.1 7.0.1 PowerBook 100 Yes No No PowerBook 140 Optional V.2 in ROM w/ 7.0.1 & FPU+ PowerBook 170 Yes V.2 in ROM Yes Quadra 700 Yes V.2 in ROM Yes Quadra 900 Yes V.2 in ROM Yes+ FPU optional systems may require reinstallation of
System 7.0.1 after installation of an FPU before the
[[Omega]]SANE FPU version will load.
There is no way programmatically to disable [[Omega]]SANE in System 7.0.1, nor is it a user option. There is also no way to reliably detect that [[Omega]]SANE is present.
[[Omega]]SANE includes binary <-> decimal conversion routines that may produce results that differ from previous versions of SANE. Versions of these routines have been in some machine ROMs since the Macintosh IIsi, and an improved version (V.2) is in the newest ROMs, as well as [[Omega]]SANE. Refer to Table 1 for current configuration information. As a result of these improved conversion routines, floating-point constants that are used in the body of source code will compile differently in the presence of [[Omega]]SANE than with older SANE engines. The results are uniformly better, but may cause unexpected variances from test suites (for instance). Therefore, care must be given to the arithmetic environment in which compilations are made. One can tell which version of the binary<->decimal conversions is currently in use by performing the following computation on the Calculator DA: 45/100 - 0.45. On older versions of SANE, the result is -2.71051E-20. With [[Omega]]SANE, the result is 0 as expected.
[[Omega]]SANE can significantly improve the performance of many floating-point SANE operations. It does this by replacing _FP68K trap calls with JSR instructions directly into the appropriate SANE code. This is discussed in more detail under "Pitfalls." 7.0.1 [[Omega]]SANE does not alter _Elems68K trap calls, but internal code improvements are used to increase the performance of transcendental functions. Finally, [[Omega]]SANE does not affect the performance of code compiled to use the FPU directly, although some library routines used with such code (such as the CSANELib881.o routines NextAfter, Classify, Scalb, and Remainder) will be backpatched by [[Omega]]SANE because they call _FP68K.
Table 2 shows typical performance improvement using [[Omega]]SANE on a Macintosh IIci. Of course, actual performance depends on how heavily you use SANE and the mixture of SANE operations you use.
Table 2 SANE Performance
Operation Macintosh Macintosh Speedup IIci SANE IIci Factor (in flops) [[Omega]]SA NE (in flops) Add/Subtract 18,794 59,259 3.15 Multiply 18,182 53,571 2.95 Divide 18,476 55,300 2.99 Square Root 18,547 65,934 3.55 (Exact) Square Root 18,349 59,113 3.22 (Inexact) Cosine 902 7,058 7.82 Sine 1,290 8,108 6.29 Tangent 826 7,185 8.70 Logarithm 820 7,643 9.32 Exponential 723 7,317 10.12 Compound 413 3,324 8.05 Annuity 358 3,191 8.91
[[Omega]]SANE achieves most of its performance improvement through use of a technique known as "backpatching." Use of backpatching introduces a number of compatibility implications. To implement backpatching, the front end of toolbox SANE has been altered to have multiple entry points corresponding to the most important operations, and your RAM-based application code is modified on the fly replacing traps with speedier JSRs to the appropriate entry point. Subsequent execution causes the code to bypass the trap dispatcher and call SANE directly. For example, an archetypal SANE package call looks like this:
pea fooSrc ; Push address of the source operand pea barDst ; Push the address of the destination operand move.w #OpCode,-(sp) ; Push a SANE opcode word _FP68K ; Trap to SANE
Notice that the last two instructions occupy three words, just enough for the desired JSR. [[Omega]]SANE modifies the RAM-image of the application resulting in code like the following:
pea fooSrc ; Push address of the source operand pea barDst ; Push the address of the dest. operand jsr `SANE entry point' ; JSR to appropriate SANE entry point
Herein lies a potential problem. There is nothing to guarantee that the instruction immediately prior to the _FP68K trap was actually executed. A program control operation, such as a BRA, could have caused control to be passed to the _FP68K operation without executing the preceding move.w. For instance, an execution sequence such as the following:
pea subSrc ; Push source for subtract pea subDst ; Push destination for subtract move #$0002,-(sp) ; Push subtract extended opcode word bra @1 ; Branch to _FP68K trap . . . pea addSrc2 ; Push source for add pea addDst2 ; Push destination for add move #$0000,-(sp) ; Push add extended opcode word @1 _FP68K ; Trap to SANE
would have the unseemly result of "backpatching" a JSR to the extended add entry point, and future executions of the subtraction code would branch to the middle of a JSR causing an illegal instruction error (the illegal instruction error might not occur right away). Note, however, that the following similar code sequence works correctly in the presence of backpatching:
pea addSrc ; Push source for add pea addDst ; Push destination for add bra @1 ; Branch to _FP68K trap . . . pea addSrc2 ; Push source for add pea addDst2 ; Push destination for add @1 move #$0000,-(sp) ; Push add extended opcode word _FP68K ; Trap to SANE
The type of code sequence that is problematic for backpatching (as above) cannot be emitted by the current MPW C and Pascal compilers or by current Think(TM) compilers. It also cannot occur with code generated using the macros contained in SANEMacs.a.
Warning: Although the current generation of MPW compilers create [[Omega]]SANE-safe code, the earlier, MPW C 2.0.2 compiler may, under limited circumstances, generate code that has problems with [[Omega]]SANE. This is discussed in more detail below.
Additionally, if you have hand coded assembly or code which has been otherwise optimized to use common _FP68K trap instructions you may be at risk from backpatching. We recommend you adjust your code to conform to the prototypical sequence above as there is no performance penalty involved.
Another common programming practice is illustrated in the following example:
move.w #$0000,D0 ; Move extended add opcode word to D0 bra @1 ; Branch to shared backend for SANE trap call . . . move.w #$0002,D0 ; Move extended subtract opcode word to D0 bra @1 ; Branch to shared backend for SANE trap call . . . @1 move.w D0,-(SP) ; Push SANE opcode word on stack _FP68K ; Trap to SANE
This code operates correctly in the presence of [[Omega]]SANE, but is clearly not backpatchable. If you have code that does this, you might consider rewriting it to obtain the performance increase, but the code will work as is.
The MPW C 2.0.2 compiler may, under limited circumstances, generate code that works incorrectly under [[Omega]]SANE. Apple recommends that all developers use the latest development tools, but as conversion of source may be difficult and time consuming in some cases, below is a description and workaround for the problem. Again, this only affects code compiled without the -mc68881 option.
The following code demonstrates the problem
struct foo { double data; float num; }; foobar(f,b) struct foo *f; unsigned char b; { extended num; num = 3.14159; if (b) f->data = num; /* FP operation ends block */ else f->num = num; /* Another FP operation ends block */ return; }
The critical combination of events occurs when more than one block in an if-else or switch statements ends with a floating-point operation or conversion. The compiler tries to optimize the final floating-point operation by pushing a selector on the stack then branching to a common _FP68K trap. This is a classic example of code, like that cited above, that works incorrectly under [[Omega]]SANE.
The workaround is to eliminate the final floating-point operation. Here is one way to do this:
foobar(f,b) struct *f; unsigned char b; { extended num; int idummy; num = 3.14159; if (b) { f->data = num; idummy = 1; /* End if block with a non-FP operation */ } else { f->num = num; idummy = 0; /* End else block with a non-FP operation */ } return; }
In this case, each floating-point operation gets its own _FP68K trap so there are no problems.
The backpatching performed by [[Omega]]SANE can have other implications for developers. For example, applications that checksum their code segments (perhaps to check for viruses) will detect that the segment has been modified. Such applications should checksum segments only at application startup or when the segments are loaded, not after they have been executed.
Likewise, an application must not create a situation which will cause the Resource Manager to write a code segment out to disk after it has been executed. If such a segment been modified by [[Omega]]SANE, it can fail when subsequently loaded into memory. Since CloseResFile is called when the application quits it will automatically write out changed resources (i.e. if the resChanged attribute is true) when closing the file.
Further Reference: