Gaddammit!
So here I am, looking forward to a nice quiet weekend; hang back, watch some telly and maybe read a bit – but NNnnneeeEEEEEUUUuuuuuuuu!! Someone had to write an interesting article about sine approximation. With a challenge at the end. And using an inefficient kind of approximation. And so now, instead of just relaxing, I have to spend my entire weekend and most of the week figuring out a better way of doing it. I hate it when this happens >_<.
Okay, maybe not.
Sarcasm aside, it is an interesting read. While the standard way of calculating a sine – via a lookup table – works and works well, there's just something unsatisfying about it. The LUTbased approach is just … dull. Uninspired. Cowardly. Inelegant. In contrast, finding a suitable algorithm for it requires effort and a modicum of creativity, so something like that always piques my interest.
In this case it's sine approximation. I'd been wondering about that when I did my arctan article, but figured it would require too many terms to really be worth the effort. But looking at Mr Schraut's post (whose site you should be visiting from time to time too; there's good stuff there) it seems you can get a decent version quite rapidly. The article centers around the work found at devmaster thread 5784, which derived the following two equations:
(1) 
These approximations work quite well, but I feel that it actually uses the wrong starting point. There are alternative approximations that give more accurate results at nearly no extra cost in complexity. In this post, I'll derive higherorder alternatives for both. In passing, I'll also talk about a few of the tools that can help analyse functions and, of course, provide some source code and do some comparisons.
1 Theory
1.1 Symmetry
The first analytical tool is symmetry. Symmetry is actually one of the most powerful concepts ever conceived. Symmetry of time leads to the conservation of energy; symmetry of space leads to conservation of momentum; in a 3D world, symmetry of direction gives rise to the inverse square law. In many cases, symmetry basically defines the kinds of functions you're looking for.
One kind of symmetry is parity, and functions can have parity as well. Take any function f(x). A function is even if f(−x) = f(x); it is odd if f(−x) = −f(x).
This may not sound impressive, but a function's parity can be a great source of information and a way of error checking. For example, the product of two odd or even functions is an even function, and an oddeven product is odd (compare positive/negative number products). If in a calculation you notice this doesn't hold true, then you know there's an error somewhere.
Symmetry can also significantly reduce the amount of work you need to do. Take the next sum, for example.
(2) 
If you find something like this in the wild on on a test, your first thought might be “WTF?!?” (assuming you don't run away screaming). As it happens, y = 0, for reasons of symmetry. The function is odd, so the parts left and right of x = 0 cancel out. Instead of actually trying to do the whole calculation, you can just write down the answer in one line: “0, cuz of symmetry”.
Another property of symmetrical functions is that, if you break them down into series expansions, odd functions will only have odd terms, and even functions only have even terms. This becomes important in the next subsection.
1.2 Polynomial and Taylor expansions
Every function can be broken down into a sum of more manageable functions. One fairly obvious choice for these subfunctions is increasing powers of x: polynomials. The most common of these is Taylor series, which uses a reference point (a, f(a)) and extrapolates to another point some distance h away by using the derivatives of f at the reference point. In equation form, it looks like this:
(3) 
Chances are you've actually used part of the Taylor series in game programming. On implementing movement with acceleration, you'll often see something like Eq 4. These are the first three terms of the Taylor expansion.
(4) 
Ihe stepsize (h in Eq 3 and Δt in Eq 4) is small, the higherorder terms will have less effect on the end result. This allows you to cut the expansion short at some point. This leaves you with a short equation that you do the calculations with and some sort of error term, composed of the part you have removed. The error term is usually linked to the order you've truncated the series at; the higher the order, the more accurate the approximation.
(5) 
If you work out the math for a sine Taylor series, with a = 0 as the reference point, you end up with Eq 6.
(6) 
Note that all the even powers are conspicuously absent. This is what I meant by symmetry being useful: a sine function is odd, therefore only odd terms are needed in the expansion. But there's more to it than that. The accuracy is given by the highest order in the approximating polynomial. This shows that there's just no point in even starting with any evenpowered polynomial, because you can get one extra order basically for free!
This is why using a quadratic approximation for a sine is somewhat useless; a cubic will have two terms as well, and be more accurate to boot. Just because it's curved doesn't mean a parabola is the most suitable approximation.
1.3 Curve fitting (and a 3rd order example)
Using the Taylor series as a basis for a sine approximation is nice, but it also has a problem. The series is meant to have an infinite number of terms and when you truncate the series, you will lose some accuracy. Of course, this was to be expected, but this isn't the real problem; the real problem is that if your function has some crucial points it must pass through (which is certainly true for trigonometry functions), the truncation will move the curve away from those points.
To fix this, you need to use a polynomial with asyet unknown coefficients (that is, multipliers to the powers) and a set of conditions that need to be satisfied. These conditions will determine the exact value of the coefficients. The Taylor expansion can serve as the basic for your initial approximation, and the final terms should be pretty close to the Taylor coefficients.
Let's try this for a thirdorder (cubic) sine approximation. Technically, a thirdorder polynomial means four unknowns, but, since the sine is odd, all the coefficients for the even powers are zero. That takes care of half the coefficients already. I told you symmetry was useful :). The starting polynomial is reduced to Eq 7, which has two coefficients a and b that have to be determined. For good measure I've also added the derivative, as that's often useful to have as well.
(7) 
Two unknowns means we need two conditional to solve the system. The most useful conditions are usually the behaviour at the boundaries. In the case of a sine, that means look at x = 0 and/or x = ½π. The latter happens to be more useful here, so let's look at that. First, sin(½π) = 1, so that's a good one. Also, we know that at ½π a sine is flat (a derivative of 0). This is the second condition.
The conditions are listed in Eq 8. Solving this system is rather straightforward and will give you values for a and b, which are also given in Eq 8. Notice that the values are roughly 5% and 30% away from the pure Taylor coefficients.
(8) 
The final equation is then:
(9) 
In Fig 1 you can see a number of different approximations to the sine. Note that I've done a little coordinate transformation for the xaxis: z = x/(½π), so z = 1 means x = ½π. The benefit of this will become clear later.
As you can see, the third order Taylor expansion starts out allright, but veers off course near the end. In contrast, the thirdorder fit matches the sine at both end points. There is also the secondorder fit from the devmaster site. As you can see, the thirdorder approximation is closer.
Now, please remember that coefficients from Eq 8 are not the only ones you can use. The conditions define what the values will be; different conditions lead to different values. For example, instead using the derivative at ½π, I could have used it at x = 0. This forms the set of equations of Eq 10 and, as you can see, the coefficients are now different. This set is actually more accurate (a 0.6% average error instead of 1.1%), but it also has some rather unsavoury characteristics of having a maximum that's not at ½π and goes over 1.0; this can be really unsettling if you intend to use the sine in something like rotation.
(10) 
1.4 Dimensionless variables and coordinate transformations
For higher accuracy, a higherorder polynomial should be used. Before doing that, though, I'd like to mention one more trick that can make your mathematical analysis considerably easier: dimensionless variables.
The problem with most quantities and equations is units. Metres, feet, litres, gallons; those kinds of units. Units suck. For one, there are different units for identical quantities which can be a total pain to convert and can sometimes lead to disaster. Literally. Then there's the fact that the unit sizes are basically picked at random and have nothing to do with the physical situation they're used for. So you have weird values for constants like G in Newton's law of universal gravitation, the speed of light c and the Planck constant, h. Keeping track of these things in equations is annoying, especially since they tend to pile up and everybody would rather that they'd just go away!
Enter dimensionless variables. The idea here is that instead of using standard units, you express quantities as ratios to some meaningful size. For example, in relativity you often get v/c : velocity over speed of light. Equations become much simpler if you just denote velocities as fractions of the speed of light: β = v/c. Using β in the equations simplifies them immensely and has the bonus that you're not tied to any specific speedunit anymore.
The dimensionless variable is a type of coordinate transformation. In particular, it's a scaling of the original variable into something more useful. Another useful transformation is translation: moving the variable to a more suitable position. We will come accross this later; but first: an example of dimensionless variables.
A sine wave has lots of symmetry lines, all revolving around the quartercircles. Because of this, the term that keeps showing up everywhere is ½π. This is the characteristic size of the wave. By using z = x/(½π), all those important points are now at integral z values. Having ones in your equations is generally a good thing because they tend to disappear in multiplications. Look at what Eq 9 becomes when expressed in terms of z
(11) 
Doesn't that look a lot nicer? It goes deeper than that though. With dimensionless units, the units your measurements are in simply cease to matter! For angles, this means that whether you're working in radians, degrees or brads, they'll all result in the same circlefraction, z. This makes converting algorithms to fixedpoint notation considerably easier.
2 Derivations and implementations
In the section above, I discussed the tools used for analysis and gave an example of a cubic approximation. In this section I'll also derive highaccuracy fourth and fifth order approximations and show some implementations. Before that, though, there's some terminology to go through.
Since multiple different approximations will be covered, there needs to be a way to separate all of them. In principle, the sine approximation will be named S_{n}, where n is the order of the polynomial. So that'll give S_{2} to S_{5}. I will also use S_{4d} for the fourthorder approximation from devmaster. In the derivation of my own fourthorder function, I'll use C_{n}, because what will actually be derived is a cosine.
Thirdorder implementation
Let's start with finishing up the story of the thirdorder approximation. The main equation for this is Eq 11. Because this equation is still rather simple, I'll make this a fixedpoint implementation. The main problem with turning a floatingpoint function into a fixedpoint one is keeping track of the fixedpoint during the calculations, always making sure there's no overflow, but no underflow either. This is one of the reasons why I wrote Eq 11 like it is: by using nested parentheses you can maximize the accuracy of intermediate calculations and possibly minimize the number of of intermediate calculations and possibly minimize the number of operations to boot.
To correctly account for the fixedpoint positions, you need to be aware of the following factors:
 The scale of the outcome (i.e., the amplitude): 2^{A}
 The scale on the inside the parentheses: 2^{p}. This is necessary to keep the multiplications from overflowing.
 The anglescale: 2^{n}. This is basically the value of ½π in the fixedpoint system. Using x for the angle, you have z = x/2^{n}.
Filling this into Eq 11 will give the following:
(12) 
with r = 2n−p and s = n+p+1−A. These represent the fixedpoint shifts you need to apply to keep everything on the level. With p as high as multiplication with x will allow and the standard libnds units leads to the following numbers.
A  n  p  r  s 

12  13  15  11  17 
That's the calculation necessary for the first quadrant, but the domain of a sine is infinite. To get the rest of the domain, you can use the symmetries of the sine: the 2π periodicity and the ½π mirror symmetries. The first is taken care of by doing z % 4. This reduces the domain to the four quadrants of a circle. The next part is somewhat tricky, so pay attention.
Look at Fig 2. S_{3} works for quadrant 0. Because it's antisymmetric, it will also correctly calculate quadrant 3, which is equivalent to quadrant −1. Quadrants 1 and 2 are the problem. As you can see in Fig 2, what needs to happen is for those quadrants to mirror onto quadrants 0 and −1. A reflection of x at D is defined by Eq 13. In this case, that means that z = 2 − z
(13) 
Some test need to be done to see when the reflection should take place. The quadrant numbers in binary are 00, 01, 10, 11. If you build a truthtable around that, you'll see that a XOR of the two bits will do the trick. If you really want to show off, you can combine the periodicity modulo and the quadrant test by doing the arithmetic in the top bits. The implementation is now complete.
/// @param x Angle (with 2^15 units/circle)
/// @return Sine value (Q12)
s32 isin_S3(s32 x)
{
// S(x) = x * ( (3<<p)  (x*x>>r) ) >> s
// n : Qpos for quarter circle 13
// A : Qpos for output 12
// p : Qpos for parentheses intermediate 15
// r = 2np 11
// s = A1pn 17
static const int qN = 13, qA= 12, qP= 15, qR= 2*qNqP, qS= qN+qP+1qA;
x= x<<(30qN); // shift to full s32 range (Q13>Q30)
if( (x^(x<<1)) < 0) // test for quadrant 1 or 2
x= (1<<31)  x;
x= x>>(30qN);
return x * ( (3<<qP)  (x*x>>qR) ) >> qS;
}
And, of course, there's an assembly version as well. It's only ten instructions, which I think is actually shorter than a LUT+lerp implementation.
@ A sine approximation via a thirdorder approx.
@ @param r0 Angle (with 2^15 units/circle)
@ @return Sine value (Q12)
.arm
.align
.global isin_S3a
isin_S3a:
mov r0, r0, lsl #(3013)
teq r0, r0, lsl #1
rsbmi r0, r0, #1<<31
mov r0, r0, asr #(3013)
mul r1, r0, r0
mov r1, r1, asr #11
rsb r1, r1, #3<<15
mul r0, r1, r0
mov r0, r0, asr #17
bx lr
Radians?
Oh wait, the requirement was for the input to be in Q12 radians,
right? Weeell, that's no biggy. You just have to do the
x → z conversion yourself. Take, say,
2^{20}/(2π). Multiply x by this gives z
as a Q30 number; exactly what the first line in the C code resulted in.
This means that all you have to do is change the first line to
`x *= 166886;
'.
NDS special
The assembly version given above uses standard ARM instructions, but
one of the interesting things is that the NDS' ARM9 core has special
multiplication instructions. In particular, there is the
SMULWx
instruction, which does a word*halfword
multiplication, where the halfword can be either the top or bottom
halfword of operand 2.The main result is 32×16→48 bits
long, of which only the top 32 bits are put in the destination
register. Effectively it's like a*b>>16 without
overflow problems. As a bonus, it's also slightly faster than the
standard MUL
. By slightly changing the parameters,
the downshift factors r and s can be made 16, fitting
perfectly with this instruction, although the internal accuracy is
made slightly worse. Additionally, careful placement of each
instruction can avoid the interlock cycle that happens for
multiplications.
The alternate isin_S3a()
becomes:
@ A sine approximation via a thirdorder sine, using special ARM9 instructions
@ @param r0 Angle (with 2^15 units/circle)
@ @return Sine value (Q12)
.arm
.align
.global isin_S3a9
isin_S3a9:
mov r0, r0, lsl #(3013) @ x ; Q30
teq r0, r0, lsl #1
rsbmi r0, r0, #1<<31
smulwt r1, r0, r0 @ y=x*x ; Q30*Q14/Q16 = Q28
mov r2, #3<<13 @ B_14=3/2
sub r1, r2, r1, asr #15 @ 3/2y/2 ; Q14+Q28/Q14/2
smulwt r0, r1, r0 @ ; Q14*Q14/Q16 = Q12
bx lr
Technically it's only two instruction less, but is quite a bit
faster due to the difference in speed between MUL
and SMULWx
.
2.1 Highprecision, fifth order
The third order approximation actually still has a substantial error, so it may be useful to use an additional term. This would be the fifthorder approximation, S_{5}. It and its derivative are given in Eq 14.
(14) 
To find the terms, I will again use z instead of x. The conditions of note are the position and derivative at z = 1 and the derivative at 0. With these conditions the approximation should behave amicably at both edges.
(15) 
Notice that these equations are linear with respect to a, b and c, which means that it can be solved via matrices. Technically this system of equations forms a 3×3 matrix, but since a is already immediately known it can be reduced to a 2×2 system. I'll spare you the details, but it leads to the coefficients of Eq 16. Note the complete absence of any horrid π^{5} terms that would have appeared if you had decided not to use dimensionless terms.
(16) 
(17) 
Eq 17 is the final quintic approximation in the form that's most accurate and easiest to implement. The implementation is basically an extension of the S_{3} function and left as an exercise for the reader.
2.2 High precision, fourth order
Lastly, a fourthorder approximation. Normally, I wouldn't even consider this for a sine (odd function == odd power series and all that), but since the devmaster post uses them and they even seem to work, there seems to be something to them after all.
The reason those approximations work is simple: they don't actually approximate a sine at all; they approximate a cosine. And, because of all the symmetries and parallels with sines and cosines, one can be used to implement the other.
(18) 
Eq 18 is the transformation you need to perform to turn a cosine into a sine wave. This can be easily done in at the start of an algorithm. What's left is to derive a cosine approximation. Because a cosine is even, only even powers will be needed. The base form and its derivative are given in Eq 19.
(19) 
For the conditions, we once again look at z = 0 and z = 1, which comes down to the eqt of equations in Eq 20. One of the interesting thing about even functions is that the derivative at 0 is zero, so that's a freebie. A very important freebie, as it means that one of the required symmetries happens automatically.
(20) 
The resulting set of coefficients are listed in Eq 21. Note that b = c+1, which may be of use later. The final equation for the fourth order cosine approximation is Eq 22. Only three MULs and two SUBs; nice.
(21) 
(22) 
Implementation
The floatingpoint implementation of Eq 22 is again too easy to mention here, so I'll focus on fixedpoint variations. Like with S_{3}, you can mix and match fixedpoint positions until you get something you like. In this case I'll stick to Q14 for almost everything to keep things simple.
The real trick here is to find out what you need to do about all the other quadrants. Cutting down to four quadrants is, again, easy. For the rest, remember that the cosine approximation calculates the top quadrants and you need to flip the sign for the bottom quadrants. If you think in terms of the parameter that a sine gets, you see that only for odd semicircles the sign needs to change. Tracing this can be done with a single bitwise AND or a clever shift.
/// @param x angle (with 2^15 units/circle)
/// @return Sine value (Q12)
s32 isin_S4(s32 x)
{
int c, x2, y;
static const int qN= 13, qA= 12, B=19900, C=3516;
c= x<<(30qN); // Semicircle info into carry.
x = 1<<qN; // sine > cosine calc
x= x<<(31qN); // Mask with PI
x= x>>(31qN); // Note: SIGNED shift! (to qN)
x= x*x>>(2*qN14); // x=x^2 To Q14
y= B  (x*C>>14); // B  x^2*C
y= (1<<qA)(x*y>>16); // A  x^2*(Bx^2*C)
return c>=0 ? y : y;
}
And an ARM9 assembly version too. As it happens, it's only two
instuctions longer than isin_S3a9()
.
@ A sine approximation via a fourthorder cosine
@ @param r0 Angle (with 2^15 units/circle)
@ @return Sine value (Q12)
.arm
.align
.global isin_S4a9
isin_S4a9:
movs r0, r0, lsl #(3113) @ r0=x%2 <<31 ; carry=x/2
sub r0, r0, #1<<31 @ r0 = 1.0 ; sin <> cos
smulwt r1, r0, r0 @ r1 = x*x ; Q31*Q15/Q16=Q30
ldr r2,=14016 @ C = (1pi/4)<<16
smulwt r0, r2, r1 @ C*x^2>>16 ; Q16*Q14/Q16 = Q14
add r2, r2, #1<<16 @ B = C+1
rsb r0, r0, r2, asr #2 @ B  C*x^2 ; Q14
smulwb r0, r1, r0 @ x^2 * (BC*x^2) ; Q30*Q14/Q16 = Q28
mov r1, #1<<12
sub r0, r1, r0, asr #16 @ 1  x^2 * (BC*x^2)
rsbcs r0, r0, #0 @ Flip sign for odd semicircles.
bx lr
3 Testing
Deriving approximations is nice and all, but there's really no point unless you do some sort of test to see how well they perform. I'll look at two things: accuracy and some speedtests. For the speedtest, I'll only consider the functions given here along with some traditional ones. The accuracy test is done only for the first quadrant and in floatingpoint, but the results should carry over well to a fixedpoint case. Finally, I'll show how you can optimize the functions for accuracy.
3.1 Third and fourthorder speed
For the speed test I calculated the sine at 256 points for x ∈ [0, 2π). There will be some loopoverhead in the numbers, but it should be small. Tests were performed on the NDS.
Functions under investigation are the three S_{3} and
two S_{4} functions given earlier. I've also tested
the standard floatingpoint sin()
library function,
the libnds sinLerp()
and my own isin()
function that you can find in
arctan:sine.
The cumulative and average times can be found in
Table 1.
Function (thumb/ARM)  Total cycles  average cycles 

sin (F)  300321  1175.1 
sinLerp (T)  10051  39.2 
isin (T)  7401  28.9 
isin_S3 (T)  5267  20.5 
isin_S4 (T)  6456  25.2 
isin_S3a (A)  3438  13.4 
isin_S3a9 (A)  2591  10.1 
isin_S4a9 (A)  3123  12.1 
The first thing that should be clear is just why we don't use the floatingpoint sine. I mean, seriously. There is also a clear difference between the Thumbcompiled and ARM assembly versions, the latter being significantly faster.
Within the compiled versions, I find it interesting to see that the algorithmic calculations are actually faster than the LUT+lerpbased implementations. I guess loading all those numbers from memory really does suck.
And then there's the assembly versions. Wow. Compared to the compiled version they're twice as fast, and up to four times as fast as the LUTbased functions.
The cycletimes from Table 1 do not make sense
if you count instruction cycles. For example, for isin_S3a
the function overhead alone should already be around 10 cycles. The
thing here is that the numbers are taken from the hardware timers,
which use the busfrequency (33 MHz) rather than the ARM9 cpu (66 MHz).
As such, it measures in halfcycles. For details, see
gbatek:ndstimings.
3.2 Accuracy
Fig 4 shows all the approximations in one graph. It only shows one quadrant because the rest can be retrieved by symmetry. I've also scaled the sine and its approximations by 2^{12} because that's the scale that usual fixedpoint scale right now. And to be sure, yes, this is a different chart than Fig 1; it's just hard to tell because the fourth and fifth order functions are virtually identical to the real sine line.
For the highaccuracy approximations, it's better to look at Fig 5, which shows the errors. Here you can clearly see a difference between S_{4d} and S_{5}, the latter is roughly 3 times better.
There's also a large difference between the devmaster fourthorder sine and my own. The reason behind this is a difference in conditions. In my case, I've fixed the derivatives at both endpoints, which always results in an over or underestimate. The devmaster's S_{4d} let go of those conditions and minimized the error. I'll also do this in the next subsection.
Table 2 and Table 3 list some interesting statistics about the various approximations, namely the minimum, average and maximum errors. It also contains a Root Mean Square Deviation (RMSD), which is a special kind of distance. If you consider the datapoints as a vector, the RMSD is the average Pythagorean length for each point. Table 2 is normed to 2^{12}, whereas Table 3 is table for the traditional floatingpoint sine scale.
The RMSD values are probably the most useful to look at. From them you can see that there is a huge gap between the lowaccuracy and highaccuracy functions of about a factor 60. And if you do your math right, all it costs is one multiplication and one addition, and maybe some extra shifts in the fixedpoint case. That's quite a bargain. Compared to that, the difference between the odd and even functions is somewhat meager: only a factor three or so. Still, it is something.
If you look at the fixedpoint table, you can see that the error you make with S_{4d} and S_{5} is in the single digits. This means that this is probably accurate enough for practical purposes. Combined with the fact that even fifth order polynomials can be made pretty fast, this makes them worth considering over LUTs.


3.3 Optimizing higherorder approximations
From the charts, you can see that S_{4} and S_{5} all err on the same side of the sine line. You can increase the accuracy of the approximation by tweaking the coefficients in such a way that the errors are redistributed in a preferable way. Two methods are possible here: shoot for a zero error average, or minimize the RMSD. Technically minimizing the RMSD is standard (it comes down to leastsquares optimization), but because a zeroaverage allows for an analytical solution, I'll use that. In any case, the differences in outcomes will be small.
First, think of what an average of a function means. The average of a set of numbers is the sum divided by the size of the set. For functions, it's the integral of that function divided by the interval. When you want a zeroaverage for an approximation, the integral of the function and that of the approximation should be equal. With a polynomial approximation to a sine, we get:
(23) 
with a_{n} reducing to the coefficients of the polynomials we had before. This can be used as an alternate condition to the derivative at 0. For S_{4} and S_{5}, you'll end up with the following coefficients.
(24) 
(25) 
If you're still awake and remember the devmaster S_{4d} coefficients, there should be something familiar about a_{4}. Yes, they're practically identical. If you optimize S_{4} for the RMSD, you actually get the exact same function as S_{4d}.
Table 4 shows the statistics for the original approximations and the new optimized versions, S_{4o} and S_{5o}. The numbers for S_{4o} are basically those from S_{4d} seen earlier. More interesting are the details for S_{5o}. The maximum and minimum errors are now within ±1. That is to say, this approximation gives values that are at most 1 off from the proper Q12 sine. This is about as good as any Q12 approximation is able to get.
min  avg  max  rmsd  

S4  0  5.87  11.4  7.11 
S5  0  0.74  1.616  0.94 
S4o  4.72  0  2.89  2.47 
S5o  0.73  0  0.79  0.52 
4 Summary and final thoughts
Here's a few things to take from all this.
 Symmetry is your friend.
 When constructing a polynomial approximation, more terms mean higher accuracy. Symmetry properties of the function approximated allow you to remove terms from consideration, simplifying the equation.
 Coordinate transformations are your friends too. Sometimes it's much easier to work on a scaled or moved version of the original problem. If your situation has a characteristic length (or time, velocity, whatever) consider using dimensionless variables: expressing parameters as ratios of the characteristic length. This makes the initial units pretty much irrelevant. For angles, think circlefractions.
 Zero and one (0 and 1) are the best values to have in your equations, as they tend to vanish to easily.
 Any approximation formula will have coefficients to be determined. In general, the Taylor series terms are not the best set; values slightly offset from these terms will be better as they can correct for the truncation. To determine the values of the coefficients, define some conditions that need to be satisfied. Examples of conditions are values of the function and its derivative at the boundaries, or its integrals. Or you can wuss out and just dump the thing in the Excel Solver.
 When converting to fixedpoint, accuracy and overflow comes into the fray. If you know the domain of the function beforehand, you can optimize for accuracy. Also, it helps if you construct the algorithm in a sort of recursive form instead of a pure polynomial: not ax + bx^{2} but x(a + xb). Ordered like this, each new additional term only requires one multiplication and one addition extra.

For fixedpoint work,
SMULWx
is teh awesome.  Even a fourth order (and presumably fifth order as well) polynomial implementation in C is faster than the LUTbased sines on the NDS. And specialized assembly versions are considerably faster still.
 The difference in accuracy of S_{4} vs S_{2} or S_{5} vs S_{3} is huge: a factor of 60. Going from an even to the next odd approximation only gains you a factor 3. Shame; I'd hoped it'd be more.
 Unlike I initially thought, the evenpowered polynomials work out quite well. This is because they're actually modified cosine approximations.
Exercises for the reader
 Express the parabolic approximation S_{2}(x) of Eq 1 in terms of z. 's Not hard, I promise.
 Implement the fixedpoint version of the fifthorder sine approximation, S_{5}(x).
 For the masochists: derive the coefficients for S_{5}(x) without dimensionless variables. That is to say, with the conditions at x = ½π instead of z = 1.
 Solve Eq 24 and Eq 25 for minimal RMDS. Also, try to derive an analytical form for minimal RMDS; I think it's exists, but it may be tricky to come up with the right form.
Okay, I've uploaded two files in a zip that you can take a look at. You can find them here.
Fixedpoint just means : apply a scaling factor to everything. A Q12 (12bit fixedpoint number) value means : scale everything by 2^{12}. So sin(18°) * 4096 = 1265 = 04F1h. 18° is 0.05 circle. Look up that value in the spreadsheet to see the real sin value and compare it with isin_s3.
No, this isn't how it works. You have a circle. What you want as the domain is a full run around the circle. For radians, that'd be [0, 2π). Or [−π, π), it doesn't really matter. What matters is: you want one run around the circle. Everything else just maps back to that.
When you normalize that run, you get a natural [0, 1) domain, where 0 means the starting point, ¼ means ¼ circle, etc. In Q15, that [0, 1) domain translates to [0, 2^{15}).
What you're proposing is a [1, 1) domain, which covers two circles. Because everything just repeats, there's no need to cover positive and negative directions separately, because the sine over [1, 0) is identical to that over [0, 1) (and [1, 2), etc). This periodicity is automatic if you use 16bit variables and a 2^{16}unit circle. ¼circle then automatically maps to +¾circle, or vice versa via integer overflow.
Heh. Relevant : http://imgur.com/gallery/iFctMlX