# Another fast fixed-point sine approximation

So here I am, looking forward to a nice quiet weekend; hang back, watch some telly and maybe read a bit – but NNnnneeeEEEEEUUUuuuuuuuu!! Someone had to write an interesting article about sine approximation. With a challenge at the end. And using an inefficient kind of approximation. And so now, instead of just relaxing, I have to spend my entire weekend and most of the week figuring out a better way of doing it. I hate it when this happens >_<.

Okay, maybe not.

Sarcasm aside, it is an interesting read. While the standard way of calculating a sine – via a look-up table – works and works well, there's just something unsatisfying about it. The LUT-based approach is just … dull. Uninspired. Cowardly. Inelegant. In contrast, finding a suitable algorithm for it requires effort and a modicum of creativity, so something like that always piques my interest.

In this case it's sine approximation. I'd been wondering about that when I did my arctan article, but figured it would require too many terms to really be worth the effort. But looking at Mr Schraut's post (whose site you should be visiting from time to time too; there's good stuff there) it seems you can get a decent version quite rapidly. The article centers around the work found at devmaster thread 5784, which derived the following two equations:

 (1) $\begin{eqnarray} S_2(x) &=& \frac4\pi x - \frac4{\pi^2} x^2 \\ \\ S_{4d}(x) &=& (1-P)S_2(x) + P S_2^2(x) \end{eqnarray}$

These approximations work quite well, but I feel that it actually uses the wrong starting point. There are alternative approximations that give more accurate results at nearly no extra cost in complexity. In this post, I'll derive higher-order alternatives for both. In passing, I'll also talk about a few of the tools that can help analyse functions and, of course, provide some source code and do some comparisons.

## 1 Theory

### 1.1 Symmetry

The first analytical tool is symmetry. Symmetry is actually one of the most powerful concepts ever conceived. Symmetry of time leads to the conservation of energy; symmetry of space leads to conservation of momentum; in a 3D world, symmetry of direction gives rise to the inverse square law. In many cases, symmetry basically defines the kinds of functions you're looking for.

One kind of symmetry is parity, and functions can have parity as well. Take any function f(x). A function is even if f(−x) = f(x); it is odd if f(−x) = −f(x).

This may not sound impressive, but a function's parity can be a great source of information and a way of error checking. For example, the product of two odd or even functions is an even function, and an odd-even product is odd (compare positive/negative number products). If in a calculation you notice this doesn't hold true, then you know there's an error somewhere.

Symmetry can also significantly reduce the amount of work you need to do. Take the next sum, for example.

 (2) $y = \int_{-N}^N sin^7(x^3) + \frac{x^5}{x^2+1} - x e^{\frac{x^2}{2\sigma^2}} dx$

If you find something like this in the wild on on a test, your first thought might be “WTF?!?” (assuming you don't run away screaming). As it happens, y = 0, for reasons of symmetry. The function is odd, so the parts left and right of x = 0 cancel out. Instead of actually trying to do the whole calculation, you can just write down the answer in one line: “0, cuz of symmetry”.

Another property of symmetrical functions is that, if you break them down into series expansions, odd functions will only have odd terms, and even functions only have even terms. This becomes important in the next subsection.

### 1.2 Polynomial and Taylor expansions

Every function can be broken down into a sum of more manageable functions. One fairly obvious choice for these sub-functions is increasing powers of x: polynomials. The most common of these is Taylor series, which uses a reference point (af(a)) and extrapolates to another point some distance h away by using the derivatives of f at the reference point. In equation form, it looks like this:

 (3) $\begin{eqnarray} f(a+h) &=& f(a) + f'(a) h + \frac{f''(a)}{2}h^2 + \frac{f'''(a)}{6} h^3 + ... \\ \\ \\ &=& \sum_{n=0} \frac{f^{(n)}(a)}{n!}h^n \end{eqnarray}$

Chances are you've actually used part of the Taylor series in game programming. On implementing movement with acceleration, you'll often see something like Eq 4. These are the first three terms of the Taylor expansion.

 (4) $x_{new} = x_{old} \:+\: v \Delta t \:+\: \frac12 a (\Delta t)^2$

The step-size (h in Eq 3 and Δt in Eq 4) is small, the higher-order terms will have less effect on the end result. This allows you to cut the expansion short at some point. This leaves you with a short equation that you do the calculations with and some sort of error term, composed of the part you have removed. The error term is usually linked to the order you've truncated the series at; the higher the order, the more accurate the approximation.

 (5) $f(a+h) = f(a) + f'(a) h + \frac{f''(a)}{2}h^2 + \frac{f'''(a)}{6} h^3 + O(h^4)$

If you work out the math for a sine Taylor series, with a = 0 as the reference point, you end up with Eq 6.

 (6) $sin(h) = h \,-\, \frac16 h^3 \,+\, \frac1{5!} h^5 \,-\, \frac1{7!} h^7 \,+\, ...$

Note that all the even powers are conspicuously absent. This is what I meant by symmetry being useful: a sine function is odd, therefore only odd terms are needed in the expansion. But there's more to it than that. The accuracy is given by the highest order in the approximating polynomial. This shows that there's just no point in even starting with any even-powered polynomial, because you can get one extra order basically for free!

This is why using a quadratic approximation for a sine is somewhat useless; a cubic will have two terms as well, and be more accurate to boot. Just because it's curved doesn't mean a parabola is the most suitable approximation.

### 1.3 Curve fitting (and a 3rd order example)

Using the Taylor series as a basis for a sine approximation is nice, but it also has a problem. The series is meant to have an infinite number of terms and when you truncate the series, you will lose some accuracy. Of course, this was to be expected, but this isn't the real problem; the real problem is that if your function has some crucial points it must pass through (which is certainly true for trigonometry functions), the truncation will move the curve away from those points. Example for a sine, it's really important that sin(π/2) = 1. With Taylor, that simply isn't the case (see Fig 1).

To fix this, you need to use a polynomial with as-yet unknown coefficients (that is, multipliers to the powers) and a set of conditions that need to be satisfied. These conditions will determine the exact value of the coefficients. The Taylor expansion can serve as the basic for your initial approximation, and the final terms should be pretty close to the Taylor coefficients.

Let's try this for a third-order (cubic) sine approximation. Technically, a third-order polynomial means four unknowns, but, since the sine is odd, all the coefficients for the even powers are zero. That takes care of half the coefficients already. I told you symmetry was useful :). The starting polynomial is reduced to Eq 7, which has two coefficients a and b that have to be determined. For good measure I've also added the derivative, as that's often useful to have as well.

 (7) $\begin{eqnarray} S_3(x) &=& ax - b x^3 &=& x (a - b x^2) \\ S_3'(x) &=& a - 3bx^2 \end{eqnarray}$

Two unknowns means we need two conditional to solve the system. The most useful conditions are usually the behaviour at the boundaries. In the case of a sine, that means look at x = 0 and/or x = ½π. The latter happens to be more useful here, so let's look at that. First, sin(½π) = 1, so that's a good one. Also, we know that at ½π a sine is flat (a derivative of 0). This is the second condition.

The conditions are listed in Eq 8. Solving this system is rather straightforward and will give you values for a and b, which are also given in Eq 8. Notice that the values are roughly 5% and 30% away from the pure Taylor coefficients.

 (8) $\left. \begin{eqnarray} S_3(\frac{\pi}2) &=& 1 &=& \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3'(\frac{\pi}2) &=& 0 &=& a - 3(\frac{\pi}2)^2 b \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &=& \frac3\pi &\approx& 0.955 \\ \\ b &=& \frac4{\pi^3} &\approx& 0.129 \end{eqnarray}$

The final equation is then:

 (9) $S_3(x) = \frac3\pi x - \frac4{\pi^3} x^3$

In Fig 1 you can see a number of different approximations to the sine. Note that I've done a little coordinate transformation for the x-axis: z = x/(½π), so z = 1 means x = ½π. The benefit of this will become clear later.

As you can see, the third order Taylor expansion starts out all-right, but veers off course near the end. In contrast, the third-order fit matches the sine at both end points. There is also the second-order fit from the devmaster site. As you can see, the third-order approximation is closer.

Fig 1. Sine approximations using 3rd order Taylor and parabolic cubic polynomials for first quadrant. z = x / ½π

Now, please remember that coefficients from Eq 8 are not the only ones you can use. The conditions define what the values will be: different conditions lead to different values. For example, instead using the derivative at ½π, I could have used it at x = 0. This forms the set of equations of Eq 10 and, as you can see, the coefficients are now different. This set is actually more accurate (a 0.6% average error instead of 1.1%), but it also has some rather unsavoury characteristics of having a maximum that's not at ½π and goes over 1.0; this can be really unsettling if you intend to use the sine in something like rotation.

 (10) $\left. \begin{eqnarray} S_3(\frac{\pi}2) &=& 1 &=& \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3'(0) &=& 1 &=& a \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &=& 1 \\ \\ b &=& \frac4{\pi^2}(1-\frac2\pi) \approx 0.147 \end{eqnarray}$

### 1.4 Dimensionless variables and coordinate transformations

For higher accuracy, a higher-order polynomial should be used. Before doing that, though, I'd like to mention one more trick that can make your mathematical analysis considerably easier: dimensionless variables.

The problem with most quantities and equations is units. Metres, feet, litres, gallons; those kinds of units. Units suck. For one, there are different units for identical quantities which can be a total pain to convert and can sometimes lead to disaster. Literally. Then there's the fact that the unit sizes are basically picked at random and have nothing to do with the physical situation they're used for. So you have weird values for constants like G in Newton's law of universal gravitation, the speed of light c and the Planck constant, h. Keeping track of these things in equations is annoying, especially since they tend to pile up and everybody would rather that they'd just go away!

Enter dimensionless variables. The idea here is that instead of using standard units, you express quantities as ratios to some meaningful size. For example, in relativity you often get v/c : velocity over speed of light. Equations become much simpler if you just denote velocities as fractions of the speed of light: β = v/c. Using β in the equations simplifies them immensely and has the bonus that you're not tied to any specific speed-unit anymore.

The dimensionless variable is a type of coordinate transformation. In particular, it's a scaling of the original variable into something more useful. Another useful transformation is translation: moving the variable to a more suitable position. We will come accross this later; but first: an example of dimensionless variables.

A sine wave has lots of symmetry lines, all revolving around the quarter-circles. Because of this, the term that keeps showing up everywhere is ½π. This is the characteristic size of the wave. By using z = x/(½π), all those important points are now at integral z values. Having ones in your equations is generally a good thing because they tend to disappear in multiplications. Look at what Eq 9 becomes when expressed in terms of z

 (11) $\begin{eqnarray} S_3(x) &=& \frac3\pi x - \frac4{\pi^3} x^3 \\ \\ &=& \frac32 \frac{2x}\pi - \frac12 (\frac{2x}\pi)^3 \\ \\ &=& \frac32 z - \frac12 z^3 \\ \\ S_3(z) &=& \frac12z (3 - z^2) \end{eqnarray}$

Doesn't that look a lot nicer? It goes deeper than that though. With dimensionless units, the units your measurements are in simply cease to matter! For angles, this means that whether you're working in radians, degrees or brads, they'll all result in the same circle-fraction, z. This makes converting algorithms to fixed-point notation considerably easier.

## 2 Derivations and implementations

In the section above, I discussed the tools used for analysis and gave an example of a cubic approximation. In this section I'll also derive high-accuracy fourth and fifth order approximations and show some implementations. Before that, though, there's some terminology to go through.

Since multiple different approximations will be covered, there needs to be a way to separate all of them. In principle, the sine approximation will be named Sn, where n is the order of the polynomial. So that'll give S2 to S5. I will also use S4d for the fourth-order approximation from devmaster. In the derivation of my own fourth-order function, I'll use Cn, because what will actually be derived is a cosine.

### Third-order implementation

Let's start with finishing up the story of the third-order approximation. The main equation for this is Eq 11. Because this equation is still rather simple, I'll make this a fixed-point implementation. The main problem with turning a floating-point function into a fixed-point one is keeping track of the fixed-point during the calculations, always making sure there's no overflow, but no underflow either. This is one of the reasons why I wrote Eq 11 like it is: by using nested parentheses you can maximize the accuracy of intermediate calculations and possibly minimize the number of of intermediate calculations and possibly minimize the number of operations to boot.

To correctly account for the fixed-point positions, you need to be aware of the following factors:

• The scale of the outcome (i.e., the amplitude): 2A
• The scale on the inside the parentheses: 2p. This is necessary to keep the multiplications from overflowing.
• The angle-scale: 2n. This is basically the value of ½π in the fixed-point system. Using x for the angle, you have z = x/2n.

Filling this into Eq 11 will give the following:

 (12) $\begin{eqnarray} S_3(z) &=& \frac12 z (3 - z^2) 2^A \\ &=& z (3 - z^2) 2^{A-1} \\ &=& x\cdot2^{-n} (3 - x^2\cdot2^{-2n}) 2^{A-1} \\ &=& x (3 - x^2\cdot2^{-2n}) 2^{A-1-n} \\ &=& x (3\cdot2^p - x^2 2^{p-2n}) 2^{A-n-1-p} \\ S_3(x) &=& x (3\cdot2^p - x^2 / 2^r ) \middle/ 2^s, \end{eqnarray}$

with r = 2np and s = n+p+1−A. These represent the fixed-point shifts you need to apply to keep everything on the level. With p as high as multiplication with x will allow and the standard libnds units leads to the following numbers.

A n p r s
12 13 15 11 17

Fig 2. Mirror domain to the right quadrants.

That's the calculation necessary for the first quadrant, but the domain of a sine is infinite. To get the rest of the domain, you can use the symmetries of the sine: the 2π periodicity and the ½π mirror symmetries. The first is taken care of by doing z % 4. This reduces the domain to the four quadrants of a circle. The next part is somewhat tricky, so pay attention.

Look at Fig 2. S3 works for quadrant 0. Because it's antisymmetric, it will also correctly calculate quadrant 3, which is equivalent to quadrant −1. Quadrants 1 and 2 are the problem. As you can see in Fig 2, what needs to happen is for those quadrants to mirror onto quadrants 0 and −1. A reflection of x at D is defined by Eq 13. In this case, that means that z = 2 − z

 (13) $x = D - (x-D) = 2D-x$

Some test need to be done to see when the reflection should take place. The quadrant numbers in binary are 00, 01, 10, 11. If you build a truth-table around that, you'll see that a XOR of the two bits will do the trick. If you really want to show off, you can combine the periodicity modulo and the quadrant test by doing the arithmetic in the top bits. The implementation is now complete.

/// A sine approximation via a third-order approx.
/// @param x    Angle (with 2^15 units/circle)
/// @return     Sine value (Q12)
s32 isin_S3(s32 x)
{
// S(x) = x * ( (3<<p) - (x*x>>r) ) >> s
// n : Q-pos for quarter circle             13
// A : Q-pos for output                     12
// p : Q-pos for parentheses intermediate   15
// r = 2n-p                                 11
// s = A-1-p-n                              17

static const int qN = 13, qA= 12, qP= 15, qR= 2*qN-qP, qS= qN+qP+1-qA;

x= x<<(30-qN);          // shift to full s32 range (Q13->Q30)

if( (x^(x<<1)) < 0)     // test for quadrant 1 or 2
x= (1<<31) - x;

x= x>>(30-qN);

return x * ( (3<<qP) - (x*x>>qR) ) >> qS;
}

And, of course, there's an assembly version as well. It's only ten instructions, which I think is actually shorter than a LUT+lerp implementation.

@ ARM assembly version, using n=13, p=15, A=12

@ A sine approximation via a third-order approx.
@ @param r0   Angle (with 2^15 units/circle)
@ @return     Sine value (Q12)
.arm
.align
.global isin_S3a
isin_S3a:
mov     r0, r0, lsl #(30-13)
teq     r0, r0, lsl #1
rsbmi   r0, r0, #1<<31
mov     r0, r0, asr #(30-13)
mul     r1, r0, r0
mov     r1, r1, asr #11
rsb     r1, r1, #3<<15
mul     r0, r1, r0
mov     r0, r0, asr #17
bx      lr

Oh wait, the requirement was for the input to be in Q12 radians, right? Weeell, that's no biggy. You just have to do the x → z conversion yourself. Take, say, 220/(2π). Multiply x by this gives z as a Q30 number; exactly what the first line in the C code resulted in. This means that all you have to do is change the first line to `x *= 166886;'.

#### NDS special

The assembly version given above uses standard ARM instructions, but one of the interesting things is that the NDS' ARM9 core has special multiplication instructions. In particular, there is the SMULWx instruction, which does a word*halfword multiplication, where the halfword can be either the top or bottom halfword of operand 2.The main result is 32×16→48 bits long, of which only the top 32 bits are put in the destination register. Effectively it's like a*b>>16 without overflow problems. As a bonus, it's also slightly faster than the standard MUL. By slightly changing the parameters, the down-shift factors r and s can be made 16, fitting perfectly with this instruction, although the internal accuracy is made slightly worse. Additionally, careful placement of each instruction can avoid the interlock cycle that happens for multiplications.

The alternate isin_S3a() becomes:

@ Special ARM assembly version, using n=13 and lots of Q14

@ A sine approximation via a third-order sine, using special ARM9 instructions
@ @param r0   Angle (with 2^15 units/circle)
@ @return     Sine value (Q12)
.arm
.align
.global isin_S3a9
isin_S3a9:
mov     r0, r0, lsl #(30-13)    @ x                 ; Q30
teq     r0, r0, lsl #1
rsbmi   r0, r0, #1<<31

smulwt  r1, r0, r0              @ y=x*x             ; Q30*Q14/Q16 = Q28
mov     r2, #3<<13              @ B_14=3/2
sub     r1, r2, r1, asr #15     @ 3/2-y/2           ; Q14+Q28/Q14/2
smulwt  r0, r1, r0              @                   ; Q14*Q14/Q16 = Q12

bx      lr

Technically it's only two instructions less, but is quite a bit faster due to the difference in speed between MUL and SMULWx.

### 2.1 High-precision, fifth order

The third order approximation actually still has a substantial error, so it may be useful to use an additional term. This would be the fifth-order approximation, S5. It and its derivative are given in Eq 14.

 (14) $\begin{eqnarray} S_5(x) &=& ax - b x^3 + c x^5 \\ \\ \\ S_5'(x) &=& a - 3b x^2 + 5c x^4 \end{eqnarray}$

To find the terms, I will again use z instead of x. The conditions of note are the position and derivative at z = 1 and the derivative at 0. With these conditions the approximation should behave amicably at both edges.

 (15) $\begin{eqnarray} S_5(z=1) &=& 1 &=& a &-& b &+& c \\ \\ \\ S'_5(z=1) &=& 0 &=& a &-& 3b &+& 5c \\ \\ \\ S'_5(z=0) &=& \frac\pi2 &=& a \end{eqnarray}$

Notice that these equations are linear with respect to a, b and c, which means that it can be solved via matrices. Technically this system of equations forms a 3×3 matrix, but since a is already immediately known it can be reduced to a 2×2 system. I'll spare you the details, but it leads to the coefficients of Eq 16. Note the complete absence of any horrid π5 terms that would have appeared if you had decided not to use dimensionless terms.

 (16) $\begin{eqnarray} a &=& \pi/2 \\ \\ b &=& \pi - 5/2 \\ \\ c &=& \pi/2 - 3/2 \end{eqnarray}$
 (17) $S_5(z) = \frac12 z (\pi - z^2 [ (2\pi-5) - z^2 (\pi - 3) ] )$

Eq 17 is the final quintic approximation in the form that's most accurate and easiest to implement. The implementation is basically an extension of the S3 function and left as an exercise for the reader.

### 2.2 High precision, fourth order

Fig 3. Sine and cosine.

Lastly, a fourth-order approximation. Normally, I wouldn't even consider this for a sine (odd function == odd power series and all that), but since the devmaster post uses them and they even seem to work, there seems to be something to them after all.

The reason those approximations work is simple: they don't actually approximate a sine at all; they approximate a cosine. And, because of all the symmetries and parallels with sines and cosines, one can be used to implement the other.

 (18) $\begin{eqnarray} sin(x) &=& cos(x - \pi/2) \\ \\ sin(z) &=& cos(z - 1) \end{eqnarray}$

Eq 18 is the transformation you need to perform to turn a cosine into a sine wave. This can be easily done in at the start of an algorithm. What's left is to derive a cosine approximation. Because a cosine is even, only even powers will be needed. The base form and its derivative are given in Eq 19.

 (19) $\begin{eqnarray} C_4 (x) &=& a - b x^2 + c x^4 \\ \\ \\ C_4'(x) &=& - 2b x + 4c x^3 \end{eqnarray}$

For the conditions, we once again look at z = 0 and z = 1, which comes down to the eqt of equations in Eq 20. One of the interesting thing about even functions is that the derivative at 0 is zero, so that's a freebie. A very important freebie, as it means that one of the required symmetries happens automatically.

 (20) $\begin{eqnarray} C_4(z=0) &=& 1 &=& a \\ \\ \\ C_4(z=1) &=& 0 &=& a &-& b &+& c \\ \\ \\ C'_4(z=1) &=& -\frac\pi2 &=& &-& 2b &+& 4c \end{eqnarray}$

The resulting set of coefficients are listed in Eq 21. Note that b = c+1, which may be of use later. The final equation for the fourth order cosine approximation is Eq 22. Only three MULs and two SUBs; nice.

 (21) $\begin{eqnarray} a &=& 1 \\ \\ b &=& 2 - \pi/4 \\ \\ c &=& 1 - \pi/4 \end{eqnarray}$
 (22) $C_4(z) = 1 - z^2 [ (2-\pi/4) - z^2 (1-\pi/4) ]$

#### Implementation

The floating-point implementation of Eq 22 is again too easy to mention here, so I'll focus on fixed-point variations. Like with S3, you can mix and match fixed-point positions until you get something you like. In this case I'll stick to Q14 for almost everything to keep things simple.

The real trick here is to find out what you need to do about all the other quadrants. Cutting down to four quadrants is, again, easy. For the rest, remember that the cosine approximation calculates the top quadrants and you need to flip the sign for the bottom quadrants. If you think in terms of the parameter that a sine gets, you see that only for odd semi-circles the sign needs to change. Tracing this can be done with a single bitwise AND or a clever shift.

/// A sine approximation via a fourth-order cosine approx.
/// @param x   angle (with 2^15 units/circle)
/// @return     Sine value (Q12)
s32 isin_S4(s32 x)
{
int c, x2, y;
static const int qN= 13, qA= 12, B=19900, C=3516;

c= x<<(30-qN);              // Semi-circle info into carry.
x -= 1<<qN;                 // sine -> cosine calc

x= x<<(31-qN);              // Mask with PI
x= x>>(31-qN);              // Note: SIGNED shift! (to qN)
x= x*x>>(2*qN-14);          // x=x^2 To Q14

y= B - (x*C>>14);           // B - x^2*C
y= (1<<qA)-(x*y>>16);       // A - x^2*(B-x^2*C)

return c>=0 ? y : -y;
}

And an ARM9 assembly version too. As it happens, it's only two instuctions longer than isin_S3a9().

@ ARM assembly version of S4 = C4(gamma-1), using n=13, A=12 and ... miscellaneous.

@ A sine approximation via a fourth-order cosine
@ @param r0   Angle (with 2^15 units/circle)
@ @return     Sine value (Q12)
.arm
.align
.global isin_S4a9
isin_S4a9:
movs    r0, r0, lsl #(31-13)    @ r0=x%2 <<31       ; carry=x/2
sub     r0, r0, #1<<31          @ r0 -= 1.0         ; sin <-> cos
smulwt  r1, r0, r0              @ r1 = x*x          ; Q31*Q15/Q16=Q30

ldr     r2,=14016               @ C = (1-pi/4)<<16
smulwt  r0, r2, r1              @ C*x^2>>16         ; Q16*Q14/Q16 = Q14
add     r2, r2, #1<<16          @ B = C+1
rsb     r0, r0, r2, asr #2      @ B - C*x^2         ; Q14
smulwb  r0, r1, r0              @ x^2 * (B-C*x^2)   ; Q30*Q14/Q16 = Q28
mov     r1, #1<<12
sub     r0, r1, r0, asr #16     @ 1 - x^2 * (B-C*x^2)
rsbcs   r0, r0, #0              @ Flip sign for odd semi-circles.

bx      lr

## 3 Testing

Deriving approximations is nice and all, but there's really no point unless you do some sort of test to see how well they perform. I'll look at two things: accuracy and some speed-tests. For the speed-test, I'll only consider the functions given here along with some traditional ones. The accuracy test is done only for the first quadrant and in floating-point, but the results should carry over well to a fixed-point case. Finally, I'll show how you can optimize the functions for accuracy.

### 3.1 Third and fourth-order speed

For the speed test I calculated the sine at 256 points for x ∈ [0, 2π). There will be some loop-overhead in the numbers, but it should be small. Tests were performed on the NDS.

Functions under investigation are the three S3 and two S4 functions given earlier. I've also tested the standard floating-point sin() library function, the libnds sinLerp() and my own isin() function that you can find in arctan:sine. The cumulative and average times can be found in Table 1.

Table 1: sine cycle-times (roughly).
Function (thumb/ARM) Total cycles average cycles
sin (F) 300321 1175.1
sinLerp (T) 10051 39.2
isin (T) 7401 28.9
isin_S3 (T) 5267 20.5
isin_S4 (T) 6456 25.2
isin_S3a (A) 3438 13.4
isin_S3a9 (A) 2591 10.1
isin_S4a9 (A) 3123 12.1

The first thing that should be clear is just why we don't use the floating-point sine. I mean, seriously. There is also a clear difference between the Thumb-compiled and ARM assembly versions, the latter being significantly faster.

Within the compiled versions, I find it interesting to see that the algorithmic calculations are actually faster than the LUT+lerp-based implementations. I guess loading all those numbers from memory really does suck.

And then there's the assembly versions. Wow. Compared to the compiled version they're twice as fast, and up to four times as fast as the LUT-based functions.

NDS timers measure half-cycles

The cycle-times from Table 1 do not make sense if you count instruction cycles. For example, for isin_S3a the function overhead alone should already be around 10 cycles. The thing here is that the numbers are taken from the hardware timers, which use the bus-frequency (33 MHz) rather than the ARM9 cpu (66 MHz). As such, it measures in half-cycles. For details, see gbatek:nds-timings.

### 3.2 Accuracy

Fig 4 shows all the approximations in one graph. It only shows one quadrant because the rest can be retrieved by symmetry. I've also scaled the sine and its approximations by 212 because that's the scale that usual fixed-point scale right now. And to be sure, yes, this is a different chart than Fig 1; it's just hard to tell because the fourth and fifth order functions are virtually identical to the real sine line.

Fig 4. Taylor and second to fifth order approximations to 212sin(x).

For the high-accuracy approximations, it's better to look at Fig 5, which shows the errors. Here you can clearly see a difference between S4d and S5, the latter is roughly 3 times better.

There's also a large difference between the devmaster fourth-order sine and my own. The reason behind this is a difference in conditions. In my case, I've fixed the derivatives at both end-points, which always results in an over- or underestimate. The devmaster's S4d let go of those conditions and minimized the error. I'll also do this in the next sub-section.

Fig 5. Errors for Fig 4.

Table 2 and Table 3 list some interesting statistics about the various approximations, namely the minimum, average and maximum errors. It also contains a Root Mean Square Deviation (RMSD), which is a special kind of distance. If you consider the data-points as a vector, the RMSD is the average Pythagorean length for each point. Table 2 is normed to 212, whereas Table 3 is table for the traditional floating-point sine scale.

The RMSD values are probably the most useful to look at. From them you can see that there is a huge gap between the low-accuracy and high-accuracy functions of about a factor 60. And if you do your math right, all it costs is one multiplication and one addition, and maybe some extra shifts in the fixed-point case. That's quite a bargain. Compared to that, the difference between the odd and even functions is somewhat meager: only a factor three or so. Still, it is something.

If you look at the fixed-point table, you can see that the error you make with S4d and S5 is in the single digits. This means that this is probably accurate enough for practical purposes. Combined with the fact that even fifth order polynomials can be made pretty fast, this makes them worth considering over LUTs.

Table 2: error statistics for 212sin(x) approx.
min avg max rms
Taylor3 -302.1 -51.5 0 92.7
S2 0 123.1 229.4 146.8
S3 -82.0 -47.6 0 55.0
S4d -4.47 0.19 3.11 2.44
S4 0 5.87 11.4 7.11
S5 0 0.74 1.62 0.94
Table 3: error statistics in percentages.
min% avg% max% rms%
Taylor3 -7.37 -1.26 0 2.26
S2 0 3 5.6 3.58
S3 -2 -1.16 0 1.34
S4d -0.11 0.0047 0.076 0.06
S4 0 0.143 0.278 0.174
S5 0 0.018 0.039 0.023

### 3.3 Optimizing higher-order approximations

From the charts, you can see that S4 and S5 all err on the same side of the sine line. You can increase the accuracy of the approximation by tweaking the coefficients in such a way that the errors are redistributed in a preferable way. Two methods are possible here: shoot for a zero error average, or minimize the RMSD. Technically minimizing the RMSD is standard (it comes down to least-squares optimization), but because a zero-average allows for an analytical solution, I'll use that. In any case, the differences in outcomes will be small.

First, think of what an average of a function means. The average of a set of numbers is the sum divided by the size of the set. For functions, it's the integral of that function divided by the interval. When you want a zero-average for an approximation, the integral of the function and that of the approximation should be equal. With a polynomial approximation to a sine, we get:

 (23) $\begin{eqnarray} \int_0^1 \sum_n a_n x^n dx &=& \int_0^1 sin(x\pi/2) dx \;\; \rightarrow \\ \\ \sum_n \frac{a_n}{n+1} &=& 2/\pi , \end{eqnarray}$

with an reducing to the coefficients of the polynomials we had before. This can be used as an alternate condition to the derivative at 0. For S4 and S5, you'll end up with the following coefficients.

 (24) $\begin{eqnarray} a_4 &=& 1 \\ b_4 &=& c_4+1 \\ c_4 &=& 5(1-\frac3\pi) \approx 0.225351707 \end{eqnarray}$
 (25) $\begin{eqnarray} a_5 &=& 4(\frac3\pi - \frac9{16}) \approx 1.569718634 \\ b_5 &=& 2 a_5 - 5/2 \\ c_5 &=& a_5 - 3/2 \end{eqnarray}$

If you're still awake and remember the devmaster S4d coefficients, there should be something familiar about a4. Yes, they're practically identical. If you optimize S4 for the RMSD, you actually get the exact same function as S4d.

Table 4 shows the statistics for the original approximations and the new optimized versions, S4o and S5o. The numbers for S4o are basically those from S4d seen earlier. More interesting are the details for S5o. The maximum and minimum errors are now within ±1. That is to say, this approximation gives values that are at most 1 off from the proper Q12 sine. This is about as good as any Q12 approximation is able to get.

Table 4: Optimized Q12 S4 and S5.
min avg max rmsd
S4 0 5.87 11.4 7.11
S5 0 0.74 1.616 0.94
S4o -4.72 0 2.89 2.47
S5o -0.73 0 0.79 0.52

Fig 6. Errors for optimized S4o and S5o.

## 4 Summary and final thoughts

Here's a few things to take from all this.

• When constructing a polynomial approximation, more terms mean higher accuracy. Symmetry properties of the function approximated allow you to remove terms from consideration, simplifying the equation.
• Coordinate transformations are your friends too. Sometimes it's much easier to work on a scaled or moved version of the original problem. If your situation has a characteristic length (or time, velocity, whatever) consider using dimensionless variables: expressing parameters as ratios of the characteristic length. This makes the initial units pretty much irrelevant. For angles, think circle-fractions.
• Zero and one (0 and 1) are the best values to have in your equations, as they tend to vanish to easily.
• Any approximation formula will have coefficients to be determined. In general, the Taylor series terms are not the best set; values slightly offset from these terms will be better as they can correct for the truncation. To determine the values of the coefficients, define some conditions that need to be satisfied. Examples of conditions are values of the function and its derivative at the boundaries, or its integrals. Or you can wuss out and just dump the thing in the Excel Solver.
• When converting to fixed-point, accuracy and overflow comes into the fray. If you know the domain of the function beforehand, you can optimize for accuracy. Also, it helps if you construct the algorithm in a sort of recursive form instead of a pure polynomial: not ax + bx2 but x(a + xb). Ordered like this, each new additional term only requires one multiplication and one addition extra.
• For fixed-point work, SMULWx is teh awesome.
• Even a fourth order (and presumably fifth order as well) polynomial implementation in C is faster than the LUT-based sines on the NDS. And specialized assembly versions are considerably faster still.
• The difference in accuracy of S4 vs S2 or S5 vs S3 is huge: a factor of 60. Going from an even to the next odd approximation only gains you a factor 3. Shame; I'd hoped it'd be more.
• Unlike I initially thought, the even-powered polynomials work out quite well. This is because they're actually modified cosine approximations.

1. Express the parabolic approximation S2(x) of Eq 1 in terms of z. 's Not hard, I promise.
2. Implement the fixed-point version of the fifth-order sine approximation, S5(x).
3. For the masochists: derive the coefficients for S5(x) without dimensionless variables. That is to say, with the conditions at x = ½π instead of z = 1.
4. Solve Eq 24 and Eq 25 for minimal RMDS. Also, try to derive an analytical form for minimal RMDS; I think it's exists, but it may be tricky to come up with the right form.

## 93 thoughts on “Another fast fixed-point sine approximation”

1. EDIT: 2009-07-28

Tightened the code from isin_S3a9. It's now a little shorter and a little faster.

2. I don't quite get how you got (24) and (25). You say "[Eq (23)] can be used as an alternate condition to the derivative at 0. For S4 and S5, you'll end up with the following coefficients", so I calculated S_5(x) with conditions at Pi/2, and setting the average of S_5(x)-Sin(x) over [0,Pi/2] to 0.

For S_5(x) = ax-bx^3+cx^5, conditions for S_5(Pi/2)==Sin(Pi/2), S_5'(Pi/2)==Sin'(Pi/2) and Integrate[S_5(x)-Sin(x),{x,0,Pi/2}]==0, I find
S_5(x) = (6 (2 + 3 Pi) x)/(7 Pi^2) - (4 (24 + Pi) x^3)/(7 Pi^4) - (48 (-4 + Pi) x^5)/(7 Pi^6)
S_5(x) = (8 (-(9/16) + 3/Pi) x)/Pi - (8 (-(5/2) + 8 (-(9/16) + 3/Pi)) x^3)/Pi^3 + (32 (-(3/2) + 4 (-(9/16) + 3/Pi)) x^5)/Pi^5

How exactly did you get (24) and (25)?

BTW, as a response to "exercise 4", there exists an analytical solution for min. RMS, that is to say if you replace the condition Integrate[S_5(x)-Sin(x),{x,0,Pi/2}]==0 with Integrate[(S_5(x)-Sin(x))^2,{x,0,Pi/2}]==0. The exact result Mathematica gave is quite lengthy, but my actual point is there are no real roots for a,b or c.

3. First, don't use the raw sine for this stuff; use the transformation I mentioned. The function I'm actually approximating is sin(½πx), not sin(x). This changes the interval to [0,[ ]1], which makes all the nasty-looking π/2 terms disappear. This helps. A lot.

Now, the conditions for the optimized S5o were:

• Equal values at x=1 : S(1) = sin(½π 1)
• Equal derivatives at x=1 : S'(1) = ½πcos(½π 1) = 0
• Equal areas over interval 0,1 : INT(S(x),x,0,1) = INT(sin(½πx),x,0,1) = π/2

Up to this point I hope it's clear. The neat thing about power series is that the derivatives and the integrals are really easy to determine. For xn you get n·xn−1 for the derivative and xn+1/(n+1) for the integral. And because I'm using the range [0,1], all these power will disappear anyway, since 1n==1. So the equations you get are:

$\begin{eqnarray} S(1) &=& 1 &=& a - b + c \\ S'(1) &=& 0 &=& a - 3b + 5c \\ INT(S,x,0,1) &=& \frac2\pi &=& \frac12 a - \frac14 b + \frac16 c \end{eqnarray}$

Notice that these are linear equations with respect to the coefficients. In other words, they can be solved via matrices. (I hope these equations show up alright; apparently the standard WP filters don't like the mimetex plugin either.)

$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -3 & 5 \\ 1/2 & -1/4 & 1/6 \end{bmatrix} ,\;\; \mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 2/\pi \end{bmatrix} ,\;\; \mathbf{x} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} ,\;\; A\mathbf{x} = \mathbf{y} \rightarrow \mathbf{x} = A^{-1} \mathbf{y}$

Use any method you like to get the matrix inverse, then multiply with the condition vector y and you have your coefficients. Apparently, the inverse is this:

$A^{-1} = \begin{bmatrix} -2\frac14 & \frac14 & 6 \\ 7 & -1 & -12 \\ -3\frac34 & \frac34 & 6 \end{bmatrix}$

So a = −2¼*1 + ¼*0 + 6*(2/π) = 12/π − 9/4 = 4(3/π − 9/16). The numerical values of b and c can be found the same way, but I preferred to express them in terms of a because it's prettier. You can do a similar thing for the 4th order approximation. (Also, the matrix A does not actually depend on the function were approximating; you can write down a generalized version for power series (with all terms or only odd/even parts) and you can find the coefficients for any function. I'm thinking of doing a post about this as well.)

I think you may have entered the data incorrectly somewhere. There is no way terms like 7π² should appear in the equations.

As to the RMSD thing, the goal was to find coefficients for which the RMDS was minimal, not zero. It can never be zero exactly because, thanks to the square, function integrated is always positive. For an analytical solution, you'd have to differentiate to the coefficients. I expect that to be all manner of ugly, and from what you say Mathematica agrees with me here.

4. > First, don't use the raw sine for this stuff; use the transformation I mentioned. The function I'm actually approximating is sin(½?x), not sin(x). This changes the interval to [0, 1], which makes all the nasty-looking ?/2 terms disappear. This helps. A lot.
Whoa, whoa, back off dude! I just did what your (3rd) "exercise" says!

> There is no way terms like 7?² should appear in the equations.
Why not? They actually do, when you calculate S(x) and not S(gamma).

I actually thought the condition I used were different from yours, so I didn't double check what I was doing. Apparently I messed up with mathematica, albeit I don't even remember anything at all (I was real sleepy back then). Probably used a variable that was already defined somewhere between. Anyway, I just re-run it in a clean page and confirmed your results.

S[x_] = l x - m x^3 + n x^5

Solve[{
S[Pi/2] - Sin[Pi/2] == 0,
S'[Pi/2] - Sin'[Pi/2] == 0,
Integrate[S[x] - Sin[x], {x, 0, Pi/2}] == 0
}, {l, m, n}]

and I got this:

{{l -> -((3 (-16 + 3 Pi))/(2 Pi^2)),
m -> -((8 (-24 + 7 Pi))/Pi^4),
n -> -((24 (-16 + 5 Pi))/Pi^6)}}

which is the same thing as S(gamma) = a gamma - (2a - 5/2) gamma^3 + (a-3/2) gamma^3
where a = 4(3/pi - 9/16).
RMS=0? Gee hell, I sure do stupid things when I'm sleepy. Anyway, I minimized it. The minimum RMS of error is 0.0000419183 which occurs at a=1.57044, b=0.64271 c=0.0724334 (I have the exact analytical expressions ---lenghty expressions with plenty of pi's). Manual solution isn't that nasty, however. Stick with Liebniz rule, and worst terms you'll have are intergrals of x^3 Sin[Pi x/2] and x^5 Sin[Pi x/2] from 0 to 1.

And hey, can "by-passers" input latex too? I tried math tag, but didn't give anything in the preview :(

5. > First, don't use the raw sine for this stuff; use the transformation I mentioned. The function I'm actually approximating is sin(½πx), not sin(x). This changes the interval to [0, 1], which makes all the nasty-looking π/2 terms disappear. This helps. A lot.
Whoa, whoa, back off dude! I just did what your (3rd) "exercise" says!

Sorry, I thought you meant how did I solve it in general. For the exercise, yeah, that was the requirement (though using mathematica is technically cheating :P ). There's actually an easy way to convert between the two. For every term, you have something like γm = (x*2/π)m, where m=1,3,5,... . For the coefficients in the x case, you just move the (2/π)m out into the coefficients.

> There is no way terms like 7π² should appear in the equations.
Why not? They actually do, when you calculate S(x) and not S(gamma).

The power of the πs in a coefficient should be equal to that of the power of x. But I had forgot that there was a also a π between brackets as well, so yeah, it does work out after all. I'm still not sure about the 7 though.

As to the analytical solution, I'd realized that it's possible as well via something like

$\frac{d}{da} \int_0^1 f^2(x,a) dx = \int_0^1 \frac{d}{da} f^2(x,a) dx = \int_0^1 2\frac{df(x,a)}{da} f(x,a) dx = etc.$

I just didn't know it had a name. I'm still not sure how that works out exactly though, since there are relations between the coefficients as well. That makes the derivative w.r.t a a little tricky.

Latex should work too now, but it uses [ tex ] tags, not [ math ]. I should probably see about [ quote ] tags too, but nested quotes make that a little trickier than the rest.

6. I created a chord calculator the can be used to approximate the sine curve thru 45 degrees. here the formula:

side_c = sqr( ( .5 / (90 / deg) ) * ( .5 / (90/deg) ) * 2 )

error_correction = (90-deg)*(deg/90) * .00125

side_c = side_c + error_correction

sine = side_c

this calculates chords which a chord is the sine of half the degrees the chord of 90 is the sine of 45.
so you have to enter (degrees x 2) because it will plot at half the entered degrees.

loop thru 90 deg to get 45 deg.

7. On the above post the cosine can be stated as :

cosine = sqr(1- sine * sine)

here is an arc formula that i believe plots a parabola. if i'm not mistaken.

y = (180-deg)*(deg/180) * .0222

it deviates from a true sine curve on the sides by about 5 degrees wider than the sine curve.

8. Albert, a simpler way to write that approximation would be:

$sin(\small\frac12\normalsize\pi x) = x\sqrt{x} + \sqrt{2} x (1-x)$

I've tested it and it does work well. It's about twice as accurate as the third-order approximation, S3(x). However, there are two problems with it.

First, it uses a square root, which is not a simple operation. If it's not in the instruction set of the processor you're working with, it will probably cost at least as much as calculating a sine.

Second, even if you have a fast square root and assuming it's as fast as a multiplication (which is a big assumption), you'd have to spend 4 MULs worth of cycles. Since the fifth order function also uses 4 multiplications and is considerably more accurate, you'd be better off using that.

9. '' Albert's attempt at a 400 degree circle
'' written in Freebasic for windows from:
'' http://www.freebasic.net

dim as double deg,sine,cosine

screen 19

xctr = 305
yctr = 305

y = 0
for deg = 0 to 100 step .5

sine = (10000 - ((100-deg) * (100-deg))) *.0001 *(.8 + (.00208*deg))
cosine = (10000 - (deg*deg)) *.0001 *(.999 - (.0005*deg))
holdx = x

line(xctr,yctr)-(xctr+ x,yctr+ -y),14
line(xctr,yctr)-(xctr+ x,yctr+ y),14
line(xctr,yctr)-(xctr+ -x,yctr+-y),14
line(xctr,yctr)-(xctr+ -x,yctr+ y),14

next

sleep

10. A guy named Richard,from the FreeBasic forum gave me this formula.
It has almost perfect degree spacing.(off by a few hundredths.)

Const As Double Pion2 = 2 * Atn(1)
For t = 0 To Pion2 Step pion2/100 '''''divided pion2 by required degrees.
tt = t * t
s = t * ( 1 + tt * ( -0.166 + tt * 7.589986441271250e-3)) ' odd function
c = ( 1 + tt * ( -0.496 + tt * 3.676551227144884e-2)) ' even function

Pset (xctr + x, yctr - y), 13
Pset (xctr + x, yctr + y), 13
Pset (xctr - x, yctr - y), 13
Pset (xctr - x, yctr + y), 13
Next t

11. 'Albert's newest formula for a 200 degree pi
'the degrees, there are 100 in a quad
' its actually a chord calculator but the chord of a particular degree is the sine of half that degree.
' the degrees are off a little (few hundredths) and the circle is a little flat on top and sides.

dim as double deg = atn(1)/100
For grad = 0 To 100 Step 10

Pset (xctr + x, yctr - y), 9
Next

12. * dons Professor cap.

[br]

Albert, I appreciate what you're trying to do here, but your approach to the subject could use some work. Your expressions are way too complicated, and using more or less random angular units. This makes it very hard to see what they actually say. Also, you're not stating the reasons behind the coefficients, nor doing proper error analysis; with the exception of Richard's, the expressions are worse than the ones in the article.

First, here are the expressions again, but compressed to their essentials. I will use the coordinate transformation z[=]x/(½π), because having the boundaries at 0.0 and 1.0 makes things much easier to analyse. Note how much smaller they are.

[eq] $\begin{eqnarray} S_{a1}(z) &=& az - bz^2 \hspace{20} & a = \sqrt{2}(1+1/(2\pi)) \approx 1.6392 \;;\; b = \sqrt(2)/\pi \approx 0.45 \\ S_{a2}(z) &=& 2z - z^2 \\ S_{a3}(z) &=& (1 - (1-z)^2)(a - b(1-z)) \hspace{20} & a = 1.008 \;;\; b = 0.208 \\ S_{a4}(z) &=& x - 0.166 x^3 + 0.00759 x^5 & \approx 1.5708z - 0.6434z^3 +0.0726 z^5 \\ S_{a5}(z) &=& az - bz^2 \hspace{20} & a = 1.5551 \;;\; b = 0.5598 \\ \end{eqnarray}$ [/eq]

The first thing to remember when analyzing mathematical expressions is to disregard the numbers. The actual numbers don't really matter. You can always move and scale the window you're working on to a simpler interval, like between 0 and 1. Not only do 0 and 1 often cancel in expressions, but have the benefit of not being approximations. What really matters are the order of the expressions and the operations; read through the numbers until it's time for the final implementation.

[br]

Sa1, Sa2 and Sa3 are all parabola with different coefficients. Notice how much smaller and easier to read Sa1 is compared to the original of comment #9. I actually messed up the earlier rewrite because the original form is so awkward with all theparentheses and divisions. Always try to simplify the expressions before you post them. I also managed to figure out that the coefficients of Sa1 actually involve π and √2, something that was very non-obvious from the original expression. I'd very much like to know where these came from.

Sa2 is actually the same as S2 from eq[ ]1, only written in terms of z instead of x. This was essentially the first of the exercises. The coefficients from Sa3 seem of come out of thin air, and do not seem to work as well as the other parabola (more on that later).

Sa3 is … interesting, but in the end it's just another third order polynomial, much like S3. The thing is that Sa1 is more complicated and not as accurate as S3, and doesn't even properly match up at the z[=]1 boundary. By the way, the reason I used (1−z) terms in the equation is because it's actually more of a cosine approximation than one for sines. To turn in into a cosine, change the (1−z) into a simple z.

Sa4 is a basic fifth order polynomial, like discussed in section 2.1, and it in fact more accurate than S5 and even S5o. However, no mention was made on where the coefficients came from and in fact there are more accurate coefficients. For example, using the Solver to minimize the RMSD under the condition S(z=1)[=]1 gives (a,b,c) ≈ (1.57033, 0.64209, 0.07176), for an RMDS of 58e-6, whereas the coefficients of Sa4 result in RMSD=88e-6.You can also use something called [wiki]Chebyshev polynomials[/wiki], which may look kinda funky but includes a method for finding very good coefficients.

Using a different set of conditions or miniziming for some other target error will give different sets of coefficients, all of which will be 'best' for those conditions. And this is something you will have to remember: there is no single 'best' approximation or set of coefficients ; the 'best' depends on what criteria you use. Not specifying your conditions means other people won't know where your coefficients come from. Since their uses may differ from yours, this can lead them up a blind alley.

[br]

That covers the forms of the expressions, now for the error analysis. You mentioned 'degree spacing', but I'm not really sure what that means. If you're talking about that if you evaluate the approximation at some point x, the position of its 'real' value can be found at, say,|Δx| < 0.01°, that's not what I'm seeing. But even if that's not what you mean, a more useful error measurement is in the y-direction: ε[=]approx − real sine. That you can analyse for the items from Table[ ]2, 3 and 4: the minimum, maximum and average error, and the average squared-distance, RMSD.

What you should not do is plot a circle and see if it looks circlish. This is problematic for several reasons. First, above some radius you will get gaps because you will use less angle-iterations then there are pixels for the circumference. This is unavoidable. You'd want something that is sure to cover everything, like Bresenham's [wiki]Midpoint circle algorithm[/wiki]. Second, just plotting the points from the approximation means that you have nothing to compare it to, so you couldn't tell what is correct and what isn't. Added to that is the fact that you're limited to the size of the pixels: With a radius of 100, you can visualise 1% accuracy at best.

Below is a table of relative error statistics, comparable to Table[ ]4. I should probably point out that, even though it's a parabolic approximation, Sa1 gives good results because use the sine-approx for the first octant and √(1-s²) for the second. Splitting it into sine and cosine parts for the first quadrant is a really neat idea, though I'm not sure if it's ultimately better than simply using a higher-order approximation.

min (%) max (%) max (%) rmds (%) -0.417 0.0113 0.299 0.216 0 3.005 5.601 3.584 -3.226 -1.664 0.719 2.0523 -0.014 0.003 0.0014 0.0088 -0.089 0.020 0.095 0.062 -7.84 -4.568 0 5.265

If you're still looking for suitable approximations, keep these points in mind.

• Keep the expressions simple! The correctness of an approximation is mostly determined by the order of the polynomial. No matter how clunky you make an expression, it makes no difference if, at the end of the day, it amounts to a simple parabola. Those are gonna suck no matter what (relative to higher-order, I mean).

• Stay away from raw numbers as much as possible. Raw numbers do not translate well to other situations (see the Gimli glider incident linked in the article), and will almost always only be approximations. Either use 'natural' dimensionless units like radians or scale the situation to a [0,[ ]1] domain or range.

• Before you do anything, decide what 'sufficiently good' means for your situation. Consider things like accuracy at the boundaries, smoothness of the function, calculation time and min/max/overall errors. When showing your work to others, include how you got to your conclusions. Graphs also help, though I understand that's a little hard to put in a comment.

• Above all, make sure it's actually an improvement over what is already available. The idea to divide into octants instead of quadrants is a great idea, but most of the actual approximations you provided was either the equal or worse than what was already
presented.

If you're interested, I've uploaded the Excel file I used for the analysis here. I've condensed it for size, but the essentials are there. The main sheet contains a sizable amount of approximations and their errors. The extra sheet shows an example of how you can calculate coefficients from a set of conditions, in this case for eq[ ]24 and comment #6.

13. This was written like my others in FreeBasic from http://www.freebasic.net
Its a simple rule of thumb that is good to 2 decimal places (off by .003... at 45 degrees.)
generally = angle + opposite angle / ( 90 + (primary angle /8) )

For deg = 0 To 90 Step 1

if deg <=45 then
sine =( (deg*1.5) / (90+(deg/8) ) )
cosine = sqr(1-s*s)
else
cosine = ( ((90-deg)*1.5) / (90+((90-deg)/8)) )
sine = sqr(1-c*c)
end if

line(xctr,yctr)-(x, y),14

next deg

14. Here is my tangent approximation its accurate to .0688 at 35 degrees not good over 45deg it needs 97degrees to get vertical.

atn(1)/ ( atn(1)/ (deg*( atn(1)/45) ) - ((1 - atn(1))/45*deg) )

15. 'Here is my attempt to calculate the hypotenuse "C" of the angles 0 to 45
'Written in freebasic from http://www.freebasic.net

'attempt to duplicate
' tangent * half tangent
' to figure the hypotenuse "C" of the angle

dim as double rad = atn(1)/45
dim as double y1,y2,y3,formula
dim as double deg
print "deg hypotenuse my formula error level"
for deg = 0 to 45 step 1

y3 = 1+ (y1*y2) ' y1*y2 = hypot length between circle and square of 1

print deg ; " - " ; y3 ;" - " ; formula ; " - " ; formula - y3

next deg

sleep

16. written in freebasic from http://www.freebasic.net
FBIDE is available on the links page so you can just cut and paste the code and hit 'f5' to run it.

#define Bignum Double ' implement your bignum maths for these variables

Function Cosine(Byval x As Bignum) As Bignum
Dim As Bignum mxs, term, sigma, last_sigma, divisor
Dim As Integer n
mxs = x * x ' precompute
mxs = - mxs ' minus x squared
term = 1.00
sigma = 1.00
n = 0 ' the only freebasic integer used
Do
n = n + 2 ' very fast integer math
divisor = (n * (n-1)) ' use str(n * (n-1)) to convert integer to bignumber
last_sigma = sigma ' keep a copy of the last sigma
term = term * mxs
term = term / divisor
sigma = sigma + term
Loop Until last_sigma = sigma ' test if no change to sigma
Return sigma
End Function

Function Sine(Byval x As Bignum) As Bignum
Dim As Bignum mxs, term, sigma, last_sigma, divisor
Dim As Integer n
mxs = x * x ' precompute
mxs = - mxs ' minus x squared
term = 1.00
sigma = 1.00
n = 1 ' the only freebasic integer used
Do
n = n + 2 ' very fast integer math
divisor = (n * (n-1)) ' use str(n * (n-1)) to convert integer to bignumber
last_sigma = sigma ' keep a copy of the last sigma
term = term * mxs
term = term / divisor
sigma = sigma + term
Loop Until last_sigma = sigma ' test if no change to sigma
sigma = x/ (1/sigma)
Return sigma
End Function

'work functions
dim as double deg,c,s
for deg = 0 to 360
dim as double rad = atn(1)/45
print deg ;" - " ; s ; " - " ; c ; " - " ; atan2(s,c)/rad
next deg

sleep

17. the following code does not give me the correct value for sine

#include

int isin_S4(int x);

int main(){
int r;
r = isin_S4(-224385); // -224385 = -13.6954*(2^14) in fixed point
printf("r = %d \n ",r);
return 1;
}

//! A sine approximation via a fourth-order cosine approx.
int isin_S4(int x)
{
int c, x2, y;
static const int qN= 13, qA= 12, B=19900, C=3516;

c= x<<(30-qN); // Semi-circle info into carry.
x -= 1< cosine calc

x= x>>(31-qN); // Note: SIGNED shift! (to qN)
x= x*x>>(2*qN-14); // x=x^2 To Q14

y= B - (x*C>>14); // B - x^2*C
y= (1<>16); // A - x^2*(B-x^2*C)

return c>=0 ? y : -y;
}

18. @ anjun

No, it's correct (within a margin of error).

The problem is that it seems I have been a little careless in defining my terms. The basis for everything is that
π = 0x4000. This is the Q14 I'm talking about. For a full rotation it's actually Q15. There's also a few leftover gammas that I need to fix.

In any case, here's how it should work.

-224385 -> -13.6954*pi = -43.0253 rad -> 0x1000*sin(-43.0253) = 3348

which is close enough to the 3353 I get from isin_S4.

Thanks for writing in, though. I really should fix this.

19. I am trying to extend the result into the so called S7o using the matrix inversion method you mentioned. The wierd thing is it only gives worse result than the S5o. The coefficients I have are : [1.509624846, -0.278874537, -0.471125463, 0.240375154]. The conditions I used are:
sin(1) = 1
sin'(1) = 0
sin''(1) = -1
integral sin = 2/PI

20. @ Strong:

Aha! You've fallen for one of calculus' classic pranks :P

[br]

Even though the coordinate transformation z = ½πx makes the math easier, it also hides the fact that in the end I'm still working with f(x) = sin(½πx), which given extra ½π factors in derivatives and primitives. f''(x) is actually −¼π2 sin(z), not just −sin(z). So your third condition should actually be −¼π2. If you try that, you'll see a great benefit.

[br]

You can also try f'(0) = ½π as the third condition. This seems to be twice as accurate overall.

21. Is there a faster way to get a cos out of isin_s4? Here I hijacked the cos->sin shift to flip quadrants 2 and 4. This pair is perfect for implementation in VHDL!

int32_t fast_cos(int32_t x) {
int32_t c, y;
static const int32_t qN= 13, qA= 12, B=19900, C=3516;

c = (x-(1<<qN))<<(30-qN);             // Semi-circle info into carry.

x= x<<(31-qN);              // Mask with PI
x= x>>(31-qN);              // Note: SIGNED shift! (to qN)
x= x*x>>(2*qN-14);          // x=x^2 To Q14

y= B - (x*C>>14);           // B - x^2*C
y= (1<<qA)-(x*y>>16);       // A - x^2*(B-x^2*C)

return c<0 ? y : -y;
}

22. If you mean "can I calculate a sine and a cosine at the same time with this", the answer is no, unfortunately.

I think you can remove the subtraction by (1<<qN) entirely for the cosine, but I can't tell whether that will work right for all quadrants without testing. Other than that, this should be as fast as it will go.

23. If anyone's interested, here's a vhdl implementation of sin.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sin is
port (clk : in std_logic;
rst : in std_logic;
input : in unsigned(31 downto 0);   -- [0, 2*pi] == [0, 2^15-1]
output : out signed(31 downto 0);   -- [-1, 1] == [-2^13, 2^13]
done : out std_logic
);
end sin;

architecture behav of sin is
constant qN : integer := 13;
constant qA : integer := 12;
constant B : integer := 19900;
constant C : integer := 3516;

signal state : unsigned(3 downto 0) := x"0";
signal x : signed(31 downto 0) := x"00000000";
signal y : signed(31 downto 0) := x"00000000";
signal output_l : signed(31 downto 0) := x"00000000";
signal done_l : std_logic := '0';

begin
process(clk) begin
if rising_edge(clk) then
case state is
when x"0" => x <= signed(input) - to_signed(2**qN, 32);        -- shift sin -> cos
when x"1" => x <= shift_left(x, 31-qN);                        -- wrap around PI = 2^13
when x"2" => x <= shift_right(x, 31-qN);
when x"3" => x <= shift_right(x * x, 2*qN-14)(31 downto 0);    -- x=x^2 To Q14
when x"4" => y <= shift_right(x * C, 14)(31 downto 0);         -- C*x^2
when x"5" => y <= B - y;                                       -- B - C*x^2
when x"6" => y <= shift_right(x * y, 16)(31 downto 0);         -- x^2 * (B - C*x^2)
when x"7" => y <= (2**qA) - y;                                 -- A - x^2 * (B - C*x^2)
when x"8" =>
if (shift_left(signed(input), 30-qN) < 0) then
y <= -y;
end if;
when others => null;
end case;

if (rst /= '1') then
if (state /= x"9") then
state <= state + 1;
done_l <= '0';
output_l <= x"00000000";
else
done_l <= '1';
output_l <= y(31 downto 0);
end if;
else
state <= x"0";
done_l <= '0';
output_l <= x"00000000";
end if;
end if;
end process;

done <= done_l;
output <= output_l;

end behav;

24. It might interest you to know that the Atari 800 BASIC interpreter uses this polynomial for sin: pi/2*x - .6459640867*x^3 + .0796926239*x^5 - .004681754355*x^7 + .000260442752*x^9 - .00000354149939*x^11. (x is scaled so that 1 means pi/2). It's good to 7+ digits. If you do a google search for ".6459640867" you'll find the 6502 assembly language source code for this. In the case of the Atari, the floating point system is 10 BCD digits (it uses the decimal mode of the 6502).

25. I was in the process of writing a simple FM synthesizer for ARM, and went searching for a fixed-point sine approximation without a LUT and found this.
Absolutely wonderful post on so many levels, thank you for writing it!

26. I really really wish I could follow all this. I'm trying to cook up my own fixed point sin function for 64 bit integers, with pi being 1 << 60 or so...

Understand more of this would help. Should have paid more attention in math class.

27. Hi
I guess for the fig-1 Taylor implementation uses only two terms i.e.,
sin(h) = h - (1/6)*(h^3)
Ofcourse its fair to compare a third order polynomial approximation with a third-order equation from Taylor series. s(z) = z(1-(z^2)/6).

@joe
The implementation from ATARI is nothing but Taylor series implementation with higher order terms.

One advantage of using Taylor series with higher order coefficients would be that both Cosine and Sine values at a given point be calculated simultaneously. Very useful for VHDL implementation. h^2 is an intermediate value while computing h^3 and so on..
-Vinay

28. The error of S5 is provided, but not its speed.

29. I do not understand transformation from eq. 11 to eq. 12.
Probably I lack fixed-point math backgorund.
Could you give me a link to a a fixed-point math text that enables me to understand the eq. 11 to eq. 12 trasnformation?

Best regards
Daro Binacchi

30. Awesome site!! I worked some more on the function in Brian's post because Code Composer Studio generated very inefficient assembly code. I also liked the idea of having a function that can easily be changed from 32bit to 16bit or 64bit without much modification. So below it is. To change it to 16 or 64 bit simply change the data type for x and y.

int32_t cos32(int32_t x)
{
int16_t c = x;
if (x & 0x2000) x |= -16384;
else x &= 16383;
x = x*x>>12;
int32_t y = 19900 - (x*3516>>14);
y = 0x1000 - (x*y>>16);
return (c-0x2000) & 0x4000 ? y : -y;
}
#define sin32(x) cos32((x)-0x2000)

31. I think there is an aditional "4" at the beginning of Eq. 12.

Shouldn't it be ...(3-z^2)...

Regards,

Walter

32. I believe equation 11 has an error. The end result is correct, but there is a typo on the third line of the first form of the equation.

It currently shows: (1/2)z - (1/2)z^3
It should show: (3/2)z - (1/2)z^3

I only mention it because I started working backwards and couldn't figure out how you got from line 3 to line 4. So I then started at the beginning and couldn't see how you got from line 2 to line 3. So I worked it out for myself and viola...typo.

33. Also, I am a little confused about the powers of 2 (A, n, p) near eq. 12. I am afraid I don't really understand how these numbers were derived. Would you explain this?

I am assuming that n=13 is arbitrary and the other powers are derived from that somehow, but I am missing something because I cannot seem to derive these numbers. In the text, it says something like choose the largest value of p allowed by multiplication with x. Maybe this is the thing I am missing as I do not understand the consequences of this statement.

34. Argh, sorry it took so long, Phocks. Been a little busy the last week.

Both A and n are arbitrary. They're they fixed point bits of the output and the input, respectively.

p is a tricky one, yes. It's something of a controller value. At all times, the results of the multiplications must not overflow. First, you have z2, which can be controlled via n. But you also have a multiplication of z·(3 − z2). That's where p comes in. With it you can keep the right-hand side as large as possible.

That it has a value of 15 is also somewhat arbitrary. The multiplication result needs to have 30 fractional bits at max, and with n=13, I think that p can be as high as 30−n, but I haven't tested that.

I've expanded the derivation a little so that it is easier to follow. I've also fixed the type you mentioned in the other comment, thanks :D

35. Could you please also provide some examples for isin_S3 and isin_S4, the values for x and it's corresponding result. I would like to write some testcases to these functions. Also could you please give little information on how to choose a right number for x for my test cases. Should x be in degrees or radians?

Does the current implementation of iSin_S3 and isin_S4 works for all quadrants?

With your answer as on 2011-05-11(your reply to the person anjum), It is not clear to me the reason behind taking pi(angle notatation) as 0x4000 and how you derived at 0x1000*sin(-43.0253) from -224385.
Back to the same question, what needs to be referred to choose -13.6954? Is (n.2^-e), e fixed to 14(180deg) always?

36. Could you please also provide some examples for isin_S3 and isin_S4, the values for x and its corresponding result. I would like to write some testcases to these functions. Also could you please give little information on how to choose a right number for x for my test cases. Should x be in degrees or radians?

For writing tests, it's easiest to create an excel/google sheet and just enter the formulas there. All the shifts will become *2k multiplications, and the divisions must be integer divisions, but that's it. The only trouble might be the 32-bit overflow conversions, but I think a MOD will work as well. I still have an excel sheet with all of this here, but it has all the calculations, and it's rather messy as it is. I'll see if I can condense it for your needs if you want.

As for radians or degrees: neither. When it comes to trigonometry, the units are rather arbitrary – what you really need to consider is circle fractions. The sine function is defined as the y-coordinate as you walk along the perimeter of the unit circle. From that it follows that sin(0) is 0, and sin(¼ circle) is 1. That's what's important to know about the sine function. It's just convention that generally sin() is implemented in radians (which is wrong because pi is wrong, but oh well) because you have to have some unit, but technically it doesn't matter if you use something else.

In my particular case, I've chosen 0x8000 for a full circle because that's what libnds used. This is wrong as well, because the natural thing would be to use 0x10000 and circle fractions would be a natural fraction of a 16-bit integer, but that's how it goes.

Does the current implementation of iSin_S3 and isin_S4 works for all quadrants

Yes. The equations in this article only work properly for quadrant I (well, and IV), but you can get the other quadrants through mirroring. That's what the shifts with (30-qN) and such are for. Basically, the format for the fixed-point angles is:

xqqi iiii iiii iiii

x : # revolutions or sign-bit.

In the functions, I use the two quadrant bits to determine the sign of the return value, and the in-quadrant angle to calculate the sign magnitude. The result covers the whole circle.

how you derived at 0x1000*sin(-43.0253) from -224385. Back to the same question, what needs to be referred to choose -13.6954?

To be honest, I can't really say where the -13.6954 comes from; why he chose that as a value to test with confused me as well. From what I can make out, he meant to use that as -13.6954*π, which translates to -6.8477 circles (again, here's why pi is wrong). This then translates to -6.8477*0x8000 = -224385.

Is (n.2^-e), e fixed to 14(180deg) always?

No, it's rather arbitrary. All that matters is in how many pieces you want to divide a circle, whether that's 2π (radians), 360 (degrees) or 0x8000 (libnds angles), or 0x10000 (natural 16-bit angles). Just use something that's convenient.

37. I've just added some comments to the functions describing the input and output. Apologies that they were absent before.

As for test values, the exact values don't really matter, but you can take the standard angle-sines for reference, adjusted for scale. Note that the values for the isin() variants won't be exact. They are approximations, after all.

(note also the awkward *2 and /2 scalings in the rad and Q15 columns because their circle constants are wrongly picked)

circle  deg     rad      Q15     Q16       | sin   isin (Q12)
-------------------------------------------+------------------
0      0       0        0000h     0000h   | 0      0000h
1/12   30      π/6      0666h     1333h   | ½      0800h
1/8    45      π/4      1000h     2000h   | ½√2    0B50h-ish
1/6    60      π/3      1333h     1666h   | ½√3    0DDBh-ish
1/4    90      π/2      2000h     4000h   | 1      1000h
1/2    180     π        4000h     8000h   | 0      0000h
3/4    270     π*3/2    6000h     C000h   | -1     F000h
1      360     π*2      8000h  (1)0000h   | 0      0000h

38. Your detailed description cleared up many confusions that I had on the input value.
Thank you very very much!
I would definitely like to have the excel sheet from you that you have already worked on, so as to use it as a reference to check against my excel sheet. I too follow the excel sheet to write my testcases, which really saves lot of time and energy and is more easier to add as many testcases we want to cover all the usecases. I really appreciate that you shared this tip , as I myself know the value of using excel sheet for such use case.

39. 1) If I have understood the angle conversion for Q15 and Q16 according to libnds, In the table which you have posted on 2015-04-20 at 0:33: the corresponding hex value for 60Â° in Angle(2^15 units/circle or Q15) is 1555h instead of 1333h.

Similarly for 30Â° is 1555h in Q16
60Â° is 2AAAh in Q16

2) Could you please tell me "ish " in 0B50h-ish(i think h stands for hex like in 0B50h)?

3) for 666h(Q15), isin(Q12), I get 4BCh instead of what you have mentioned in the table(0800h)? -- once I get the excel sheet from you, may I can also check with the values for the quadrants III and IV

40. 1) Oops, yeah you're right. 30Â° and 60Â° in Q15 should be 0AAAh and 1555h. I had calculator in the wrong mode. I thought it was weird that I didn't have 5s and As in the answer ...

2) "ish" means "roughly". Like 12:03 is noon-ish. And yes, the 'h' suffix is for hex.

3) Try it again with AAAh instead of 666h. That should work better. 666h would correspond with 666h/8000h*360 = 18Â°. sin(18Â°) = 0.309, which translates to 4F1h which is a better match to 4BCh ... but I expected better. I'll check why it's still 5% off later.

41. 1). I just updated the table that you had posted
These result relates to the implementation with isin_S3.
What I want to know is your sentence
"sin(18Â°) = 0.309, which translates to 4F1h which is a better match to 4BCh"
It is not clear to me how you got 4F1? which equation is been used to get 4F1h.
sin(18Â°) = 0.3090170273 is clear to me.
when I convert 18Â° into angle and apply in the equation 12(which is the implementation of isin_S3) gives 4BCh.
What is the reference to know the deviation?

circle   deg     rad        Q15     Q16         | sin       isin (Q12)
---------------------------------------------------+------------------------
0         0         0          0000h      0000h    | 0         0000h
1/12    30      Ï€/6       0AAAh      1555h    | Â½        0800h(i get 07B4h)
1/8      45      Ï€/4       1000h      2000h    | Â½âˆš2    0B50h-ish(i get 0B00h)
1/6      60      Ï€/3       1555h      2AAAh    | Â½âˆš3    0DDBh-ish(i get 0DA1h)
1/4      90      Ï€/2       2000h      4000h    | 1         1000h
1/2      180    Ï€          4000h      8000h    | 0          0000h
3/4      270    Ï€*3/2    6000h      C000h   | -1        F000h
1        360     Ï€*2      8000h   (1)0000h   | 0          0000h

2) I have another question which needs clarification, isn't x u16 when unit circle is been divided into Q15(32768)

42. as per my second question on RaTh on 2015-04-24 at 16:00
2) I have another question which needs clarification, isn't x u16 when unit circle is been divided into Q15(32768)
I answered to myself this way and I want to cross check with you,
-32768 to 0 -- gives sine on negative x-axis(-2pi, -3pi/2, -pi, -pi/2, 0)
0 to 32768 -- gives sine on positive x-axis(0, pi/2, pi, 3pi/2, 2pi)

just because s16 range is -32768 to 32767, just for one bit on the positive side, you want to use s32, which accomodates -32768 to 32768(do you agree that, this should be mentioned as Q16 instead of Q15) and similarly for -65536 to 65536(Q17).

Is my interpretation correct?