This article was an inspiration and a challenge for me.

I would have enjoyed it more with some additional explanation, but it nevertheless opened my eyes, and I thank you for taking the time to write it.

Thanks :D

On the “additional information”, could you elaborate on that? Over the years, I’ve gotten more than a few comments on this article. Whenever I have to re-read things, I also feel like I’m missing a few things.

I’m toying with the idea of revising it a little, and would appreciate your feedback.

Microchip offers fixed-point math libraries for their XC compilers, but they are precompiled: you cannot see the source or use them with other compilers, unless you extract the code from the .lst file. I think they also use a polynomial approximation, maybe a higher-order one.

See page 232 of the following file, for example _Q15sinPI:

http://ww1.microchip.com/downloads/en//softwarelibrary/fixed%20point%20math%20library/50001456j.pdf

Here is the stripped code for PIC24, in case someone is interested.

It takes 1.9 µs on a PIC24HJ64GP202 with an 80 MHz internal clock.

int fast_sin(int an) {
#asm
mov an, w0
mov w0, w9
mov w0, w2
mov #0x8000, w1
clr w0
cpsne w2, w1
return
mov w8, [w15++]
mov #0x1, w8
cpsgt w2, w0
mov #0xffff, w8
mul.ss w2, w8, w0
mov #0x4001, w2
mov #0x8000, w3
cpslt w0, w2
sub w3, w0, w0
mov #0x28bf, w2
cp w0, w2
bra GE, SinePI_CosCall
mov #0x6488, w2
mul.ss w0, w2, w2
mov #0x1000, w4
add w2, w4, w2
addc w3, #0x0, w3
lsr w2, #0xd, w2
sl w3, #0x3, w4
ior w2, w4, w0
mov w0, w2
mov #0x6bb5, w0
mov #0x7fff, w1
cpsne w2, w1
bra L_SIN_PI_RETURN
mov #0x944b, w0
mov #0x8001, w1
cpsgt w2, w1
bra L_SIN_PI_RETURN
mul.ss w2, w2, w6
lsr w6, #0xf, w6
sl w7, #0x1, w7
ior w6, w7, w3
sl w3, #0x1, w3
mul.su w2, w3, w4
mov #0x5555, w1
mul.ss w5, w1, w6
asr w7, #0x1, w7
sub w2, w7, w0
mul.su w5, w3, w4
mov #0x4444, w1
mul.ss w5, w1, w6
asr w7, #0x5, w7
add w0, w7, w0
mul.su w5, w3, w4
mov #0x6807, w1
mul.ss w5, w1, w6
mov #0x400, w5
add w7, w5, w7
asr w7, #0xb, w7
sub w0, w7, w0

L_SIN_PI_RETURN:
bra SIN_PI_END
SinePI_CosCall:
mov #0x4000, w3
sub w3, w0, w0
mov #0x6488, w2
mul.ss w0, w2, w2
mov #0x1000, w4
add w2, w4, w2
addc w3, #0x0, w3
lsr w2, #0xd, w2
sl w3, #0x3, w4
ior w2, w4, w0
mov #0xff01, w1
mov #0xff, w2
cp w0, w1
bra LT, SIN_PI_END
cp w0, w2
bra GT, L_SIN_PI_Cos_Else
mov #0x7fff, w0
bra SIN_PI_END
L_SIN_PI_Cos_Else:
mov w0, w2
mov #0x8000, w1
mov #0x4529, w0
cpsne w2, w1
bra SIN_PI_END
mov #0x7fff, w1
mov #0x7fff, w0
mul.ss w2, w2, w4
lsr w4, #0xf, w4
sl w5, #0x1, w5
ior w4, w5, w4
sl w4, #0x1, w4
mul.us w4, w1, w2
mov #0x8000, w7
mul.su w3, w7, w6
sub w0, w7, w0
mul.us w4, w3, w2
mov #0x5555, w7
mul.su w3, w7, w6
asr w7, #0x3, w7
add w0, w7, w0
mul.us w4, w3, w2
mov #0x2d83, w7
mul.su w3, w7, w6
asr w7, #0x7, w7
sub w0, w7, w0
mul.us w4, w3, w2
mov #0xd0, w7
mul.su w3, w7, w6
asr w7, #0x7, w7
add w0, w7, w0
SIN_PI_END:
mul.ss w0, w8, w0
mov [--w15], w8
mov w0, _RETURN_
#endasm
}

(cearn: edited for code blocks)

The polynomials are in the following Word file; you have to download it to see them nicely formatted:

https://goo.gl/HCc9JT

PIC24HJ64GP202 (16-bit) with an 80 MHz internal clock, 25 ns per instruction:

x=0…16383 and S(x)=0…4095.

S3(x) takes 250ns.

//multiply x*x, the result is in W2 and W3
//only the content of W3 is used
MOV x,W0
MOV x,W1
MUL.UU W0,W1,W2
//logical shift right by 15 (x*x>>15),
//done by multiplying by 2 and shifting right by 16
//the result is in W4 and W5, only the content of W4 is used
MUL.UU W3,#0x2,W4
//subtract (3*2^13) - (x*x>>15), the result is in W4
MOV #0x6000,W6
SUB W6,W4,W4
//multiply x*((3*2^13) - (x*x>>15)), result in W2 and W3
//only W3 is used, which is equivalent to shifting right by 16 bits
MUL.UU W1,W4,W2
//store the result from W3 in S3x
MOV W3, S3x
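For anyone who wants to check the S3 listing on a PC before flashing it, here is a rough C mirror of the same steps. It is only a sketch: the function name `s3_q14` is made up for illustration, and it assumes x is an unsigned value in 0…16383 that spans a quarter circle, with each "only W3 is used" step modelled as taking the high 16 bits of a 32-bit product.

```c
#include <stdint.h>

/* Hypothetical C mirror of the S3 assembly above.
 * Assumes x is an unsigned angle in 0..16383 spanning a quarter circle;
 * the result is approximately 4096*sin(angle). */
uint16_t s3_q14(uint16_t x)
{
    /* x*x, keeping only the high word (W3 in the assembly) */
    uint16_t xx_hi = (uint16_t)(((uint32_t)x * x) >> 16);

    /* (x*x >> 16) * 2 stands in for x*x >> 15 (W4 in the assembly) */
    uint16_t sq = (uint16_t)(xx_hi * 2u);

    /* 3*2^13 - (x*x >> 15) */
    uint16_t diff = (uint16_t)(0x6000u - sq);

    /* x * diff, high word only, which is the same as >> 16 */
    return (uint16_t)(((uint32_t)x * diff) >> 16);
}
```

At x = 8192 (45°) this gives 2816, against an exact 4096·sin(45°) of about 2896, which gives a feel for the third-order error in the middle of the range.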

S5(x) takes 425ns.

//multiply x*x and divide by 2^16, the result is in W3
MOV x, W0
MOV x, W1
MUL.UU W0, W1, W2
//multiply by 9279=0x243F and divide by 2^16, the result is in W7
MOV #0x243F, W4
MUL.UU W3, W4, W6
//subtract from 5256=0x1488, the result is in W6
MOV #0x1488, W6
SUB W6, W7, W6
//multiply by x and divide by 2^16, the result is in W9
MUL.UU W6, W0, W8
//multiply by 2^5=32=0x20, the result is in W10
MOV #0x20, W5
MUL.UU W5, W9, W10
//multiply by x and divide by 2^16, the result is in W7
MUL.UU W0, W10, W6
//subtract from 25736=0x6488, the result is in W7
MOV #0x6488, W6
SUB W6, W7, W7
//multiply by x, which is in W0 or W1, and divide by 2^16, the result is in W3
MUL.UU W0, W7, W2
//store the result from W3 in S5x
MOV W3, S5x
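The S5 listing can be mirrored in C the same way, which makes it easy to compare against a floating-point sine on a PC. This is a sketch under assumptions: `mul_hi` and `s5_q14` are names invented here, x is taken to be 0…16383 over a quarter circle, and each "divide by 2^16" is modelled as keeping the high word of a 16x16 unsigned multiply.

```c
#include <stdint.h>

/* High 16 bits of a 16x16 unsigned product, like reading the upper
 * result register after MUL.UU. Helper name is hypothetical. */
static uint16_t mul_hi(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Hypothetical C mirror of the S5 assembly above, step by step. */
uint16_t s5_q14(uint16_t x)
{
    uint16_t t = mul_hi(x, x);       /* x*x / 2^16                   */
    t = mul_hi(t, 0x243F);           /* times 9279, / 2^16           */
    t = (uint16_t)(0x1488 - t);      /* 5256 minus the above         */
    t = mul_hi(t, x);                /* times x, / 2^16              */
    t = (uint16_t)(t * 32u);         /* times 2^5, low word kept     */
    t = mul_hi(x, t);                /* times x, / 2^16              */
    t = (uint16_t)(0x6488 - t);      /* 25736 minus the above        */
    return mul_hi(x, t);             /* times x, / 2^16: the result  */
}
```

At x = 8192 (45°) this gives 2897, very close to the exact 4096·sin(45°) of about 2896.3, so the fifth-order version is noticeably tighter than S3 at the same input.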

(cearn: edited for code blocks)

Sorry for the delay. So much to do all the time …

Anyway, about your questions. *A* and *n* are indeed chosen based on the user’s parameters. I’m wondering now if I should have let *n* simply be the full-circle instead of a quarter-circle.

I think you’re right about *p* as well. This is a tricky constant, because there’s no real “best” value for it. In my case, I believe 15 worked, and 16 would send it over the edge of overflow somewhere along the line. You just need to have *something* to scale the multiplications with, and I just named it *p*.

So yeah, I think you interpreted it correctly. (Which is amazing, as even I have difficulty when reading it these days >_>)

I do not understand exactly how A, n, and p are chosen.

If I understand right:

– “The scale of the outcome (i.e., the amplitude): 2^A”

A is chosen as A=12 because the returned sine value is Q12.

– “The scale on the inside the parentheses: 2^p. This is necessary to keep the multiplications from overflowing.”

p is chosen as p=15 because the final sine function isin_S3 works in 32 bits and takes a 32-bit argument, and when we multiply two operands, to avoid overflow each operand may have 15 binary digits after the binary point, plus the sign.

– “The angle-scale: 2^n. This is basically the value of ½π in the fixed-point system. Using x for the angle, you have z = x/2^n.”

n is chosen as n=13 because if the full circle (0…2PI) is divided into 2^15 angle steps, then a quarter circle (0…PI/2) needs only (2^15)/4 = (2^15)/(2^2) = 2^13, so n=13.

Is my interpretation correct? Please correct me if I am wrong.

I would like to port and test these calculations on a 16-bit PIC24 microcontroller.
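To make the roles of A, n and p concrete, here is a small first-quadrant sketch in C. It assumes the third-order form S3(z) = z(3 − z²)/2 with z = x/2^n, and the two derived shifts r = 2n − p and s = n + p + 1 − A follow from that assumption; the function name `isin_s3_q1` and the constant names are made up, and the quadrant folding of the real isin_S3 is left out.

```c
#include <stdint.h>

/* Sketch: third-order sine for the first quadrant only.
 * Assumed form: S3(z) = z*(3 - z^2)/2 with z = x / 2^n,
 * output scaled by 2^A. All names here are hypothetical. */
enum {
    Q_N = 13,                    /* quarter circle = 2^13 angle units */
    Q_A = 12,                    /* output is Q12                     */
    Q_P = 15,                    /* scale inside the parentheses      */
    Q_R = 2 * Q_N - Q_P,         /* brings x*x down to scale 2^p      */
    Q_S = Q_N + Q_P + 1 - Q_A    /* final shift down to scale 2^A     */
};

/* x in 0..8192, i.e. 0 to pi/2. */
int32_t isin_s3_q1(int32_t x)
{
    /* (3 - z^2) at scale 2^p */
    int32_t inner = (3 << Q_P) - ((x * x) >> Q_R);

    /* z * (3 - z^2) / 2 at scale 2^A */
    return (x * inner) >> Q_S;
}
```

With these values the largest intermediate product in the first quadrant is 2^29 (at x = 8192), so everything stays inside a signed 32-bit integer, and the result at x = 8192 is 4096, i.e. 1.0 in Q12.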
