<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coranac</title>
	<atom:link href="http://www.coranac.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coranac.com</link>
	<description>my own little world</description>
	<lastBuildDate>Thu, 11 Feb 2010 20:04:27 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>grit 0.8.4 (out with the old bugs, in with the new)</title>
		<link>http://www.coranac.com/2010/02/grit-0-8-4/</link>
		<comments>http://www.coranac.com/2010/02/grit-0-8-4/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 20:04:14 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=154</guid>
		<description><![CDATA[
Okay, so it&#8217;s been a while, but there&#8217;s finally a new version for grit.

&#160;

First of all, the vector::insert should finally be fixed. And there was much rejoicing. I&#8217;ve also added an option for forcing the map palette-index (-mp &#60;num&#62;), which should help with NDS backgrounds that use ext-palettes.

&#160;

Also &#8211; and this one is pretty big [...]]]></description>
			<content:encoded><![CDATA[<p>
Okay, so it&#8217;s been a while, but there&#8217;s finally a new version for grit.
</p>
<p><div>&nbsp;</div></p>
<p>
First of all, the <code>vector::insert</code> should finally be fixed. And there was much rejoicing. I&#8217;ve also added an option for forcing the map palette-index (<tt>-mp &lt;num&gt;</tt>), which should help with NDS backgrounds that use ext-palettes.
</p>
<p><div>&nbsp;</div></p>
<p>
Also &ndash; and this one is pretty big &ndash; I&#8217;ve <b>completely replaced</b> the tile-mapping routines with something more general. The new method should be able to handle variable-sized tiles (<tt>-tw &lt;n&gt;</tt> and <tt>-th &lt;n&gt;</tt>) and is mostly independent of bitdepth. Specifically, bitdepths over 8 bpp can be handled as well, at least in principle. It also means that the external tileset can be a metatile-set as well now, which is good if you&#8217;re using metatiles.
</p>
<p>
With this new method also comes a way to create a custom bitformat for maps (<tt>-mB</tt> flag). I&#8217;m not entirely sure how this can be used yet, but using more than 10 bits for the tile-index, or a 1bpp collision map should be possible now.
</p>
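<p>
To make the &#8220;more than 10 bits&#8221; remark concrete, here is a minimal sketch of how a regular 16-bit GBA/NDS text-background map entry is laid out, next to a made-up index-only layout. This is not grit&#8217;s code; the function names are hypothetical, and the second layout is only an assumption about the kind of format a custom bitformat could describe.
</p>

```cpp
#include <cstdint>

// Regular text-background map entry (16 bits):
//   bits 0-9   tile index   (10 bits, so at most 1024 tiles)
//   bit  10    horizontal flip
//   bit  11    vertical flip
//   bits 12-15 palette bank (4 bits)
std::uint16_t pack_map_entry(unsigned tile, bool hflip, bool vflip, unsigned pal)
{
    return static_cast<std::uint16_t>(
        (tile & 0x03FFu)                // tile index, wraps above 1023
      | (hflip ? (1u << 10) : 0u)       // h-flip flag
      | (vflip ? (1u << 11) : 0u)       // v-flip flag
      | ((pal & 0x0Fu) << 12));         // palette bank
}

// Hypothetical custom layout: all 16 bits hold the tile index,
// with no flip or palette bits, allowing up to 65536 tiles.
std::uint16_t pack_index_only(unsigned tile)
{
    return static_cast<std::uint16_t>(tile & 0xFFFFu);
}
```

<p>
With the regular layout, tile 1024 wraps around to 0, which is the limit the custom format is meant to get around. Forcing the palette index amounts to fixing the <code>pal</code> argument for every entry.
</p>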
<p>
Since this is a fairly major change, I kinda expect there are still some bugs in the system. I have tested it for a number of options, but you know how it is with multi-platform stuff. In particular, if any of you big-endian-system users have trouble now, this will probably be the cause.
</p>
<p>
And now I will leave you with a &hellip;
</p>
<ul>
<li><a href=/projects/#grit>link to grit</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2010/02/grit-0-8-4/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Because robots need love too</title>
		<link>http://www.coranac.com/2010/01/because-robots-need-love-too/</link>
		<comments>http://www.coranac.com/2010/01/because-robots-need-love-too/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 19:15:33 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=147</guid>
		<description><![CDATA[

From xkcd (obviously)


    
  



T_T
]]></description>
			<content:encoded><![CDATA[
<p>
From <a href="http://xkcd.com/695/">xkcd</a> (obviously)
</p>
<p>
<kbd>T_T</kbd></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2010/01/because-robots-need-love-too/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Symmetrical date is symmetrical</title>
		<link>http://www.coranac.com/2010/01/symmetrical-date-is-symmetrical/</link>
		<comments>http://www.coranac.com/2010/01/symmetrical-date-is-symmetrical/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 18:32:46 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=142</guid>
		<description><![CDATA[
Random trivia: today is January 2nd, 2010. Or, in the
One True Date Formatting, 20100102 : a symmetrical date. The next one won&#8217;t be until November next year.

insert obligatory &#8220;symmetrical&#8221; macro here
]]></description>
			<content:encoded><![CDATA[<p>
Random trivia: today is January 2<sup>nd</sup>, 2010. Or, in the<br />
<a href="http://en.wikipedia.org/wiki/ISO_8601">One True Date Formatting</a>, 20100102 : a symmetrical date. The next one won&#8217;t be until November next year.
</p>
<div class=cblock><i>insert obligatory &#8220;symmetrical&#8221; macro here</i></div>
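<p>
For the curious, a small sketch of how to find these dates: in YYYYMMDD form, the month and day have to be the year&#8217;s digits in reverse, so each year has at most one candidate. The function name is made up for the occasion and it assumes a four-digit year.
</p>

```cpp
// Returns the symmetrical date of `year` as a YYYYMMDD number, or 0 if
// the reversed year digits do not form a valid month and day.
// Assumes a four-digit year (1000-9999).
long symmetrical_date(int year)
{
    // Reverse the four digits of the year: 2010 -> 0102, read as MMDD.
    int mmdd = 0;
    for (int y = year, i = 0; i < 4; ++i, y /= 10)
        mmdd = mmdd * 10 + y % 10;

    const int month = mmdd / 100;
    const int day   = mmdd % 100;
    static const int days_in_month[12] =
        {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    if (month < 1 || month > 12 || day < 1 || day > days_in_month[month - 1])
        return 0;

    // February 29th only exists in leap years.
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    if (month == 2 && day == 29 && !leap)
        return 0;

    return year * 10000L + mmdd;
}
```

<p>
For 2010 this gives 20100102 and for 2011 it gives 20111102; 2012 would need a 21st month, so it has none.
</p>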
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2010/01/symmetrical-date-is-symmetrical/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>I maek game :D</title>
		<link>http://www.coranac.com/2009/12/setds/</link>
		<comments>http://www.coranac.com/2009/12/setds/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 19:12:00 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[nds]]></category>
		<category><![CDATA[setds]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[game]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=139</guid>
		<description><![CDATA[

Okay, so it&#8217;s only a card game; but a game nonetheless.

&#160;

The game in question is an NDS implementation of

SET. Set is a card-matching game
with 81 cards (see below). The figures on the cards have four
properties and 3 possibilities for each property. The key is to find
three cards for which the values of each property are [...]]]></description>
			<content:encoded><![CDATA[
<p>
Okay, so it&#8217;s <i>only</i> a card game; but a game nonetheless.
</p>
<p><div>&nbsp;</div></p>
<p>
The game in question is an NDS implementation of<br />

<a href="http://en.wikipedia.org/wiki/Set_%28game%29">SET</a>. <i>Set</i> is a card-matching game<br />
with 81 cards (see below). The figures on the cards have four<br />
properties and 3 possibilities for each property. The key is to find<br />
three cards for which the values of each property are either all<br />
<b>equal</b> or all <b>different</b>. Looking at the color property,<br />
for example, a &#8220;Red Red Red&#8221; combination could (yes &#8220;could&#8221;; there<br />
are still three other properties to consider) form a set.<br />
&#8220;Red Green Blue&#8221; would also work, but &#8220;Red Green Green&#8221; would not.
</p>
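<p>
In code, the rule boils down to a per-property check. The following is a minimal sketch and not taken from the game&#8217;s source: the <code>Card</code> type, the property order and the 0/1/2 value encoding are assumptions for illustration.
</p>

```cpp
#include <array>
#include <cstddef>

// A card has four properties, each with one of three values (0, 1 or 2).
// Which index stands for which property is an arbitrary choice here.
using Card = std::array<int, 4>;

// Three cards form a set if, for every property, the three values are
// either all equal or all different.
bool is_set(const Card& a, const Card& b, const Card& c)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        const bool all_equal = (a[i] == b[i]) && (b[i] == c[i]);
        const bool all_diff  = (a[i] != b[i]) && (b[i] != c[i]) && (a[i] != c[i]);
        if (!all_equal && !all_diff)
            return false;   // e.g. "Red Green Green"
    }
    return true;
}
```

<p>
Four properties with three values each also explains the deck size: 3&times;3&times;3&times;3 = 81 cards.
</p>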
<p>
Further details can be found in the readme and the game itself.
</p>
<p><div>&nbsp;</div></p>
<p>
The game is <i>mostly</i> finished. There may be some tweaking to do<br />
here and there, but right now I don&#8217;t want to get bogged down in a<br />
massive fine-tuning-fest &ndash; especially since I&#8217;m not sure what<br />
parts need fine-tuning &hellip; and because there&#8217;s<br />
<a href="http://www.coranac.com/projects/grit">other stuff</a> I really should get<br />
back to.
</p>
<p>
That said, all important aspects work &hellip; with <b>one</b><br />
exception: hiscore saving. Yes, <i>that</i>. I&#8217;ve seen the<br />
multitude of threads on the subject but so far I&#8217;m unsure of what would<br />
work on both hardware and emulator, so I&#8217;m leaving it as is for now.<br />
If anyone has a tidy hw+emu solution, please do tell.
</p>
<h4>Links</h4>
<ul>
<li>binary: <a href="http://www.coranac.com/files/setds-bin.zip">setds-bin.zip</a> (146k)</li>
<li>source: <a href="http://www.coranac.com/files/setds-src.zip">setds-src.zip</a> (354k)</li>
<li>readme: <a href="http://www.coranac.com/files/setds-readme.txt">setds-readme.txt</a></li>
</ul>
<p><div>&nbsp;</div></p>
<p>
Oh, and merry Christmas everybody.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/12/setds/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Some new notes on NDS code size</title>
		<link>http://www.coranac.com/2009/11/sizeof-new/</link>
		<comments>http://www.coranac.com/2009/11/sizeof-new/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 15:31:31 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[nds]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=133</guid>
		<description><![CDATA[
When I discussed the

memory footprints of several C/C++ elements, I apparently missed a
very important item: operator new and related functions. I
assumed new shouldn&#8217;t increase the binary that much,
but boy was I wrong.


The short story is that officially new should throw an
exception when it can&#8217;t allocate new memory. Exceptions come with about
60 kb worth of baggage. [...]]]></description>
			<content:encoded><![CDATA[<p>
When I discussed the<br />
<a href="http://www.coranac.com/2009/02/some-interesting-numbers-on-nds-code-size/"><br />
memory footprints of several C/C++ elements</a>, I apparently missed a<br />
very important item: <code>operator new</code> and related functions. I<br />
assumed <code>new</code> shouldn&#8217;t increase the binary that much,<br />
but boy was I wrong.
</p>
<p>
The short story is that officially <code>new</code> should throw an<br />
exception when it can&#8217;t allocate new memory. Exceptions come with about<br />
60 kb worth of baggage. Yes, this is more or less the same stuff that<br />
goes into <code>vector</code> and <code>string</code>.
</p>
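<p>
As a reminder of what that official behaviour looks like, here is a minimal sketch in standard C++ (nothing NDS-specific). The function names are made up for illustration, and the request size is chosen so that the allocation cannot succeed.
</p>

```cpp
#include <cstddef>
#include <limits>
#include <new>

// A request larger than half the address space, so it cannot be satisfied.
// Read through a volatile so the compiler cannot reason about the constant.
std::size_t impossible_size()
{
    volatile std::size_t huge = std::numeric_limits<std::size_t>::max() / 2 + 1;
    return huge;
}

// The default operator new reports failure by throwing std::bad_alloc.
bool new_throws_on_failure()
{
    try {
        void* p = ::operator new(impossible_size());
        ::operator delete(p);       // not expected to be reached
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// The std::nothrow overload reports failure by returning a null pointer.
bool nothrow_new_returns_null()
{
    void* p = ::operator new(impossible_size(), std::nothrow);
    if (p != nullptr) {
        ::operator delete(p);       // not expected to be reached
        return false;
    }
    return true;
}
```

<p>
Note that switching to the <code>std::nothrow</code> form only changes how failure is reported to the caller. Whether it keeps the exception support code out of the binary depends on how the library implements it, so this sketch makes no claim about size.
</p>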
<p>
The long story, including a detailed look at a minimal binary,<br />
a binary that uses <code>new</code> and a solution to the exception overhead (in this particular case anyway) can be read below the fold.
</p>
<p><span id="more-133"></span></p>
<p><div>&nbsp;</div><ul>
  <li> <a href="#sec-base">1
Minimal project
</a> </li>
  <li> <a href="#sec-std-new">2
Standard C++ new/delete
</a> </li>
  <li> <a href="#sec-own-new">3
Custom new/delete
</a> </li>
  <li> <a href="#sec-conc">4
Other considerations and conclusions.
</a> </li>
</ul>
</p>
<p><h2 id="sec-base">1
Minimal project
</h2>
</p>
<p>
The following is essentially an empty project. It should represent<br />
the smallest binary you can get with the current devkitARM (DKA, r26) and<br />
libnds (1.3.7). This is the primary reference case.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co2">#include &lt;nds.h&gt;</span></p>
<p><span class="kw1">int</span> main()<br />
{<br />
&nbsp; &nbsp; <span class="kw1">while</span>(1) ;<br />
}</div>
</div>
<p>
This actually already leads to a binary of 53.5 kb. To analyze what<br />
goes on in there, we can look at the map file. <i>Not</i> the mapfile<br />
generated by the linker, mind you, but by the <tt>arm-eabi-nm</tt> tool,<br />
whose generated files are considerably easier to read. To use this tool,<br />
add the <tt>arm-eabi-nm</tt> line to the <code>$(BUILD)</code> rule in the makefile,<br />
so that the rule looks like the one below. If you want to know what the flags mean,<br />
please <a href="http://sourceware.org/binutils/docs/binutils/nm.html">RTFM</a>.
</p>
<div class="make">
<div class="make proglist" style=" ">$(<span class="re2">BUILD</span>):<br />
&nbsp; &nbsp; @[ -d <span class="re0">$@</span> ] || mkdir -p <span class="re0">$@</span><br />
&nbsp; &nbsp; @make --no-print-directory -C $(<span class="re2">BUILD</span>) -f $(<span class="re2">CURDIR</span>)/Makefile<br />
&nbsp; &nbsp; arm-eabi-nm -Sn $(<span class="re2">OUTPUT</span>).elf &gt; $(<span class="re2">BUILD</span>)/$(<span class="re2">TARGET</span>).map</div>
</div>
<p>
And this is the resulting mapfile, in full.
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w _Jv_RegisterClasses<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w __deregister_frame_info<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w __register_frame_info<br />
00080000 N _stack<br />
01000000 A __vectors_end<br />
01000000 A __vectors_start<br />
01000100 A __itcm_start<br />
01000100 000000c8 T irqTable<br />
010001c8 T IntrMain<br />
010001fc t findIRQ<br />
01000218 t no_handler<br />
01000228 t jump_intr<br />
0100023c t got_handler<br />
0100025c t IntrRet<br />
01000290 A __itcm_end<br />
02000000 T __text_start<br />
02000000 T _start<br />
02000194 t ILoop<br />
02000198 t checkARGV<br />
020001dc t .copyforward<br />
020001f0 t .copybackward<br />
02000200 t .copydone<br />
02000214 t ClearMem<br />
02000228 t ClrLoop<br />
02000238 t CopyMemCheck<br />
0200023c t CopyMem<br />
0200024c t CIDLoop<br />
02000300 T _init<br />
02000310 t __do_global_dtors_aux<br />
0200033c t frame_dummy<br />
0200037c 00000004 T main<br />
02000380 000000ec T initSystem<br />
0200046c 00000012 T ledBlink<br />
02000480 0000002c T powerOff<br />
020004ac 00000030 T powerOn<br />
020004dc 00000018 T systemSleep<br />
020004f4 00000010 T powerValueHandler<br />
02000504 00000044 T systemMsgHandler<br />
02000548 00000164 t fifoInternalSend<br />
020006ac 00000038 T fifoSendAddress<br />
020006e4 00000048 T fifoSendValue32<br />
0200072c 00000070 T fifoGetAddress<br />
0200079c 00000074 T fifoSetAddressHandler<br />
02000810 00000070 T fifoGetValue32<br />
02000880 00000074 T fifoSetValue32Handler<br />
020008f4 00000024 T fifoCheckAddress<br />
02000918 00000024 T fifoCheckDatamsg<br />
0200093c 00000024 T fifoCheckValue32<br />
02000960 00000094 t fifoInternalSendInterrupt<br />
020009f4 00000010 t __timeoutvbl<br />
02000a04 000001b8 T fifoInit<br />
02000bbc 00000100 T fifoGetDatamsg<br />
02000cbc 0000040c t fifoInternalRecvInterrupt<br />
020010c8 000000a8 T fifoSetDatamsgHandler<br />
02001170 00000070 T fifoSendDatamsg<br />
020011e0 00000002 T irqDummy<br />
020011e4 0000006c T irqSet<br />
02001250 0000004c T irqInit<br />
0200129c 00000030 T irqInitHandler<br />
020012cc 00000060 T irqEnable<br />
0200132c 00000060 T irqDisable<br />
0200138c 0000002c T irqClear<br />
020013c0 T swiSoftReset<br />
020013c4 T swiDelay<br />
020013c8 T swiIntrWait<br />
020013cc T swiWaitForVBlank<br />
020013d0 T swiSleep<br />
020013d4 T swiChangeSoundBias<br />
020013d8 T swiDivide<br />
020013dc T swiRemainder<br />
020013e2 T swiDivMod<br />
020013ee T swiCopy<br />
020013f2 T swiFastCopy<br />
020013f6 T swiSqrt<br />
020013fa T swiCRC16<br />
020013fe T swiIsDebugger<br />
02001402 T swiUnpackBits<br />
02001406 T swiDecompressLZSSWram<br />
0200140a T swiDecompressLZSSVram<br />
0200140e T swiDecompressHuffman<br />
02001412 T swiDecompressRLEWram<br />
02001416 T swiDecompressRLEVram<br />
0200141a T swiWaitForIRQ<br />
0200141e T swiDecodeDelta8<br />
02001422 T swiDecodeDelta16<br />
02001426 T swiSetHaltCR<br />
02001430 00000030 T __libc_fini_array<br />
02001460 00000050 T __libc_init_array<br />
020014b4 00000080 T memcpy<br />
02001534 00000006 T _times_r<br />
0200153c 0000002c T _gettimeofday_r<br />
02001568 00000014 T _times<br />
0200157c 00000052 T build_argv<br />
020015d0 0000000c T __errno<br />
020015dc T _fini<br />
020015e8 A __text_end<br />
020015e8 00000004 R _global_impure_ptr<br />
020015f0 A __exidx_end<br />
020015f0 A __exidx_start<br />
020015f0 t __frame_dummy_init_array_entry<br />
020015f0 A __init_array_start<br />
020015f0 A __preinit_array_end<br />
020015f0 A __preinit_array_start<br />
020015f4 t __do_global_dtors_aux_fini_array_entry<br />
020015f4 A __fini_array_start<br />
020015f4 A __init_array_end<br />
020015f8 r __EH_FRAME_BEGIN__<br />
020015f8 r __FRAME_END__<br />
020015f8 A __fini_array_end<br />
020015fc d __JCR_END__<br />
020015fc d __JCR_LIST__<br />
02001600 A __data_start<br />
02001600 D __dso_handle<br />
02001600 A __ewram_start<br />
02001604 00000004 D fifo_freewords<br />
02001608 00000004 D fifo_send_queue<br />
0200160c 00000004 D fifo_buffer_free<br />
02001610 00000004 D fifo_receive_queue<br />
02001618 00000004 D _impure_ptr<br />
02001620 00000428 d impure_data<br />
02001a48 A __bss_start<br />
02001a48 A __bss_start__<br />
02001a48 A __bss_vma<br />
02001a48 A __data_end<br />
02001a48 A __dtcm_lma<br />
02001a48 A __itcm_lma<br />
02001a48 b completed.2775<br />
02001a4c b object.2787<br />
02001a64 00000004 b __timeout<br />
02001a68 00000004 B processing<br />
02001a6c 00000004 B fake_heap_end<br />
02001a70 00000004 B fake_heap_start<br />
02001a74 00000004 B theTime<br />
02001a78 00000040 B fifo_datamsg_data<br />
02001ab8 00000800 B fifo_buffer<br />
02001bd8 A __vectors_lma<br />
020022b8 00000040 B fifo_value32_func<br />
020022f8 00000040 B fifo_address_func<br />
02002338 00000040 B fifo_value32_data<br />
02002378 00000040 B fifo_value32_queue<br />
020023b8 00000040 B fifo_data_queue<br />
020023f8 00000040 B fifo_address_data<br />
02002438 00000040 B fifo_datamsg_func<br />
02002478 00000040 B fifo_address_queue<br />
020024b8 00000004 B punixTime<br />
020024bc A __bss_end<br />
020024bc A __bss_end__<br />
020024bc A __end__<br />
020024bc A _end<br />
023ff000 A __eheap_end<br />
023ff000 A __ewram_end<br />
027fff70 a _libnds_argv<br />
0b000000 A __dtcm_end<br />
0b000000 A __dtcm_start<br />
0b000000 A __sbss_end<br />
0b000000 A __sbss_start<br />
0b000000 A __sbss_start__<br />
0b003d00 A __sp_usr<br />
0b003e00 A __sp_irq<br />
0b003f00 A __sp_svc<br />
0b003ff8 A __irq_flags<br />
0b003ffc A __irq_vector<br />
0b004000 A __dtcm_top</div>
</div>
<p>
Now, I expect you can&#8217;t really tell much from this, so here&#8217;s a summary.
</p>
<div class="none">
<div class="none proglist" style=" ">[map]<br />
begin &nbsp; &nbsp; &nbsp; end &nbsp; &nbsp; &nbsp; &nbsp; size&nbsp; &nbsp; &nbsp; Description<br />
02000000 &#8211; 0200033c &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : crt0.S (roughly)<br />
0200037c &#8211; 02000380 &nbsp; &nbsp; 0004 &nbsp; &nbsp;: main.c<br />
02000380 &#8211; 02000548 &nbsp; &nbsp; 01C8&nbsp; &nbsp; : libnds system init/handlers<br />
02000548 &#8211; 020011e0 &nbsp; &nbsp; 0C98&nbsp; &nbsp; : libnds fifo routines<br />
020011e0 &#8211; 020013c0 &nbsp; &nbsp; 01E0&nbsp; &nbsp; : libnds interrupt.c<br />
020013c0 &#8211; 02001430 &nbsp; &nbsp; 0070&nbsp; &nbsp; : libnds bios.s<br />
02001430 &#8211; 020015e8 &nbsp; &nbsp; 01B8&nbsp; &nbsp; : libc misc<br />
020015e8 A __text_end<br />
020015e8 &#8211; 02001600 &nbsp; &nbsp; 0018&nbsp; &nbsp; : C/C++ ctor/dtor overhead, etc?<br />
02001600 &#8211; 02001618 &nbsp; &nbsp; 0018&nbsp; &nbsp; : libnds fifo data<br />
02001618 &#8211; 02001a48 &nbsp; &nbsp; 0430&nbsp; &nbsp; : impure ?!?<br />
02001a48 &#8211; 02001a78 &nbsp; &nbsp; 0030&nbsp; &nbsp; : misc bookkeeping</p>
<p>02001a78 &#8211; 020024b8 &nbsp; &nbsp; 0A40&nbsp; &nbsp; : libnds fifo data + pointers<br />
020024bc A _end</p>
<p>000024bc &#8211; 0000D630 0000B174&nbsp; &nbsp; : ???<br />
[/map]</div>
</div>
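<p>
One way to get numbers like these without a calculator is to parse the <tt>nm -Sn</tt> lines and subtract addresses. The following is a rough sketch of such a parser; the <code>Symbol</code> struct and <code>parse_nm_line</code> are hypothetical names, and it assumes lines of the form &#8220;address [size] type name&#8221; as in the listing above.
</p>

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One symbol from an `nm -Sn` listing.
struct Symbol {
    bool valid = false;         // false if the line had no address column
    std::uint32_t address = 0;
    std::uint32_t size = 0;     // stays 0 when nm prints no size column
    char type = '?';
    std::string name;
};

// Parses a single line such as "02000380 000000ec T initSystem".
Symbol parse_nm_line(const std::string& line)
{
    // Split the line on whitespace.
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t; )
        tok.push_back(t);

    Symbol sym;
    if (tok.size() != 3 && tok.size() != 4)
        return sym;             // e.g. "w _Jv_RegisterClasses" (no address)

    sym.address = static_cast<std::uint32_t>(std::stoul(tok[0], nullptr, 16));
    if (tok.size() == 4)
        sym.size = static_cast<std::uint32_t>(std::stoul(tok[1], nullptr, 16));
    sym.type  = tok[tok.size() - 2][0];
    sym.name  = tok.back();
    sym.valid = true;
    return sym;
}
```

<p>
The size of a region is then the difference between the addresses of its first symbol and the first symbol after it; for example, <code>fifoInternalSend</code> minus <code>initSystem</code> gives the 0x1C8 bytes of the system init/handlers entry.
</p>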
<p>
The <code>0100:xxxx</code> and <code>0B00:xxxx</code> ranges belong to<br />
ITCM and DTCM, so those are irrelevant when looking at main RAM size.<br />
The libc, impure and misc bookkeeping sections are stuff related to the<br />
C library and C overhead, accounting for about 1.5 kb. The boot code,<br />
<tt>crt0.S</tt>, also covers close to 1.0 kb. As expected, the code for<br />
<code>main.c</code> &ndash;the actual project&ndash; is more or less<br />
nothing.
</p>
<p>
The rest, about 7 kb, is libnds. Now, you may say that this is quite a bit<br />
of overhead, but it really isn&#8217;t. Pretty much all of it relates to<br />
interrupts and the fifo system, which takes care of ARM7-ARM9<br />
communication. You <i>need</i> to have these parts. Okay, you could try<br />
to roll your own to shrink this down to the bare essentials, but in all<br />
likelihood that&#8217;s more trouble than it&#8217;s worth.
</p>
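<p>
For those who want to check the arithmetic, here is a small sketch that adds up the ranges from the summary per group. The grouping is my reading of the table: the crt0 range is taken up to the start of <code>main</code>, and the C group holds the libc, impure and bookkeeping rows. The names are made up for illustration.
</p>

```cpp
#include <cstddef>
#include <cstdint>

// Half-open address range [begin, end) from the summary table.
struct Region { std::uint32_t begin, end; };

// Sums the sizes of all regions in a group, in bytes.
template <std::size_t N>
std::uint32_t total_bytes(const Region (&regions)[N])
{
    std::uint32_t total = 0;
    for (const Region& r : regions)
        total += r.end - r.begin;
    return total;
}

// Boot code, up to the start of main (assumption, see lead-in).
const Region kCrt0[] = { {0x02000000, 0x0200037c} };

// libc misc, impure data and misc bookkeeping.
const Region kCLib[] = {
    {0x02001430, 0x020015e8},
    {0x02001618, 0x02001a48},
    {0x02001a48, 0x02001a78},
};

// libnds: system init, fifo routines, interrupts, bios calls, fifo data.
const Region kLibnds[] = {
    {0x02000380, 0x02000548},
    {0x02000548, 0x020011e0},
    {0x020011e0, 0x020013c0},
    {0x020013c0, 0x02001430},
    {0x02001600, 0x02001618},
    {0x02001a78, 0x020024b8},
};
```

<p>
This comes to 892 bytes for the boot code, 1560 bytes for the C library parts and 6920 bytes for libnds, which matches the rounded figures above.
</p>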
<p><div>&nbsp;</div></p>
<p>
The observant among you should have noticed something: we&#8217;re only at 9.5 kb,<br />
but the file size is 53.5 kb. So what the hell happened to the other 44 kb?<br />
Well, I don&#8217;t know, to be honest. It doesn&#8217;t appear in MWRAM to be sure.<br />
It&#8217;s probably the stuff <tt>ndstool</tt> adds. My guess is that that&#8217;s<br />
where the ARM7 binary goes, along with the icon, titles and possibly<br />
DLDI interfaces, but I really can&#8217;t say right now.
</p>
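<p>
The size of the gap follows from the last row of the summary. As a tiny sketch (constant and function names are made up; the two values are the file size and the offset of <code>_end</code> from the table):
</p>

```cpp
#include <cstdint>

// Values from the summary table.
constexpr std::uint32_t kFileSize   = 0x0000D630;  // size of the binary on disk
constexpr std::uint32_t kMainRamEnd = 0x000024bc;  // _end, relative to 0x02000000

// Bytes in the file that the map does not account for.
constexpr std::uint32_t unaccounted_bytes()
{
    return kFileSize - kMainRamEnd;
}

// Converts a byte count to kb (1 kb = 1024 bytes).
constexpr double to_kb(std::uint32_t bytes)
{
    return bytes / 1024.0;
}
```

<p>
That is 0xB174 or 45428 bytes, a little over 44 kb.
</p>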
<p><h2 id="sec-std-new">2
Standard C++ new/delete
</h2>
</p>
<p>
And now, let&#8217;s look at what happens when you invoke <code>new</code>.
</p>
<div class="none">
<div class="none proglist" style=" ">#include &lt;nds.h&gt;<br />
<br />
void test_std_new()<br />
{<br />
&nbsp; &nbsp; u8 *ptr= new u8[<span class="nu0">8</span>];<br />
&nbsp; &nbsp; delete[] ptr;<br />
}</p>
<p>int main()<br />
{<br />
&nbsp; &nbsp; while(<span class="nu0">1</span>) ;<br />
}</div>
</div>
<p>
Just this small thing increases the file size to 117 kb! And remember,<br />
that&#8217;s not merely a doubling of the size, as 44 kb of the binary is not<br />
put in memory. The memory load has gone from about 10 kb to over 70 kb.<br />
What causes this increase? Well, let&#8217;s see:
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w _Jv_RegisterClasses<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w __deregister_frame_info<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w __gnu_Unwind_Find_exidx<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w __register_frame_info<br />
00080000 N _stack<br />
01000000 A __vectors_end<br />
01000000 A __vectors_start<br />
01000100 A __itcm_start<br />
01000100 000000c8 T irqTable<br />
010001c8 T IntrMain<br />
010001fc t findIRQ<br />
01000218 t no_handler<br />
01000228 t jump_intr<br />
0100023c t got_handler<br />
0100025c t IntrRet<br />
01000290 A __itcm_end<br />
02000000 T __text_start<br />
02000000 T _start<br />
02000194 t ILoop<br />
02000198 t checkARGV<br />
020001dc t .copyforward<br />
020001f0 t .copybackward<br />
02000200 t .copydone<br />
02000214 t ClearMem<br />
02000228 t ClrLoop<br />
02000238 t CopyMemCheck<br />
0200023c t CopyMem<br />
0200024c t CIDLoop<br />
02000300 T _init<br />
02000310 t __do_global_dtors_aux<br />
0200033c t frame_dummy<br />
0200037c 00000004 T main<br />
02000380 00000012 T _Z12test_std_newv<br />
02000394 000000ec T initSystem<br />
02000480 00000012 T ledBlink<br />
02000494 0000002c T powerOff<br />
020004c0 00000030 T powerOn<br />
020004f0 00000018 T systemSleep<br />
02000508 00000010 T powerValueHandler<br />
02000518 00000044 T systemMsgHandler<br />
0200055c 00000164 t fifoInternalSend<br />
020006c0 00000038 T fifoSendAddress<br />
020006f8 00000048 T fifoSendValue32<br />
02000740 00000070 T fifoGetAddress<br />
020007b0 00000074 T fifoSetAddressHandler<br />
02000824 00000070 T fifoGetValue32<br />
02000894 00000074 T fifoSetValue32Handler<br />
02000908 00000024 T fifoCheckAddress<br />
0200092c 00000024 T fifoCheckDatamsg<br />
02000950 00000024 T fifoCheckValue32<br />
02000974 00000094 t fifoInternalSendInterrupt<br />
02000a08 00000010 t __timeoutvbl<br />
02000a18 000001b8 T fifoInit<br />
02000bd0 00000100 T fifoGetDatamsg<br />
02000cd0 0000040c t fifoInternalRecvInterrupt<br />
020010dc 000000a8 T fifoSetDatamsgHandler<br />
02001184 00000070 T fifoSendDatamsg<br />
020011f4 00000002 T irqDummy<br />
020011f8 0000006c T irqSet<br />
02001264 0000004c T irqInit<br />
020012b0 00000030 T irqInitHandler<br />
020012e0 00000060 T irqEnable<br />
02001340 00000060 T irqDisable<br />
020013a0 0000002c T irqClear<br />
020013d0 T swiSoftReset<br />
020013d4 T swiDelay<br />
020013d8 T swiIntrWait<br />
020013dc T swiWaitForVBlank<br />
020013e0 T swiSleep<br />
020013e4 T swiChangeSoundBias<br />
020013e8 T swiDivide<br />
020013ec T swiRemainder<br />
020013f2 T swiDivMod<br />
020013fe T swiCopy<br />
02001402 T swiFastCopy<br />
02001406 T swiSqrt<br />
0200140a T swiCRC16<br />
0200140e T swiIsDebugger<br />
02001412 T swiUnpackBits<br />
02001416 T swiDecompressLZSSWram<br />
0200141a T swiDecompressLZSSVram<br />
0200141e T swiDecompressHuffman<br />
02001422 T swiDecompressRLEWram<br />
02001426 T swiDecompressRLEVram<br />
0200142a T swiWaitForIRQ<br />
0200142e T swiDecodeDelta8<br />
02001432 T swiDecodeDelta16<br />
02001436 T swiSetHaltCR<br />
02001440 00000054 t d_make_comp<br />
02001494 0000003a t d_make_name<br />
020014d0 00000058 t d_number<br />
02001528 0000004c t d_call_offset<br />
02001574 00000096 t d_cv_qualifiers<br />
0200160c 00000060 t d_template_param<br />
0200166c 00000160 t d_substitution<br />
020017cc 00000050 t d_append_char<br />
0200181c 00000084 t d_find_pack<br />
020018a0 00000090 t d_source_name<br />
02001930 00000240 t d_expression<br />
02001b70 0000056c t d_type<br />
020020dc 0000009a t d_bare_function_type<br />
02002178 000000ec t d_operator_name<br />
02002264 00000136 t d_unqualified_name<br />
0200239c 000000ca t d_expr_primary<br />
02002468 000000aa t d_template_args<br />
02002514 0000022c t d_name<br />
02002740 0000039c t d_encoding<br />
02002adc 00000060 t d_exprlist<br />
02002b3c 0000008a t d_growable_string_callback_adapter<br />
02002bc8 00000098 t d_append_buffer<br />
02002c60 000000a0 t d_append_string<br />
02002d00 000001f8 t d_print_array_type<br />
02002ef8 00000108 t d_print_mod_list<br />
02003000 00000234 t d_print_function_type<br />
02003234 00000ba0 t d_print_comp<br />
02003dd4 000001c0 t d_demangle_callback<br />
02003f94 0000002e T __gcclibcxx_demangle_callback<br />
02003fc4 000000c0 T __cxa_demangle<br />
02004084 000000c8 t d_print_mod<br />
0200414c 00000104 t d_print_cast<br />
02004250 0000009c t d_print_expr_op<br />
020042ec 000000a8 t d_print_subexpr<br />
02004398 T __cxa_end_cleanup<br />
020043a4 T __aeabi_uidiv<br />
020043a4 0000007a T __udivsi3<br />
02004420 0000000e T __aeabi_uidivmod<br />
02004430 00000002 T __aeabi_idiv0<br />
02004430 00000002 T __aeabi_ldiv0<br />
02004430 00000002 T __div0<br />
02004434 00000010 t _Unwind_decode_target2<br />
02004444 0000002a T _Unwind_VRS_Get<br />
02004470 0000001a t _Unwind_GetGR<br />
0200448c 0000002a T _Unwind_VRS_Set<br />
020044b8 0000001c t _Unwind_SetGR<br />
020044d4 00000020 t selfrel_offset31<br />
020044f4 00000074 t search_EIT_table<br />
02004568 00000004 T _Unwind_GetCFA<br />
0200456c 00000002 T _Unwind_Complete<br />
02004570 00000016 T _Unwind_DeleteException<br />
02004588 000002bc t __gnu_unwind_pr_common<br />
02004844 0000000e W __aeabi_unwind_cpp_pr2<br />
02004854 0000000e W __aeabi_unwind_cpp_pr1<br />
02004864 0000000e T __aeabi_unwind_cpp_pr0<br />
02004874 000000d0 t get_eit_entry<br />
02004944 0000005a t restore_non_core_regs<br />
020049a0 00000080 T __gnu_Unwind_Backtrace<br />
02004a20 000000e4 t unwind_phase2_forced<br />
02004b04 00000018 T __gnu_Unwind_ForcedUnwind<br />
02004b1c 00000034 t unwind_phase2<br />
02004b50 00000060 T __gnu_Unwind_RaiseException<br />
02004bb0 0000001e T __gnu_Unwind_Resume_or_Rethrow<br />
02004bd0 00000040 T __gnu_Unwind_Resume<br />
02004c10 00000268 T _Unwind_VRS_Pop<br />
02004e80 0000001c T __restore_core_regs<br />
02004e80 0000001c T restore_core_regs<br />
02004e9c T __gnu_Unwind_Restore_VFP<br />
02004ea4 T __gnu_Unwind_Save_VFP<br />
02004eac T __gnu_Unwind_Restore_VFP_D<br />
02004eb4 T __gnu_Unwind_Save_VFP_D<br />
02004ebc T __gnu_Unwind_Restore_VFP_D_16_to_31<br />
02004ec4 T __gnu_Unwind_Save_VFP_D_16_to_31<br />
02004ecc T __gnu_Unwind_Restore_WMMXD<br />
02004f10 T __gnu_Unwind_Save_WMMXD<br />
02004f54 T __gnu_Unwind_Restore_WMMXC<br />
02004f68 T __gnu_Unwind_Save_WMMXC<br />
02004f7c 0000002a T _Unwind_RaiseException<br />
02004f7c 0000002a T ___Unwind_RaiseException<br />
02004fa8 0000002a T _Unwind_Resume<br />
02004fa8 0000002a T ___Unwind_Resume<br />
02004fd4 0000002a T _Unwind_Resume_or_Rethrow<br />
02004fd4 0000002a T ___Unwind_Resume_or_Rethrow<br />
02005000 0000002a T _Unwind_ForcedUnwind<br />
02005000 0000002a T ___Unwind_ForcedUnwind<br />
0200502c 0000002a T _Unwind_Backtrace<br />
0200502c 0000002a T ___Unwind_Backtrace<br />
02005058 00000036 t next_unwind_byte<br />
02005090 00000006 T _Unwind_GetTextRelBase<br />
02005098 00000006 T _Unwind_GetDataRelBase<br />
020050a0 0000001a t _Unwind_GetGR<br />
020050bc 0000000e t unwind_UCB_from_context<br />
020050cc 00000018 T _Unwind_GetLanguageSpecificData<br />
020050e4 0000000e T _Unwind_GetRegionStart<br />
020050f4 000002e8 T __gnu_unwind_execute<br />
020053dc 0000002a T __gnu_unwind_frame<br />
02005408 0000000e T abort<br />
02005418 0000002c T fputc<br />
02005444 00000026 T _fputc_r<br />
0200546c 0000005c T _fputs_r<br />
020054c8 0000001c T fputs<br />
020054e4 00000324 T __sfvwrite_r<br />
0200580c 0000007c T _fwrite_r<br />
02005888 00000028 T fwrite<br />
020058b0 00000030 T __libc_fini_array<br />
020058e0 00000050 T __libc_init_array<br />
02005934 00000018 T free<br />
0200594c 00000018 T malloc<br />
02005964 00000504 T _malloc_r<br />
02005e68 00000080 T memchr<br />
02005ee8 00000058 T memcmp<br />
02005f40 00000080 T memcpy<br />
02005fc0 000000a0 T memmove<br />
02006060 00000094 T memset<br />
020060f4 00000002 T __malloc_lock<br />
020060f8 00000002 T __malloc_unlock<br />
020060fc 00000064 T putc<br />
02006160 0000005e T _putc_r<br />
020061c0 0000001c T realloc<br />
020061dc 00000360 T _realloc_r<br />
0200653c 0000005c T _raise_r<br />
02006598 00000018 T raise<br />
020065b0 00000036 T _init_signal_r<br />
020065e8 00000014 T _init_signal<br />
020065fc 00000056 T __sigtramp_r<br />
02006654 00000018 T __sigtramp<br />
0200666c 00000040 T _signal_r<br />
020066ac 0000001c T signal<br />
020066cc 00000044 T sprintf<br />
02006710 00000040 T _sprintf_r<br />
02006750 0000005c T strcmp<br />
020067ac 0000004c T strcpy<br />
020067f8 0000006c T strlen<br />
02006864 000000ac T strncmp<br />
02006910 00000134 t __sprint_r<br />
02006a44 000015d6 T _svfprintf_r<br />
02008020 00000020 T write<br />
02008040 000000c4 T __swbuf_r<br />
02008104 0000001c T __swbuf<br />
02008120 00000042 T _wcrtomb_r<br />
02008164 00000020 T wcrtomb<br />
02008184 000000da T _wcsrtombs_r<br />
02008260 00000028 T wcsrtombs<br />
02008288 000002c8 T _wctomb_r<br />
02008550 000000d0 T __swsetup_r<br />
02008620 00000154 t quorem<br />
02008774 00000e9c T _dtoa_r<br />
02009610 00000114 T _fflush_r<br />
02009724 00000030 T fflush<br />
02009758 00000002 T __sfp_lock_acquire<br />
0200975c 00000002 T __sfp_lock_release<br />
02009760 00000002 T __sinit_lock_acquire<br />
02009764 00000002 T __sinit_lock_release<br />
02009768 00000004 t __fp_lock<br />
0200976c 00000004 t __fp_unlock<br />
02009770 0000001c T __fp_unlock_all<br />
0200978c 0000001c T __fp_lock_all<br />
020097a8 00000014 T _cleanup_r<br />
020097bc 00000014 T _cleanup<br />
020097d0 0000004c t std<br />
0200981c 0000005c T __sinit<br />
02009878 00000030 T __sfmoreglue<br />
020098a8 00000090 T __sfp<br />
02009938 000000a4 T _malloc_trim_r<br />
020099dc 000001ac T _free_r<br />
02009b88 00000064 T _fwalk_reent<br />
02009bec 0000005c T _fwalk<br />
02009c4c 0000000c T __locale_charset<br />
02009c58 00000008 T _localeconv_r<br />
02009c60 00000008 T localeconv<br />
02009c68 00000254 T _setlocale_r<br />
02009ebc 0000001c T setlocale<br />
02009ed8 000000e8 T __smakebuf_r<br />
02009fc0 0000065e T _mbtowc_r<br />
0200a620 00000016 T _Bfree<br />
0200a638 00000054 T __hi0bits<br />
0200a68c 00000068 T __lo0bits<br />
0200a6f4 00000042 T __mcmp<br />
0200a738 00000050 T __ulp<br />
0200a788 0000009c T __b2d<br />
0200a824 00000064 T __ratio<br />
0200a888 00000044 T _mprec_log10<br />
0200a8cc 00000048 T __copybits<br />
0200a914 00000054 T __any_on<br />
0200a968 00000052 T _Balloc<br />
0200a9bc 000000e4 T __d2b<br />
0200aaa0 00000120 T __mdiff<br />
0200abc0 000000c4 T __lshift<br />
0200ac84 00000164 T __multiply<br />
0200ade8 00000016 T __i2b<br />
0200ae00 000000a4 T __multadd<br />
0200aea4 000000b8 T __pow5mult<br />
0200af5c 0000009c T __s2b<br />
0200aff8 00000024 T __isinfd<br />
0200b01c 00000020 T __isnand<br />
0200b03c 00000010 T __sclose<br />
0200b04c 00000030 T __sseek<br />
0200b07c 0000003c T __swrite<br />
0200b0b8 0000002c T __sread<br />
0200b0e4 0000005c T _calloc_r<br />
0200b140 000000a2 T _fclose_r<br />
0200b1e4 00000018 T fclose<br />
0200b200 0000004c T _close_r<br />
0200b250 00000054 T _fstat_r<br />
0200b2a8 0000000a T _getpid_r<br />
0200b2b4 00000004 T _isatty_r<br />
0200b2b8 0000000a T _kill_r<br />
0200b2c4 0000004c T _lseek_r<br />
0200b314 0000004c T _read_r<br />
0200b364 00000054 T _sbrk_r<br />
0200b3b8 00000006 T _times_r<br />
0200b3c0 0000002c T _gettimeofday_r<br />
0200b3ec 00000014 T _times<br />
0200b400 0000004c T _write_r<br />
0200b450 00000014 T _exit<br />
0200b468 00000052 T build_argv<br />
0200b4bc 00000020 T __get_handle<br />
0200b4dc 0000003c T __alloc_handle<br />
0200b518 0000002c T __release_handle<br />
0200b544 00000014 T setDefaultDevice<br />
0200b558 0000007c T AddDevice<br />
0200b5d4 00000068 T FindDevice<br />
0200b63c 00000020 T GetDeviceOpTab<br />
0200b65c 00000024 T RemoveDevice<br />
0200b680 T __aeabi_idiv<br />
0200b680 00000094 T __divsi3<br />
0200b714 0000000e T __aeabi_idivmod<br />
0200b724 T __aeabi_drsub<br />
0200b72c 00000314 T __aeabi_dsub<br />
0200b72c 00000314 T __subdf3<br />
0200b730 00000310 T __adddf3<br />
0200b730 00000310 T __aeabi_dadd<br />
0200ba40 00000024 T __aeabi_ui2d<br />
0200ba40 00000024 T __floatunsidf<br />
0200ba64 00000028 T __aeabi_i2d<br />
0200ba64 00000028 T __floatsidf<br />
0200ba8c 00000040 T __aeabi_f2d<br />
0200ba8c 00000040 T __extendsfdf2<br />
0200bacc 00000074 T __aeabi_ul2d<br />
0200bacc 00000074 T __floatundidf<br />
0200bae0 00000060 T __aeabi_l2d<br />
0200bae0 00000060 T __floatdidf<br />
0200bb40 00000290 T __aeabi_dmul<br />
0200bb40 00000290 T __muldf3<br />
0200bdd0 0000020c T __aeabi_ddiv<br />
0200bdd0 0000020c T __divdf3<br />
0200bfdc 00000094 T __gedf2<br />
0200bfdc 00000094 T __gtdf2<br />
0200bfe4 0000008c T __ledf2<br />
0200bfe4 0000008c T __ltdf2<br />
0200bfec 00000084 T __cmpdf2<br />
0200bfec 00000084 T __eqdf2<br />
0200bfec 00000084 T __nedf2<br />
0200c070 00000034 T __aeabi_cdrcmple<br />
0200c08c 00000018 T __aeabi_cdcmpeq<br />
0200c08c 00000018 T __aeabi_cdcmple<br />
0200c0a4 00000018 T __aeabi_dcmpeq<br />
0200c0bc 00000018 T __aeabi_dcmplt<br />
0200c0d4 00000018 T __aeabi_dcmple<br />
0200c0ec 00000018 T __aeabi_dcmpge<br />
0200c104 00000018 T __aeabi_dcmpgt<br />
0200c11c 0000005c T __aeabi_d2iz<br />
0200c11c 0000005c T __fixdfsi<br />
0200c178 0000000c T __errno<br />
0200c184 0000000c T _ZdaPv<br />
0200c190 0000004c t _ZL21base_of_encoded_valuehP15_Unwind_Context<br />
0200c1dc 0000016c t _ZL17parse_lsda_headerP15_Unwind_ContextPKhP16lsda_header_info<br />
0200c348 0000073a T __gxx_personality_v0<br />
0200ca84 00000010 T _ZSt13set_terminatePFvvE<br />
0200ca94 00000010 T _ZSt14set_unexpectedPFvvE<br />
0200caa4 00000020 T _ZN10__cxxabiv111__terminateEPFvvE<br />
0200cac4 00000010 T _ZSt9terminatev<br />
0200cad4 0000000c T _ZN10__cxxabiv112__unexpectedEPFvvE<br />
0200cae0 00000010 T _ZSt10unexpectedv<br />
0200caf0 00000018 T _Znaj<br />
0200cb08 0000010e T _ZN9__gnu_cxx27__verbose_terminate_handlerEv<br />
0200cc18 00000010 T _ZdlPv<br />
0200cc28 000000f8 T __cxa_type_match<br />
0200cd20 00000062 T __cxa_begin_cleanup<br />
0200cd84 0000006a T __gnu_end_cleanup<br />
0200cdf0 00000020 T __cxa_bad_typeid<br />
0200ce10 00000020 T __cxa_bad_cast<br />
0200ce30 00000048 T __cxa_call_terminate<br />
0200ce78 00000122 T __cxa_call_unexpected<br />
0200cf9c 00000004 T __cxa_get_exception_ptr<br />
0200cfa0 00000012 T _ZSt18uncaught_exceptionv<br />
0200cfb4 00000086 T __cxa_end_catch<br />
0200d03c 00000086 T __cxa_begin_catch<br />
0200d0c4 0000000c T _ZNSt9exceptionD2Ev<br />
0200d0d0 0000000c T _ZNSt9exceptionD1Ev<br />
0200d0dc 0000000c T _ZNSt13bad_exceptionD2Ev<br />
0200d0e8 0000000c T _ZNSt13bad_exceptionD1Ev<br />
0200d0f4 0000000c T _ZN10__cxxabiv115__forced_unwindD2Ev<br />
0200d100 0000000c T _ZN10__cxxabiv115__forced_unwindD1Ev<br />
0200d10c 0000000c T _ZN10__cxxabiv119__foreign_exceptionD2Ev<br />
0200d118 0000000c T _ZN10__cxxabiv119__foreign_exceptionD1Ev<br />
0200d124 00000008 T _ZNKSt9exception4whatEv<br />
0200d12c 00000008 T _ZNKSt13bad_exception4whatEv<br />
0200d134 0000001c T _ZN10__cxxabiv119__foreign_exceptionD0Ev<br />
0200d150 0000001c T _ZN10__cxxabiv115__forced_unwindD0Ev<br />
0200d16c 0000001c T _ZNSt9exceptionD0Ev<br />
0200d188 0000001c T _ZNSt13bad_exceptionD0Ev<br />
0200d1a4 00000008 T __cxa_get_globals_fast<br />
0200d1ac 00000008 T __cxa_get_globals<br />
0200d1b4 00000068 T __cxa_rethrow<br />
0200d21c 0000005c T __cxa_throw<br />
0200d278 00000034 t _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP21_Unwind_Control_Block<br />
0200d2ac 00000026 T __cxa_current_exception_type<br />
0200d2d4 0000001c T _ZN10__cxxabiv123__fundamental_type_infoD1Ev<br />
0200d2f0 0000001c T _ZN10__cxxabiv123__fundamental_type_infoD2Ev<br />
0200d30c 00000020 T _ZN10__cxxabiv123__fundamental_type_infoD0Ev<br />
0200d32c 00000010 T _ZSt15set_new_handlerPFvvE<br />
0200d33c 00000008 T _ZNKSt9bad_alloc4whatEv<br />
0200d344 0000001c T _ZNSt9bad_allocD1Ev<br />
0200d360 0000001c T _ZNSt9bad_allocD2Ev<br />
0200d37c 00000020 T _ZNSt9bad_allocD0Ev<br />
0200d39c 0000006a T _Znwj<br />
0200d408 00000004 T _ZNK10__cxxabiv119__pointer_type_info14__is_pointer_pEv<br />
0200d40c 0000004c T _ZNK10__cxxabiv119__pointer_type_info15__pointer_catchEPKNS_17__pbase_type_infoEPPvj<br />
0200d458 0000001c T _ZN10__cxxabiv119__pointer_type_infoD1Ev<br />
0200d474 0000001c T _ZN10__cxxabiv119__pointer_type_infoD2Ev<br />
0200d490 00000020 T _ZN10__cxxabiv119__pointer_type_infoD0Ev<br />
0200d4b0 00000014 T __cxa_pure_virtual<br />
0200d4c4 0000002e T _ZNK10__cxxabiv120__si_class_type_info11__do_upcastEPKNS_17__class_type_infoEPKvRNS1_15__upcast_resultE<br />
0200d4f4 00000096 T _ZNK10__cxxabiv120__si_class_type_info12__do_dyncastEiNS_17__class_type_info10__sub_kindEPKS1_PKvS4_S6_RNS1_16__dyncast_resultE<br />
0200d58c 00000048 T _ZNK10__cxxabiv120__si_class_type_info20__do_find_public_srcEiPKvPKNS_17__class_type_infoES2_<br />
0200d5d4 0000001c T _ZN10__cxxabiv120__si_class_type_infoD1Ev<br />
0200d5f0 0000001c T _ZN10__cxxabiv120__si_class_type_infoD2Ev<br />
0200d60c 00000020 T _ZN10__cxxabiv120__si_class_type_infoD0Ev<br />
0200d62c 0000000c T _ZNSt9type_infoD2Ev<br />
0200d638 0000000c T _ZNSt9type_infoD1Ev<br />
0200d644 0000000c T _ZNKSt9type_infoeqERKS_<br />
0200d650 00000004 T _ZNKSt9type_info14__is_pointer_pEv<br />
0200d654 00000004 T _ZNKSt9type_info15__is_function_pEv<br />
0200d658 0000000c T _ZNKSt9type_info10__do_catchEPKS_PPvj<br />
0200d664 00000004 T _ZNKSt9type_info11__do_upcastEPKN10__cxxabiv117__class_type_infoEPPv<br />
0200d668 0000001c T _ZNSt9type_infoD0Ev<br />
0200d684 00000008 T _ZNKSt8bad_cast4whatEv<br />
0200d68c 0000001c T _ZNSt8bad_castD1Ev<br />
0200d6a8 0000001c T _ZNSt8bad_castD2Ev<br />
0200d6c4 00000020 T _ZNSt8bad_castD0Ev<br />
0200d6e4 00000008 T _ZNKSt10bad_typeid4whatEv<br />
0200d6ec 0000001c T _ZNSt10bad_typeidD1Ev<br />
0200d708 0000001c T _ZNSt10bad_typeidD2Ev<br />
0200d724 00000020 T _ZNSt10bad_typeidD0Ev<br />
0200d744 0000003e T _ZNK10__cxxabiv117__class_type_info11__do_upcastEPKS0_PPv<br />
0200d784 00000012 T _ZNK10__cxxabiv117__class_type_info20__do_find_public_srcEiPKvPKS0_S2_<br />
0200d798 00000020 T _ZNK10__cxxabiv117__class_type_info11__do_upcastEPKS0_PKvRNS0_15__upcast_resultE<br />
0200d7b8 0000004a T _ZNK10__cxxabiv117__class_type_info12__do_dyncastEiNS0_10__sub_kindEPKS0_PKvS3_S5_RNS0_16__dyncast_resultE<br />
0200d804 00000034 T _ZNK10__cxxabiv117__class_type_info10__do_catchEPKSt9type_infoPPvj<br />
0200d838 0000001c T _ZN10__cxxabiv117__class_type_infoD1Ev<br />
0200d854 0000001c T _ZN10__cxxabiv117__class_type_infoD2Ev<br />
0200d870 00000020 T _ZN10__cxxabiv117__class_type_infoD0Ev<br />
0200d890 00000002 t _GLOBAL__I___cxa_allocate_exception<br />
0200d894 0000003c T __cxa_free_dependent_exception<br />
0200d8d0 0000003c T __cxa_free_exception<br />
0200d90c 00000084 T __cxa_allocate_dependent_exception<br />
0200d990 00000088 T __cxa_allocate_exception<br />
0200da18 00000018 W _ZNK10__cxxabiv117__pbase_type_info15__pointer_catchEPKS0_PPvj<br />
0200da30 00000064 T _ZNK10__cxxabiv117__pbase_type_info10__do_catchEPKSt9type_infoPPvj<br />
0200da94 0000001c T _ZN10__cxxabiv117__pbase_type_infoD1Ev<br />
0200dab0 0000001c T _ZN10__cxxabiv117__pbase_type_infoD2Ev<br />
0200dacc 00000020 T _ZN10__cxxabiv117__pbase_type_infoD0Ev<br />
0200daec T _fini<br />
0200daf8 A __text_end<br />
0200e1dc 000000c4 r standard_subs<br />
0200e2a0 00000280 r cplus_demangle_builtin_types<br />
0200e520 00000350 r cplus_demangle_operators<br />
0200e884 00000004 R _global_impure_ptr<br />
0200e9ec 00000010 r blanks.3548<br />
0200e9fc 00000010 r zeroes.3549<br />
0200ea94 00000030 r lconv<br />
0200eadc 00000048 r JIS_state_table<br />
0200eb24 00000048 r JIS_action_table<br />
0200eb70 000000c8 R __mprec_tens<br />
0200ec38 0000000c r p05.2435<br />
0200ec48 00000028 R __mprec_bigtens<br />
0200ec70 00000028 R __mprec_tinytens<br />
0200ec98 0000005c R dotab_stdnull<br />
0200f4f8 00000014 R _ZTVN10__cxxabiv115__forced_unwindE<br />
0200f510 00000008 R _ZTISt9exception<br />
0200f518 00000014 R _ZTVSt9exception<br />
0200f530 00000008 R _ZTIN10__cxxabiv115__forced_unwindE<br />
0200f538 00000012 R _ZTSSt13bad_exception<br />
0200f54c 00000024 R _ZTSN10__cxxabiv119__foreign_exceptionE<br />
0200f594 00000008 R _ZTIN10__cxxabiv119__foreign_exceptionE<br />
0200f5a0 00000014 R _ZTVSt13bad_exception<br />
0200f5b8 0000000d R _ZTSSt9exception<br />
0200f5c8 00000014 R _ZTVN10__cxxabiv119__foreign_exceptionE<br />
0200f5e0 00000020 R _ZTSN10__cxxabiv115__forced_unwindE<br />
0200f600 0000000c R _ZTISt13bad_exception<br />
0200f60c 00000010 V _ZTIPKe<br />
0200f61c 00000010 V _ZTIPe<br />
0200f62c 00000008 V _ZTIe<br />
0200f634 00000010 V _ZTIPKd<br />
0200f644 00000010 V _ZTIPd<br />
0200f654 00000008 V _ZTId<br />
0200f65c 00000010 V _ZTIPKf<br />
0200f66c 00000010 V _ZTIPf<br />
0200f67c 00000008 V _ZTIf<br />
0200f684 00000010 V _ZTIPKy<br />
0200f694 00000010 V _ZTIPy<br />
0200f6a4 00000008 V _ZTIy<br />
0200f6ac 00000010 V _ZTIPKx<br />
0200f6bc 00000010 V _ZTIPx<br />
0200f6cc 00000008 V _ZTIx<br />
0200f6d4 00000010 V _ZTIPKm<br />
0200f6e4 00000010 V _ZTIPm<br />
0200f6f4 00000008 V _ZTIm<br />
0200f6fc 00000010 V _ZTIPKl<br />
0200f70c 00000010 V _ZTIPl<br />
0200f71c 00000008 V _ZTIl<br />
0200f724 00000010 V _ZTIPKj<br />
0200f734 00000010 V _ZTIPj<br />
0200f744 00000008 V _ZTIj<br />
0200f74c 00000010 V _ZTIPKi<br />
0200f75c 00000010 V _ZTIPi<br />
0200f76c 00000008 V _ZTIi<br />
0200f774 00000010 V _ZTIPKt<br />
0200f784 00000010 V _ZTIPt<br />
0200f794 00000008 V _ZTIt<br />
0200f79c 00000010 V _ZTIPKs<br />
0200f7ac 00000010 V _ZTIPs<br />
0200f7bc 00000008 V _ZTIs<br />
0200f7c4 00000010 V _ZTIPKh<br />
0200f7d4 00000010 V _ZTIPh<br />
0200f7e4 00000008 V _ZTIh<br />
0200f7ec 00000010 V _ZTIPKa<br />
0200f7fc 00000010 V _ZTIPa<br />
0200f80c 00000008 V _ZTIa<br />
0200f814 00000010 V _ZTIPKc<br />
0200f824 00000010 V _ZTIPc<br />
0200f834 00000008 V _ZTIc<br />
0200f83c 00000010 V _ZTIPKDi<br />
0200f84c 00000010 V _ZTIPDi<br />
0200f85c 00000008 V _ZTIDi<br />
0200f864 00000010 V _ZTIPKDs<br />
0200f874 00000010 V _ZTIPDs<br />
0200f884 00000008 V _ZTIDs<br />
0200f88c 00000010 V _ZTIPKw<br />
0200f89c 00000010 V _ZTIPw<br />
0200f8ac 00000008 V _ZTIw<br />
0200f8b4 00000010 V _ZTIPKb<br />
0200f8c4 00000010 V _ZTIPb<br />
0200f8d4 00000008 V _ZTIb<br />
0200f8dc 00000010 V _ZTIPKv<br />
0200f8ec 00000010 V _ZTIPv<br />
0200f8fc 00000008 V _ZTIv<br />
0200f904 00000004 V _ZTSPKe<br />
0200f908 00000003 V _ZTSPe<br />
0200f90c 00000002 V _ZTSe<br />
0200f910 00000004 V _ZTSPKd<br />
0200f914 00000003 V _ZTSPd<br />
0200f918 00000002 V _ZTSd<br />
0200f91c 00000004 V _ZTSPKf<br />
0200f920 00000003 V _ZTSPf<br />
0200f924 00000002 V _ZTSf<br />
0200f928 00000004 V _ZTSPKy<br />
0200f92c 00000003 V _ZTSPy<br />
0200f930 00000002 V _ZTSy<br />
0200f934 00000004 V _ZTSPKx<br />
0200f938 00000003 V _ZTSPx<br />
0200f93c 00000002 V _ZTSx<br />
0200f940 00000004 V _ZTSPKm<br />
0200f944 00000003 V _ZTSPm<br />
0200f948 00000002 V _ZTSm<br />
0200f94c 00000004 V _ZTSPKl<br />
0200f950 00000003 V _ZTSPl<br />
0200f954 00000002 V _ZTSl<br />
0200f958 00000004 V _ZTSPKj<br />
0200f95c 00000003 V _ZTSPj<br />
0200f960 00000002 V _ZTSj<br />
0200f964 00000004 V _ZTSPKi<br />
0200f968 00000003 V _ZTSPi<br />
0200f96c 00000002 V _ZTSi<br />
0200f970 00000004 V _ZTSPKt<br />
0200f974 00000003 V _ZTSPt<br />
0200f978 00000002 V _ZTSt<br />
0200f97c 00000004 V _ZTSPKs<br />
0200f980 00000003 V _ZTSPs<br />
0200f984 00000002 V _ZTSs<br />
0200f988 00000004 V _ZTSPKh<br />
0200f98c 00000003 V _ZTSPh<br />
0200f990 00000002 V _ZTSh<br />
0200f994 00000004 V _ZTSPKa<br />
0200f998 00000003 V _ZTSPa<br />
0200f99c 00000002 V _ZTSa<br />
0200f9a0 00000004 V _ZTSPKc<br />
0200f9a4 00000003 V _ZTSPc<br />
0200f9a8 00000002 V _ZTSc<br />
0200f9ac 00000005 V _ZTSPKDi<br />
0200f9b4 00000004 V _ZTSPDi<br />
0200f9b8 00000003 V _ZTSDi<br />
0200f9bc 00000005 V _ZTSPKDs<br />
0200f9c4 00000004 V _ZTSPDs<br />
0200f9c8 00000003 V _ZTSDs<br />
0200f9cc 00000004 V _ZTSPKw<br />
0200f9d0 00000003 V _ZTSPw<br />
0200f9d4 00000002 V _ZTSw<br />
0200f9d8 00000004 V _ZTSPKb<br />
0200f9dc 00000003 V _ZTSPb<br />
0200f9e0 00000002 V _ZTSb<br />
0200f9e4 00000004 V _ZTSPKv<br />
0200f9e8 00000003 V _ZTSPv<br />
0200f9ec 00000002 V _ZTSv<br />
0200f9f0 0000000c R _ZTIN10__cxxabiv123__fundamental_type_infoE<br />
0200f9fc 00000028 R _ZTSN10__cxxabiv123__fundamental_type_infoE<br />
0200fa28 00000020 R _ZTVN10__cxxabiv123__fundamental_type_infoE<br />
0200fa48 00000014 R _ZTVSt9bad_alloc<br />
0200fa60 0000000d R _ZTSSt9bad_alloc<br />
0200fa70 0000000c R _ZTISt9bad_alloc<br />
0200fa8c 00000001 R _ZSt7nothrow<br />
0200fa90 00000024 R _ZTSN10__cxxabiv119__pointer_type_infoE<br />
0200fab4 0000000c R _ZTIN10__cxxabiv119__pointer_type_infoE<br />
0200fac0 00000024 R _ZTVN10__cxxabiv119__pointer_type_infoE<br />
0200fb08 0000002c R _ZTVN10__cxxabiv120__si_class_type_infoE<br />
0200fb38 0000000c R _ZTIN10__cxxabiv120__si_class_type_infoE<br />
0200fb44 00000025 R _ZTSN10__cxxabiv120__si_class_type_infoE<br />
0200fb6c 00000008 R _ZTISt9type_info<br />
0200fb74 0000000d R _ZTSSt9type_info<br />
0200fb88 00000020 R _ZTVSt9type_info<br />
0200fba8 0000000c R _ZTISt8bad_cast<br />
0200fbb4 0000000c R _ZTSSt8bad_cast<br />
0200fbc0 00000014 R _ZTVSt8bad_cast<br />
0200fbe8 00000014 R _ZTVSt10bad_typeid<br />
0200fc00 0000000c R _ZTISt10bad_typeid<br />
0200fc1c 0000000f R _ZTSSt10bad_typeid<br />
0200fc30 0000002c R _ZTVN10__cxxabiv117__class_type_infoE<br />
0200fc60 0000000c R _ZTIN10__cxxabiv117__class_type_infoE<br />
0200fc6c 00000022 R _ZTSN10__cxxabiv117__class_type_infoE<br />
0200fc90 0000000c R _ZTIN10__cxxabiv117__pbase_type_infoE<br />
0200fc9c 00000022 R _ZTSN10__cxxabiv117__pbase_type_infoE<br />
0200fcc0 00000024 R _ZTVN10__cxxabiv117__pbase_type_infoE<br />
0200ff44 A __exidx_start<br />
02010364 A __exidx_end<br />
02010364 t __frame_dummy_init_array_entry<br />
02010364 A __init_array_start<br />
02010364 A __preinit_array_end<br />
02010364 A __preinit_array_start<br />
0201036c t __do_global_dtors_aux_fini_array_entry<br />
0201036c A __fini_array_start<br />
0201036c A __init_array_end<br />
02010370 r __EH_FRAME_BEGIN__<br />
02010370 A __fini_array_end<br />
02011114 r __FRAME_END__<br />
02011118 d __JCR_END__<br />
02011118 d __JCR_LIST__<br />
0201111c A __data_start<br />
0201111c D __dso_handle<br />
0201111c A __ewram_start<br />
02011120 00000004 D fifo_freewords<br />
02011124 00000004 D fifo_send_queue<br />
02011128 00000004 D fifo_buffer_free<br />
0201112c 00000004 D fifo_receive_queue<br />
02011130 00000004 D _impure_ptr<br />
02011138 00000428 d impure_data<br />
02011560 00000408 D __malloc_av_<br />
02011968 00000004 D __malloc_sbrk_base<br />
0201196c 00000004 D __malloc_trim_threshold<br />
02011970 00000004 d charset<br />
02011974 0000000c d last_lc_ctype.1268<br />
02011980 0000000c D __lc_ctype<br />
0201198c 0000000c d last_lc_messages.1270<br />
02011998 0000000c d lc_messages.1269<br />
020119a4 00000004 D __mb_cur_max<br />
020119a8 00000004 d defaultDevice<br />
020119ac 00000040 D devoptab_list<br />
020119ec 00000004 D _ZN10__cxxabiv119__terminate_handlerE<br />
020119f0 00000004 D _ZN10__cxxabiv120__unexpected_handlerE<br />
020119f4 A __bss_start<br />
020119f4 A __bss_start__<br />
020119f4 A __bss_vma<br />
020119f4 A __data_end<br />
020119f4 A __dtcm_lma<br />
020119f4 A __itcm_lma<br />
020119f4 b completed.2775<br />
020119f8 b object.2787<br />
02011a10 00000004 b __timeout<br />
02011a14 00000004 B processing<br />
02011a18 00000001 b _ZZN9__gnu_cxx27__verbose_terminate_handlerEvE11terminating<br />
02011a1c 0000000c b _ZL10eh_globals<br />
02011a28 00000004 B __new_handler<br />
02011a2c 00000004 b _ZL15dependents_used<br />
02011a30 000001e0 b _ZL17dependents_buffer<br />
02011b84 A __vectors_lma<br />
02011c10 00000004 b _ZL14emergency_used<br />
02011c18 00000800 b _ZL16emergency_buffer<br />
02012418 00000004 B __malloc_top_pad<br />
0201241c 00000028 B __malloc_current_mallinfo<br />
02012444 00000004 B __malloc_max_sbrked_mem<br />
02012448 00000004 B __malloc_max_total_mem<br />
0201244c 00000004 B __nlocale_changed<br />
02012450 00000004 B __mlocale_changed<br />
02012454 00000004 B _PathLocale<br />
02012458 00000004 b heap_start.2602<br />
0201245c 00000004 B fake_heap_end<br />
02012460 00000004 B fake_heap_start<br />
02012464 00000008 B __syscalls<br />
0201246c 00001000 b handles<br />
0201346c 00000004 B theTime<br />
02013470 00000040 B fifo_datamsg_data<br />
020134b0 00000800 B fifo_buffer<br />
02013cb0 00000040 B fifo_value32_func<br />
02013cf0 00000040 B fifo_address_func<br />
02013d30 00000040 B fifo_value32_data<br />
02013d70 00000040 B fifo_value32_queue<br />
02013db0 00000040 B fifo_data_queue<br />
02013df0 00000040 B fifo_address_data<br />
02013e30 00000040 B fifo_datamsg_func<br />
02013e70 00000040 B fifo_address_queue<br />
02013eb0 00000004 B punixTime<br />
02013eb4 A __bss_end<br />
02013eb4 A __bss_end__<br />
02013eb4 A __end__<br />
02013eb4 A _end<br />
023ff000 A __eheap_end<br />
023ff000 A __ewram_end<br />
027fff70 a _libnds_argv<br />
0b000000 A __dtcm_end<br />
0b000000 A __dtcm_start<br />
0b000000 A __sbss_end<br />
0b000000 A __sbss_start<br />
0b000000 A __sbss_start__<br />
0b003d00 A __sp_usr<br />
0b003e00 A __sp_irq<br />
0b003f00 A __sp_svc<br />
0b003ff8 A __irq_flags<br />
0b003ffc A __irq_vector<br />
0b004000 A __dtcm_to</div>
</div>
<p>
Well, I did say this was going to be the <i>long</i> story, didn&#8217;t I?<br />
Everything that was in the base project is in here as well. The<br />
additional parts can be summarized as follows.
</p>
<div class="none">
<div class="none proglist" style=" "><span class="co2"># Additions w.r.t the base case.</span><br />
02001440 &#8211; 02004398 &nbsp; &nbsp; 2F58&nbsp; &nbsp; : d_* routines<br />
020043a4 &#8211; 02004434 &nbsp; &nbsp; 0090&nbsp; &nbsp; : software div (__aeabi_uidiv etc)<br />
02004434 &#8211; 02005408 &nbsp; &nbsp; 0FD4&nbsp; &nbsp; : exception unwind routines<br />
02005418 &#8211; 0200b680 &nbsp; &nbsp; <span class="nu0">6268</span>&nbsp; &nbsp; : various libc : printf et al, malloc,mem*,locale, Device, etc<br />
0200b680 &#8211; 0200c184 &nbsp; &nbsp; 0B04&nbsp; &nbsp; : div and FP math routines. (for printf)<br />
0200c190 &#8211; 0200daf8 &nbsp; &nbsp; <span class="nu0">1968</span>&nbsp; &nbsp; : exception/typeinfo routines.<br />
0200daf8 A __text_end<br />
0200e1dc &#8211; 0200ff44 &nbsp; &nbsp; 1D68&nbsp; &nbsp; : exception/typeinfo strings and pointers.<br />
02013eb4 A _end</div>
</div>
<p>
There are three main areas to discern:
</p>
<ul>
<li>
    <code>d_*()</code> routines, presumably for debug printing.<br />
	(size: 12k)
  </li>
<li>
    Stdio formatting and related. This includes file handling, device<br />
	handling and many forms of <code>printf</code>, which brings a<br />
	whole lot of baggage (some allocation, format parsing and<br />
	math/floating point routines). There are also some abort and<br />
	signalling routines. (size: 28k)
  </li>
<li>
    Exception handling. Not just routines for handling them, but also<br />
	the typeinfo stuff required, the output strings and the output<br />
	string pointers. (size: 18k)
  </li>
</ul>
<p>
This roughly 60k of stuff is the overhead of exceptions &ndash;<br />
any <i>potential</i> exception. In this case, it&#8217;s because<br />
<code>new</code> has to throw a <code>bad_alloc</code> exception when it&#8217;s<br />
unable to allocate more memory.
</p>
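<p>
As a sanity check on that figure, here&#8217;s a minimal sketch that adds up the ranges from the list above. The struct, table and function names are made up for this example, and the 4 MB main RAM size is the usual DS figure, not something read from the map file.
</p>

```cpp
#include <cstddef>
#include <cstdint>

// One address range from the "additions" list (end is exclusive).
struct Range {
    std::uint32_t start;
    std::uint32_t end;
    const char*   what;
};

// Ranges copied from the list above; the labels are descriptive only.
static const Range kAdditions[] = {
    { 0x02001440, 0x02004398, "d_* routines" },
    { 0x020043a4, 0x02004434, "software div" },
    { 0x02004434, 0x02005408, "exception unwind routines" },
    { 0x02005418, 0x0200b680, "libc: printf, malloc, locale, ..." },
    { 0x0200b680, 0x0200c184, "div and FP math" },
    { 0x0200c190, 0x0200daf8, "exception/typeinfo routines" },
    { 0x0200e1dc, 0x0200ff44, "exception/typeinfo strings and pointers" },
};

static const std::size_t kNumAdditions =
    sizeof(kAdditions) / sizeof(kAdditions[0]);

// Assumption: 4 MB of main RAM.
static const std::uint32_t kMainRamBytes = 4u * 1024u * 1024u;

// Size of a single range in bytes.
inline std::uint32_t range_size(const Range& r)
{
    return r.end - r.start;
}

// Sum of all listed ranges in bytes.
inline std::uint32_t total_overhead()
{
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < kNumAdditions; ++i)
        total += range_size(kAdditions[i]);
    return total;
}

// The same total as a percentage of main RAM.
inline double overhead_percent()
{
    return 100.0 * total_overhead() / kMainRamBytes;
}
```

<p>
With these ranges, <code>total_overhead()</code> comes to 58360 bytes, which is about 57 kb, or roughly 1.4% of main RAM.
</p>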
<p>
The problem is that exceptions have<br />
many dependencies: to do exception handling, you have to keep track of and unwind<br />
the stack. You also need to be able to tell the type of exception<br />
thrown, which requires RTTI. And then you have to report which exception was<br />
thrown, so you need error messages, <i>and</i> a list of pointers to<br />
those messages, <i>and</i> a way to format and write those messages,<br />
hence the <code>d_*()</code> routines and all the stdio stuff.
</p>
<p><h2 id="sec-own-new">3
Custom new/delete
</h2>
</p>
<p>
There is a way around this, though: redefine <code>new</code> and<br />
related functions. Technically speaking, this is a <i>bad</i> idea<br />
if you don&#8217;t know what you&#8217;re doing, but it can be done. Note that<br />
you would need to overload four operators: <code>new</code>,<br />
<code>delete</code> and their array counterparts.
</p>
<div class="none">
<div class="none proglist" style=" ">#include &lt;stdlib.h&gt; &nbsp; // size_t, malloc(), free()<br />
void* operator new(size_t size) &nbsp; &nbsp; { &nbsp; return malloc(size);&nbsp; &nbsp; }<br />
void operator delete(void *p) &nbsp; &nbsp; &nbsp; { &nbsp; free(p);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
void* operator new[](size_t size) &nbsp; { &nbsp; return malloc(size);&nbsp; &nbsp; }<br />
void operator delete[](void *p) &nbsp; &nbsp; { &nbsp; free(p);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</div>
</div>
<p>
This way, you just incur the cost of <code>malloc()</code> and<br />
<code>free()</code>, which are only about 3k. But again, this is going<br />
against the standard and you&#8217;ll really have to ask yourself if the<br />
(at best) 2% of main RAM you save with this is really worth it.
</p>
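<p>
One of the reasons this is tricky: a plain <code>operator new</code> is not supposed to return NULL, so the compiler is free to run the constructor on whatever comes back without checking it. A slightly more defensive variant, sketched below, bails out via <code>abort()</code> when <code>malloc()</code> fails. This is only a sketch under assumptions: <code>alloc_or_abort()</code> is a made-up helper name, and <code>noexcept</code> needs a C++11 compiler (on older ones, <code>throw()</code> plays the same role).
</p>

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free, std::abort

// Hypothetical helper (not from the original post): allocate or stop.
static void* alloc_or_abort(std::size_t size)
{
    if (size == 0)
        size = 1;               // new must hand back a valid pointer even for 0 bytes
    void* p = std::malloc(size);
    if (p == nullptr)
        std::abort();           // no exception machinery: just give up
    return p;
}

// Replacements for the four global operators.
void* operator new(std::size_t size)      { return alloc_or_abort(size); }
void* operator new[](std::size_t size)    { return alloc_or_abort(size); }
void  operator delete(void* p) noexcept   { std::free(p); }
void  operator delete[](void* p) noexcept { std::free(p); }
```

<p>
Code using <code>new</code> and <code>delete</code> stays the same. The only difference is what happens when memory runs out: the program stops instead of throwing <code>bad_alloc</code>. Whether that still keeps the exception code out of the binary will depend on what else gets linked in.
</p>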
<p><div>&nbsp;</div></p>
<p>More on this can be read at<br />
<a href="http://brewforums.qualcomm.com/showthread.php?t=2033"><br />
http://brewforums.qualcomm.com/showthread.php?t=2033</a>.</p>
<p><h2 id="sec-conc">4
Other considerations and conclusions.
</h2>
</p>
<p>
The binary size is <i>not</i> the same as the main RAM footprint;<br />
the difference is about 44 kb of other stuff.
</p>
<p>
The overhead of the standard <code>new</code> is 60 kb, which is all<br />
due to exceptions. You <i>cannot</i> remove it by<br />
using the compiler options <tt>-fno-exceptions</tt> and<br />
<tt>-fno-rtti</tt>, because that only affects your own code, not the<br />
standard libraries. You can remove this overhead by overloading<br />
<code>new</code> and related functions, but you have to be really<br />
careful with this.
</p>
<p>
I&#8217;ve also done a little bit of testing with <code>vector</code>, and<br />
it seems that <code>vector</code>&#8217;s overhead also comes from<br />
<code>new</code> and can be removed the same way. However, other parts<br />
of <code>vector</code> (and STL) may use other exceptions, so it&#8217;s<br />
quite possible it won&#8217;t work in all cases.
</p>
<p>
Note that roughly 28 kb of the exception overhead is actually<br />
stdio related &ndash; specifically formatted printing:<br />
<code>*printf</code>. If you&#8217;re using <code>printf</code> anyway, the<br />
effective overhead of exceptions is reduced considerably.
</p>
<p>
Finally, remember that the exception overhead amounts to roughly 2% of<br />
main RAM at most. In most homebrew cases it won&#8217;t matter that much.<br />
When it does start to affect your app, you will likely have other parts<br />
that are easier and safer to optimize out.
</p>
<p><div>&nbsp;</div></p>
<p><a href="[FILEBASE]minimal.zip">Test project + notes.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/11/sizeof-new/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Filter juggling and comment preview</title>
		<link>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/</link>
		<comments>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 11:49:50 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[blag]]></category>
		<category><![CDATA[codesnippet]]></category>
		<category><![CDATA[comment preview]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=103</guid>
		<description><![CDATA[
One of the nice features of WordPress is that it already has a lot of
functionality built-in. The whole thing is set up so that normal people
can just install and start writing posts immediately, with WordPress
taking care of all the details like converting
HTML entities
and adding newlines where appropriate.


Of course, for those that aren&#8217;t normal and that [...]]]></description>
			<content:encoded><![CDATA[<p>
One of the nice features of WordPress is that it already has a lot of<br />
functionality built-in. The whole thing is set up so that normal people<br />
can just install and start writing posts immediately, with WordPress<br />
taking care of all the details like converting<br />
<a href="http://www.w3schools.com/tags/ref_entities.asp">HTML entities</a><br />
and adding newlines where appropriate.
</p>
<p>
Of course, for those that aren&#8217;t normal and that would like to write<br />
in raw HTML, these things are somewhat annoying. Fortunately, though,<br />
WordPress allows you to disable these kinds of filters. The catch is<br />
that you need to find out which filters to disable, namely,<br />
<code>wptexturize</code> (which converts HTML entities) and<br />
<code>wpautop</code> (which does newline control). WordPress also makes<br />
it easy to add additional filters, like the<br />
<a href="http://blog.hackerforhire.org/code-snippet/"<br />
rel="pingback">CodeSnippet plugin</a> that I use for code highlighting.
</p>
<p>
However, with the number of filters available, sometimes things will<br />
clash. A good example of this is comments that have source code<br />
in them. Part of what CodeSnippet does is convert certain characters<br />
(specifically: &lsquo;&lt;&rsquo;, &lsquo;&gt;&rsquo;,<br />
&lsquo;&amp;&rsquo;) to printable entities<br />
(&amp;lt;, &amp;gt;, &amp;amp;), so that they aren&#8217;t treated as special HTML<br />
characters anymore. However, there are several other filters that<br />
have a similar task, so that when you write this:</p>
<p><!-- For example, for comments I still have<br />
<code>wptexturize</code> and <code>wpautop</code> enabled, since in<br />
all probability most readers here are merely slightly odd at best.<br />
Besides that, this is the expected comment behaviour and makes sure<br />
no-one gets to add &lt;script&gt; tags and all. -->
</p>
<p><blockquote>
<br />
Oh hai! This is a useful bitfield function.<div>&nbsp;</div><br />
&#91;code lang="cpp"]<br/><br />
template&lt;class T&gt;<br/><br />
inline void bfInsert(T &amp;y, u32 x, int start, int len)<br/><br />
{<br/><br />
 &nbsp; &nbsp;u32 mask= ((1&lt;&lt;len)-1) &lt;&lt; start;<br/><br />
 &nbsp; &nbsp;y &amp;= ~mask;<br/><br />
 &nbsp; &nbsp;y |= (x&lt;&lt;start) &amp; mask;<br/><br />
}<br/><br />
[/code]<br />

</blockquote>
</p>
<p>what it becomes is:</p>
<p><blockquote>
<br />
Oh hai! This is a useful bitfield function.</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">template</span><br />
<span class="kw1">inline</span> <span class="kw1">void</span> bfInsert(T &amp;amp;y, u32 x, <span class="kw1">int</span> start, <span class="kw1">int</span> len)<br />
{<br />
&nbsp; &nbsp; u32 mask= ((<span class="nu0">1</span>&amp;lt;&amp;lt;len)-<span class="nu0">1</span>) &amp;lt;&amp;lt; start;<br />
&nbsp; &nbsp; y &amp;amp;= ~mask;<br />
&nbsp; &nbsp; y |= (x&amp;lt;&amp;lt;start) &amp;amp; mask;<br />
}</div>
</div>
<p>
</blockquote>
</p>
<p>
Not exactly pretty. Note that the template's <code>&lt;class T&gt;</code> is simply removed<br />
because it's seen as an illicit HTML tag, and all the special<br />
characters are doubly converted. This is still a mild example; I think<br />
if you place the angle brackets wrong, whole swaths of code can<br />
simply be eaten by the sanitizer.
</p>
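<p>
To see where the doubled entities come from, here is a minimal sketch of an entity-encoding pass, written in C++ to match the snippet being mangled. Note that <code>escape_html()</code> is a hypothetical stand-in for this kind of filter; it is not the actual WordPress or CodeSnippet code.
</p>

```cpp
#include <string>

// Hypothetical stand-in for an entity-encoding filter:
// replaces '<', '>' and '&' with their HTML entities.
std::string escape_html(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        case '&': out += "&amp;"; break;
        default:  out += in[i];   break;
        }
    }
    return out;
}
```

<p>
Run the function once over the comment and the brackets are encoded as intended. Run it again over its own output and the ampersands that the first pass introduced get encoded too, so the browser ends up showing the entity names literally instead of the characters. That is what happens when two filters both think the encoding is their job.
</p>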
<p>
Unfortunately, finding out where the problem lies is tricky. Not<br />
only are there dozens of potential functions doing the conversion,<br />
they can be called from anywhere, so you have no idea where to start,<br />
and PHP isn't exactly rich in the debugger department. Worse still, in this<br />
particular case the place where the bad happens is actually before<br />
the comment is even saved to the database (but only for unregistered<br />
people; for me the code comments would work fine), and because comments<br />
are handled on a page that you don't actually ever see, random<br />
echo/print statements are useless as well.
</p>
<p>
But I think I finally got it: it was<br />
<code>wp_kses()</code> using (in a roundabout way)<br />
<code>wp_specialchars()</code> in the <tt>wp-includes/kses.php</tt><br />
<s>room</s> file. The contractor is actually<br />
<code>wp_filter_comment()</code> from <tt>wp-includes/comment.php</tt>,<br />
using the <code>pre_comment_content</code> filter as a middleman.
</p>
<p>
The trick now is to keep it from happening. What I've done is define<br />
not one but two <code>pre_comment_content</code> filters: one that<br />
pre-mangles the brackets and ampersand before <code>wp_kses</code>,<br />
and one that de-mangles them afterwards. Of course, this will only<br />
be of importance between &#91;code] tags. Exactly how to do this will<br />
depend on the plugin you're using, but in the case of<br />
CodeSnippet it goes like this:
</p>
<div class="php">
<div class="php proglist" style=" "><span class="co1">//# Put this along with the other add_filter() calls.</span></p>
<p><span class="co1">// Ensure in-&#91;code] entities ('&lt;&gt;&amp;') work out right in the end.</span><br />
add_filter(<span class="st_h">'pre_comment_content'</span>, <a href="http://www.php.net/array"><span class="kw3">array</span></a>(&amp;<span class="re0">$CodeSnippet</span>, <span class="st_h">'filterDeEntity'</span>), <span class="nu0">1</span>);<br />
add_filter(<span class="st_h">'pre_comment_content'</span>, <a href="http://www.php.net/array"><span class="kw3">array</span></a>(&amp;<span class="re0">$CodeSnippet</span>, <span class="st_h">'filterReEntity'</span>), <span class="nu0">50</span>);</p>
<p>...</p>
<p><span class="co1">//# Add these methods to the CodeSnippet class.</span><br />
&nbsp; &nbsp; <span class="co4">/** <br />
&nbsp; &nbsp; &nbsp;* Pre-encode HTML entities. Should come \e before wp_kses.<br />
&nbsp; &nbsp; &nbsp;*/</span><br />
&nbsp; &nbsp; <span class="kw2">function</span> filterDeEntity(<span class="re0">$content</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= &nbsp;<a href="http://www.php.net/preg_replace"><span class="kw3">preg_replace</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'#(\[code.*?\])(.*?)(\[/code\])#msie'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&quot;\\1&quot; . str_replace(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;&lt;&quot;, &quot;&gt;&quot;, &quot;&amp;&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;[|LT|]&quot;, &quot;[|GT|]&quot;, &quot;[|AMP|]&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \'\\2\') . &quot;\\3&quot;;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<span class="st_h">'\&quot;'</span>, <span class="st_h">'&quot;'</span>, <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="re0">$content</span>;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; <span class="co4">/** <br />
&nbsp; &nbsp; &nbsp;* Decode HTML entities. Should come \e after wp_kses.<br />
&nbsp; &nbsp; &nbsp;*/</span> <br />
&nbsp; &nbsp; <span class="kw2">function</span> filterReEntity(<span class="re0">$content</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(<a href="http://www.php.net/strstr"><span class="kw3">strstr</span></a>(<span class="re0">$content</span>, <span class="st0">&quot;[|&quot;</span>))<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/preg_replace"><span class="kw3">preg_replace</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'#(\[code.*?\])(.*?)(\[/code\])#msie'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&quot;\\1&quot; . str_replace(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;[|LT|]&quot;, &quot;[|GT|]&quot;, &quot;[|AMP|]&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array(&quot;&lt;&quot;, &quot;&gt;&quot;, &quot;&amp;&quot;), <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \'\\2\') . &quot;\\3&quot;;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="re0">$content</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<span class="st_h">'\&quot;'</span>, <span class="st_h">'&quot;'</span>, <span class="re0">$content</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="re0">$content</span>;<br />
&nbsp; &nbsp; }</div>
</div>
<p>
Notice that both methods are under the same filter group. The trick<br />
is that they have different priorities, which makes one act before<br />
<code>wp_kses()</code>, and one after. Also note the <code>e</code><br />
modifier on the regexps, which makes <code>preg_replace()</code><br />
evaluate the replacement string as PHP code. This particular<br />
feature of <code>preg_replace()</code> allows for shorter code, but is<br />
<i>very</i> fragile; it may be better to use<br />
<code>preg_replace_callback()</code> instead. In any case, written like<br />
this it seems to work:
</p>
<p><blockquote>
Oh hai! This is a useful bit function. </p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">template</span>&lt;<span class="kw1">class</span> T&gt;<br />
<span class="kw1">inline</span> <span class="kw1">void</span> bfInsert(T &amp;y, u32 x, <span class="kw1">int</span> start, <span class="kw1">int</span> len)<br />
{<br />
&nbsp; &nbsp;u32 mask= ((<span class="nu0">1</span>&lt;&lt;len)-<span class="nu0">1</span>)&lt;&lt;start;<br />
&nbsp; &nbsp;y &amp;= ~mask;<br />
&nbsp; &nbsp;y |= (x&lt;&lt;start) &amp; mask;<br />
}</div>
</div>
<p>
</blockquote>
</p>
<h4>Comment preview</h4>
<p>
The code-comment mangling is just one of the issues one can<br />
encounter in blog comments. It's usually impossible to see beforehand<br />
what will be accepted and what not. Is HTML allowed? Are all tags<br />
allowed, or just some or none at all? What about whitespace? Or<br />
BB-like tags? Basically, you'll never know what a comment will look<br />
like until you've submitted it, and by then it's too late to change it.
</p>
<p>
You know what'd be really helpful? A <b>comment preview</b>!
</p>
<p>
You'd think this'd be a fairly obvious feature for a blogging<br />
system to have, but apparently not.<br />
I was thinking of making my own preview functionality, but when<br />
attempting to do so several items within WP thwarted my efforts.<br />
Fortunately, it seems plugins of this sort exist already. The plugin<br />
I'm now using is
<a href="http://blogwaffe.com/ajax-comment-preview/" rel="pingback">ajax-comment-preview</a>, which works pretty darn well.
</p>
<p><div>&nbsp;</div></p>
<p>
So anyway, comments should be able to handle code properly now and<br />
there's a comment-preview to show you what the comment will look<br />
like in the end. And there was much rejoicing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/08/filter-juggling-and-comment-preview/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Signs from Hell</title>
		<link>http://www.coranac.com/2009/08/signs-from-hell/</link>
		<comments>http://www.coranac.com/2009/08/signs-from-hell/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 19:42:24 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=100</guid>
		<description><![CDATA[


The integer datatypes in C can be either signed or unsigned. Sometimes,
it&#8217;s obvious which should be used; for negative values you clearly
should use signed types, for example. In many cases there is no obvious
choice &#8211; in that case it usually doesn&#8217;t matter which you use.
Usually, but not always. Sometimes, picking the wrong kind
can introduce subtle [...]]]></description>
			<content:encoded><![CDATA[
<p>
The integer datatypes in C can be either signed or unsigned. Sometimes,<br />
it&#8217;s obvious which should be used; for negative values you clearly<br />
should use signed types, for example. In many cases there is no obvious<br />
choice &ndash; in that case it usually doesn&#8217;t matter which you use.<br />
Usually, but not always. <i>Sometimes</i>, picking the wrong kind<br />
can introduce subtle bugs in your program that, unless you know what<br />
to look out for, can catch you off-guard and have you searching for<br />
the problem for hours.
</p>
<p>
I&#8217;ve mentioned a few of these occasions in Tonc<br />
<a href="/tonc/text/affine.htm#ssec-fin-type">here</a> and<br />
<a href="/tonc/text/numbers.htm#ssec-bits-int">there</a>, but I think<br />
it&#8217;s worth going over them again in a little more detail. First, I&#8217;ll<br />
explain how signed integers work, what the difference between signed<br />
and unsigned is, and where potential problems can come from. Then I&#8217;ll<br />
discuss some common pitfalls so you know what to expect.
</p>
<p><ul>
  <li> <a href="#sec-basics">1
Basics
</a> </li>
  <li> <a href="#sec-prob">2
Potential problems
</a> </li>
  <li> <a href="#sec-summary">3
Summary
</a> </li>
</ul>
</p>
<p><h2 id="sec-basics">1
Basics
</h2>
</p>
<p>
The <dfn>signedness</dfn> of a variable refers to whether it can be<br />
used to represent negative values or not. Unsigned variables can only<br />
have non-negative values; signed variables can be either positive or negative.
</p>
<p>
In the computer world, signedness is mostly a matter of interpretation.<br />
Say you have a variable that is <i>N</i> bits long. This is enough<br />
room for 2<sup>N</sup> distinct numbers, but it says nothing about<br />
which range of numbers you should be using them for. Interpreted as<br />
unsigned integers, its range would be [0,&nbsp;2<sup>N</sup>&minus;1].<br />
Under a signed interpretation, you&#8217;d use some bit-patterns for negative<br />
numbers. There are actually several ways of doing this, but the most<br />
commonly used is known as 
<a href="http://en.wikipedia.org/wiki/Two%27s_complement">two&#8217;s complement</a>, which leads to<br />
a [&minus;2<sup>N&minus;1</sup>,&nbsp;2<sup>N&minus;1</sup>&minus;1] range:<br />
half positive and half negative.
</p>
<p><h3 id="ssec-base-twos">1.1
Two&#8217;s complement theory
</h3>
</p>
<p>
Two&#8217;s complement is sometimes seen as an awkward system, but it<br />
actually follows quite naturally when you only have a fixed number<br />
of digits to write down numbers with.<br />
Consider the whole line of positive and negative integers. As you<br />
move away from zero, the numbers will grow larger and larger.<br />
Now suppose you have a<br />
<a href="http://en.wikipedia.org/wiki/Counter#Mechanical_counters">counting device</a><br />
composed of a limited number of digits, each of which can only display<br />
numbers 0 through 10&minus;1. With <i>N</i> digits, you only have<br />
room for 10<sup>N</sup> different numbers, and once those are used up<br />
(at 10<sup>N</sup>&minus;1), the counter returns to 0 and counting<br />
effectively resets. In essence, the number on the counter works in<br />
modulo 10<sup>N</sup>.
</p>
<p>
The key is that this works in both positive and negative directions.<br />
As far as the counter is concerned, 0 and 10<sup>N</sup> are the<br />
same thing. This being the case, you can argue that &minus;1<br />
(that is, the number before zero) is equivalent to 10<sup>N</sup>&minus;1;<br />
and &minus;2&nbsp;&equiv;&nbsp;10<sup>N</sup>&minus;2, and so on.<br />
Note that this works regardless of what 10 actually is; it can be<br />
ten (decimal), two (binary) or sixteen (hexadecimal).
</p>
<p>
The 10<sup>N</sup> possible numbers form a window over the number line,<br />
but where the window starts is up to the user. For signed numbers,<br />
you can move the window so that the upper half of the 10<sup>N</sup><br />
range is interpreted as negative numbers.
</p>
<p><div>&nbsp;</div></p>
<p>
Fig&nbsp;1 shows how this works for 8-bit numbers<br />
(written in hex for convenience). The black numbers represent the<br />
entire number line, where numbers can have as many digits as you<br />
need. With only two 
<a href="http://en.wikipedia.org/wiki/nybble">nybble</a>s, the counter repeats every<br />
100h&nbsp;=&nbsp;256 values. FFh, 1FFh, but also &minus;1 all reduce to the same<br />
symbol, namely FFh. In Fig&nbsp;2 you can see<br />
how the available symbols are mapped to either signed or unsigned<br />
values. In the unsigned case, numbers simply count from 0 to FFh;<br />
for signed, the top half of the symbol range is put on the left side<br />
of zero and is used for negative numbers.
</p>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-numline"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;1. </b>
</div>
</p>
<p><div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-signedness"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;2. </b>
</div>

</p></div>
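<p>
The window idea can be sketched in a few lines of C. This is only an illustration: the helper names <code>to_symbol8</code>, <code>as_signed8</code> and <code>demo_window</code> are made up here and are not part of Tonc. The first function reduces any number to its two-nybble symbol, the second reads that symbol through the signed window.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Reduce any integer to its two-nybble symbol, i.e. take it modulo 100h.
 * Conversion to an unsigned type is defined as modulo 2^N in C. */
uint8_t to_symbol8(long value)
{
    return (uint8_t)value;
}

/* Read a symbol through the signed window: 80h..FFh sit left of zero. */
int as_signed8(uint8_t symbol)
{
    return symbol < 0x80 ? (int)symbol : (int)symbol - 0x100;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_window(void)
{
    assert(to_symbol8(0xFF)  == 0xFF);   /* FFh, 1FFh and -1 ...      */
    assert(to_symbol8(0x1FF) == 0xFF);   /* ... all reduce to FFh     */
    assert(to_symbol8(-1)    == 0xFF);

    assert(as_signed8(0xFF) == -1);      /* top half: negative        */
    assert(as_signed8(0x80) == -128);
    assert(as_signed8(0x7F) == 127);     /* bottom half: unchanged    */
}
```

<p>
The unsigned window needs no helper at all: the symbol itself is the value.
</p>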
<p><div>&nbsp;</div></p>
<p>
The mathematical reasoning behind all this goes like this. Assume for<br />
convenience that <i>N</i>&nbsp;=&nbsp;1, so that 0 is equivalent to 10 and<br />
in fact every multiple of 10. By definition, subtracting a value<br />
from itself gives 0. Because subtraction is merely addition by its<br />
negative value, you get the following:
</p>
<p><table class="eqtbl" id="eq-complement-def">
<tr>
<td class="eqnrcell">(1)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20x%20%26-%26%20x%20%26%3D%26%200%26%20%5C%5C%20x%20%26%2B%26%20%28-x%29%20%26%3D%26%200%20%26%20%5C%5C%20x%20%26%2B%26%20%28-x%29%20%26%5Cequiv%26%2010%20%26%20%5C%5C%20%26%20%26%20%28-x%29%20%26%3D%26%2010%20%26-%20x%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} x &amp;-&amp; x &amp;=&amp; 0&amp; \\ x &amp;+&amp; (-x) &amp;=&amp; 0 &amp; \\ x &amp;+&amp; (-x) &amp;\equiv&amp; 10 &amp; \\ &amp; &amp; (-x) &amp;=&amp; 10 &amp;- x \end{eqnarray}"<br />
	alt="\begin{eqnarray} x &amp;-&amp; x &amp;=&amp; 0&amp; \\ x &amp;+&amp; (-x) &amp;=&amp; 0 &amp; \\ x &amp;+&amp; (-x) &amp;\equiv&amp; 10 &amp; \\ &amp; &amp; (-x) &amp;=&amp; 10 &amp;- x \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
The term &minus;<i>x</i> in the last step should be seen as a unit,<br />
call it <i>C</i>. Numerically, <i>C</i> is the number that, when added<br />
to <i>x</i>, gives 10. In decimal, if <i>x</i>&nbsp;=&nbsp;1, then <i>C</i>&nbsp;=&nbsp;9.<br />
<i>C</i> is called the 10&#8217;s <dfn>complement</dfn> of <i>x</i>, because<br />
it&#8217;s what&#8217;s needed to complete the 10. It&#8217;s called the two&#8217;s<br />
complement in binary, because then 10 equals two.
</p>
<p>
In binary, there is an alternative way to calculate the two&#8217;s complement<br />
of a number. Subtracting a number from 2<sup>N</sup>&minus;1 (all ones) is<br />
equivalent to inverting all its bits, so you get:
</p>
<p><table class="eqtbl" id="eq-complement-bin">
<tr>
<td class="eqnrcell">(2)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20%28-x%29%20%26%3D%26%202%5EN%20-%20x%20%5C%5C%20%26%3D%26%202%5EN%20-1%20-%20x%20%2B%201%20%5C%5C%20%28-x%29%20%26%3D%26%20%5Csim%20x%20%2B%201%20%5C%5C%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} (-x) &amp;=&amp; 2^N - x \\ &amp;=&amp; 2^N -1 - x + 1 \\ (-x) &amp;=&amp; \sim x + 1 \\ \end{eqnarray}"<br />
	alt="\begin{eqnarray} (-x) &amp;=&amp; 2^N - x \\ &amp;=&amp; 2^N -1 - x + 1 \\ (-x) &amp;=&amp; \sim x + 1 \\ \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
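<p>
A minimal sketch of the invert-and-add-one rule for 8-bit values follows. The names <code>negate8</code> and <code>demo_negate</code> are hypothetical; the asserts simply check the rule against the 2<sup>N</sup>&minus;<i>x</i> form it was derived from.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation of an 8-bit value: invert all bits, add one. */
uint8_t negate8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_negate(void)
{
    assert(negate8(1) == 0xFF);                   /* -1 is all ones        */
    assert(negate8(1) == (uint8_t)(0x100 - 1));   /* same as 2^N - x       */
    assert((uint8_t)(1 + negate8(1)) == 0);       /* x + (-x) wraps to 0   */
    assert(negate8(0) == 0);                      /* zero is its own negative */
    assert(negate8(0x80) == 0x80);                /* and so is -128        */
}
```

<p>
The last assert shows the one oddity of the scheme: the most negative number has no positive counterpart in the same number of bits.
</p>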
<p>
Using two&#8217;s complement<span class="fnote"><a href="#ft-nr1" title="Or any 10&#8217;s complement, really.">(1)</a></span><br />
for negative numbers has some interesting<br />
properties. First, subtraction and addition are basically the same<br />
thing. This is nice for arithmetic implementers for two reasons:<br />
the same hardware can be used for both operations, and it can be used<br />
for both positive and negative numbers.
</p>
<p>
Second, because the top half<br />
is now used for negative numbers, the most significant bit can be<br />
seen as a sign bit. Note: <i>a</i> sign bit, not <i>the</i> sign bit.<br />
There is a subtle linguistic difference here. When talking about<br />
<i>the</i><br />
sign bit, one may think of it as a single bit that indicates the sign.<br />
For example, 8-bit +1 and &minus;1 could be &lsquo;<code><b>0</b>000&nbsp;0001</code>&rsquo;<br />
and &lsquo;<code><b>1</b>000&nbsp;0001</code>&rsquo;, respectively. In two&#8217;s complement,<br />
however, +1 and &minus;1 are actually &lsquo;<code>0000&nbsp;0001</code>&rsquo;<br />
and &lsquo;<code>1111&nbsp;1111</code>&rsquo; (the sum of which is<br />
&lsquo;<code>1,0000&nbsp;0000</code>&rsquo;&nbsp;&equiv;&nbsp;<code>0</code>, as<br />
it should be).
</p>
<p><h3 id="ssec-base-decl">1.2
Declaring signed or unsigned
</h3>
</p>
<p>
In the end, whether a particular group of bits is signed or unsigned is<br />
a matter of interpretation. For example, the 8-bit group<br />
&lsquo;<code>1111&nbsp;1111</code>&rsquo; can be either 255 or &minus;1, depending on<br />
how you <i>want</i> to look at it. You can&#8217;t determine the signedness<br />
from just the bits themselves.
</p>
<p>
Also, when you&#8217;ve decided you&#8217;re going to use a signed interpretation,<br />
whether the group forms a negative number or not depends on the size of<br />
the group. For example, consider the two bytes &lsquo;<code>01 FF</code>&rsquo;.<br />
As separate bytes, these would form +1 and &minus;1, respectively.<br />
However, if you view them as a single 16-bit integer<br />
(&lsquo;short&rsquo;), it forms 0x01FF, which is a positive number.
</p>
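<p>
To make the dependence on group size concrete, here is a small sketch. The function <code>as_signed</code> is a hypothetical helper (not a standard one) that interprets the low <i>width</i> bits of a value as a two&#8217;s complement number; it is only meant for widths of 1 to 16 bits.
</p>

```c
#include <assert.h>

/* Interpret the low `width` bits of `bits` as a two's complement number.
 * Sketch only: assumes 1 <= width <= 16. */
long as_signed(unsigned long bits, unsigned width)
{
    unsigned long sign = 1ul << (width - 1);   /* weight of the top bit      */
    bits &= (sign << 1) - 1;                   /* drop anything above the group */
    return (bits & sign) ? (long)(bits - sign) - (long)sign : (long)bits;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_group_size(void)
{
    assert(as_signed(0x01, 8) == 1);        /* byte 01: +1                */
    assert(as_signed(0xFF, 8) == -1);       /* byte FF: -1                */
    assert(as_signed(0x01FF, 16) == 511);   /* both as one short: positive */
    assert(as_signed(0xFFFF, 16) == -1);
}
```

<p>
The same <code>FF</code> bits are negative in an 8-bit group and merely the low half of a positive number in a 16-bit one.
</p>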
<p><div>&nbsp;</div></p>
<p>
In C, you specify signedness when you declare a variable. The general<br />
rule is that an integer is signed unless the keyword<br />
&lsquo;<code>unsigned</code>&rsquo; is used. The exception to the rule is<br />
&lsquo;<code>char</code>&rsquo;, whose default signedness is platform and<br />
compiler-dependent! Be careful with this particular datatype.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">int</span> ia; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Signed integer.</span><br />
<span class="kw1">unsigned</span> <span class="kw1">int</span> ib;&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unsigned integer.</span></p>
<p><span class="kw1">short</span> sa; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Signed 16-bit integer.</span><br />
<span class="kw1">unsigned</span> <span class="kw1">short</span> sb;&nbsp; &nbsp; &nbsp; <span class="co1">// Unsigned 16-bit integer.</span></p>
<p><span class="kw1">char</span> ca;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// ??-signed 8-bit integer.</span><br />
<span class="kw1">signed</span> <span class="kw1">char</span> cb; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// signed 8-bit integer.</span><br />
<span class="kw1">unsigned</span> <span class="kw1">char</span> cc; &nbsp; &nbsp; &nbsp; <span class="co1">// unsigned 8-bit integer.</span></div>
</div>
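<p>
If you need to know which way plain <code>char</code> goes on your compiler, the standard header <code>&lt;limits.h&gt;</code> can tell you. The sketch below uses the standard macro <code>CHAR_MIN</code>; the function names <code>plain_char_is_signed</code> and <code>demo_char</code> are made up for this example.
</p>

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain `char` is signed on this compiler/platform, else 0.
 * CHAR_MIN is 0 for an unsigned char type and negative for a signed one. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical self-check: -1 stored in a plain char stays negative
 * only when plain char is signed. */
void demo_char(void)
{
    char c = (char)-1;
    assert(plain_char_is_signed() == (c < 0));
}
```

<p>
When the signedness matters, it&#8217;s safer not to rely on this at all and to write <code>signed char</code> or <code>unsigned char</code> explicitly.
</p>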
<p>
Because they&#8217;re shorter and more descriptive, the following typedefs<br />
are often used for variable declarations. Basically, it&#8217;s<br />
&lsquo;<code>s</code>&rsquo; or &lsquo;<code>u</code>&rsquo; for signed<br />
or unsigned, respectively, followed by the size of the type in bits.<br />
Unsigned variants are also sometimes indicated by<br />
&lsquo;u&rsquo;+<i>typename</i>.
</p>
<div class=lblock>
<table id="tbl-data-typedefs" border=1 cellpadding=2 cellspacing=0 width=200>
<caption align=bottom>
  <b>Table&nbsp;1</b>: common short (un)signed typedefs.<br />
</caption>
<tr>
<th>Base type</th>
<th>Signed</th>
<th colspan=2>Unsigned</th>
</tr>
<tr class=rnum>
<th>char</th>
<td>s8</td>
<td>u8</td>
<td>uchar</td>
</tr>
<tr class=rnum>
<th>short</th>
<td>s16</td>
<td>u16</td>
<td>ushort</td>
</tr>
<tr class=rnum>
<th>int/long</th>
<td>s32</td>
<td>u32</td>
<td>uint</td>
</tr>
<tr class=rnum>
<th>long long</th>
<td>s64</td>
<td>u64</td>
<td>&nbsp;</td>
</tr>
</table>
</div>
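<p>
One possible way to define the sized typedefs from Table&nbsp;1 is on top of the fixed-width types from <code>&lt;stdint.h&gt;</code>. This is an assumption made for the sketch; headers for a specific platform may well define them directly from <code>char</code>, <code>short</code> and <code>int</code> instead. <code>demo_typedefs</code> is a hypothetical self-check.
</p>

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Sized typedefs: 's' or 'u' for signedness, then the size in bits. */
typedef int8_t   s8;    typedef uint8_t  u8;
typedef int16_t  s16;   typedef uint16_t u16;
typedef int32_t  s32;   typedef uint32_t u32;
typedef int64_t  s64;   typedef uint64_t u64;

/* Hypothetical self-check; the asserts document the expected values. */
void demo_typedefs(void)
{
    assert(sizeof(s8)  * CHAR_BIT == 8  && sizeof(u8)  * CHAR_BIT == 8);
    assert(sizeof(s16) * CHAR_BIT == 16 && sizeof(u16) * CHAR_BIT == 16);
    assert(sizeof(s32) * CHAR_BIT == 32 && sizeof(u32) * CHAR_BIT == 32);
    assert(sizeof(s64) * CHAR_BIT == 64 && sizeof(u64) * CHAR_BIT == 64);

    assert((s8)-1 < 0);       /* signed: -1 stays -1                   */
    assert((u8)-1 == 255);    /* unsigned: -1 wraps to the top value   */
}
```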
<p><div>&nbsp;</div></p>
<p>
In assembly, you can&#8217;t declare the signedness of variables, because<br />
there&#8217;s no such thing as variables. There&#8217;s only labels and how you<br />
use those labels determines what the related data are. Technically,<br />
there is only one datatype: the 32-bit word, corresponding to C&#8217;s<br />
int or long. The other datatypes are essentially emulated, or<br />
defined by which memory instructions you use:<br />
<code>LDRB/LDRSB/STRB</code> for bytes and<br />
<code>LDRH/LDRSH/STRH</code> for halfwords. For most data<br />
operations, signedness is irrelevant and as such mostly ignored.<br />
Only in a few cases does the sign actually matter and as these are<br />
essentially the topic of the rest of the article, we&#8217;ll get<br />
to those eventually.
</p>
<p><h2 id="sec-prob">2
Potential problems
</h2>
</p>
<p>
The following sections are cases where signedness may become<br />
problematic. I say &ldquo;may&rdquo;, because often it just works<br />
out. But that&#8217;s just the thing: it can work most of the time and then<br />
things can go horribly wrong all of a sudden. The root of the problem<br />
comes down to one thing: negative numbers; usually, negative numbers<br />
becoming large positive numbers when interpreted as unsigned values.
</p>
<p>
For example, 32-bit signed &minus;1 = 0xFFFFFFFF = unsigned 4294967295<br />
(= 2<sup>32</sup>&minus;1). If nothing else, remember that part.
</p>
<p><h3 id="ssec-prob-extend">2.1
Sign extension, casting and shifting
</h3>
</p>
<p>
When you go from a small datatype to a larger one, you&#8217;re essentially<br />
adding a new set of bits at the top, and these bits have to be<br />
initialized in a meaningful way. The addition of these bits should have<br />
no effect on the value itself. For example, +1 should remain +1 and<br />
&minus;1 should remain &minus;1. What this boils down to for two&#8217;s<br />
complement is that the new bits need to be filled with the sign-bit<br />
of the old value. This is called <dfn>sign extension</dfn>, because<br />
the top-bit (the sign-bit) is extended into all the higher bits. There<br />
is also <dfn>zero-extension</dfn>, which is when the higher bits are<br />
zeroed out. These two forms effectively correspond to signed<br />
and unsigned casting<span class="fnote"><a href="#ft-nr2" title="One could say that zero-extension is just a
form of sign-extension; it&#8217;s just that the sign for an unsigned number
is always positive.">(2)</a></span>.
</p>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-extend"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;3. </b>
</div>

</div>
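<p>
Written out by hand, the two kinds of extension from 8 to 32 bits look roughly like this. The compiler normally does this for you; the functions <code>zero_extend8</code> and <code>sign_extend8</code> are hypothetical names used only to spell out what happens to the bits.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: the new top bits are simply cleared. */
uint32_t zero_extend8(uint8_t b)
{
    return b;
}

/* Sign-extension: bit 7 (the sign bit) is copied into bits 8..31. */
uint32_t sign_extend8(uint8_t b)
{
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_extend(void)
{
    assert(zero_extend8(0xFF) == 0x000000FFu);   /* 255 stays 255        */
    assert(sign_extend8(0xFF) == 0xFFFFFFFFu);   /* -1 stays -1          */
    assert(sign_extend8(0x80) == 0xFFFFFF80u);   /* -128 stays -128      */
    assert(sign_extend8(0x01) == 0x00000001u);   /* positives: identical */
}
```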
<p>
Conversions of this kind actually happen <b>all the time</b>,<br />
without any kind of direct intervention from the programmer. Data<br />
operations are always done in CPU words and any time you use a smaller<br />
datatype, there is the need to sign- or zero-extend.<br />
This also brings forth the question of which type of extension will be<br />
used: sign- or zero-extension. As the following bit of code shows, it<br />
depends on the signedness of the variable you&#8217;re converting <i>from</i>.<br />
8-bit variables <code>sc</code> and <code>uc</code> are both initialized<br />
by 0xFF, which is either &minus;1 or 255 (you can use either of those too,<br />
by the way). After that, these are used to initialize signed or unsigned<br />
words.
</p>
<p>
As you can see from the output, the values in the words correspond<br />
to the signedness of the bytes, not the words. Also note that printing<br />
<code>sc</code> (the signed byte) gives 0xFFFFFFFF and not the 0xFF you<br />
initialized it with, even though 0xFF is in fact its actual content, since<br />
0xFFFFFFFF is too large to fit into a byte. However, whenever you use it in<br />
an expression, it&#8217;s automatically extended to word-size. This becomes great<br />
fun when you later compare it to 0xFF again.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">// Testing implicit conversions.</span><br />
<span class="kw1">void</span> test_conversion()<br />
{<br />
&nbsp; &nbsp; s8 sc= <span class="nu0">0xFF</span>;&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// 8-bit -1 (and 255) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </span><br />
&nbsp; &nbsp; u8 uc= <span class="nu0">0xFF</span>;&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// 8-bit 255 (and -1)</span></p>
<p>&nbsp; &nbsp; s32 sisc= sc, siuc= uc;<br />
&nbsp; &nbsp; u32 uisc= sc, uiuc= uc;</p>
<p>&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot; &nbsp;sc: %4d=%08X ; &nbsp; uc:%4d=%08X<span class="es1">\n</span>&quot;</span>, sc, sc, uc, uc);<br />
&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot;sisc: %4d=%08X ; siuc:%4d=%08X<span class="es1">\n</span>&quot;</span>, sisc, sisc, siuc, siuc);<br />
&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot;uisc: %4d=%08X ; uiuc:%4d=%08X<span class="es1">\n</span>&quot;</span>, uisc, uisc, uiuc, uiuc);<br />
&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot;sc==0xFF : %s<span class="es1">\n</span>&quot;</span>, (sc==<span class="nu0">0xFF</span> ? <span class="st0">&quot;true&quot;</span> : <span class="st0">&quot;false&quot;</span>) );</p>
<p>&nbsp; &nbsp; <span class="coMULTI">/* Output:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sc: &nbsp; -1=FFFFFFFF ; &nbsp; uc: 255=000000FF<br />
&nbsp; &nbsp; &nbsp; &nbsp; sisc: &nbsp; -1=FFFFFFFF ; siuc: 255=000000FF<br />
&nbsp; &nbsp; &nbsp; &nbsp; uisc: &nbsp; -1=FFFFFFFF ; uiuc: 255=000000FF<br />
&nbsp; &nbsp; &nbsp; &nbsp; sc==0xFF : false</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; Warnings issued (for sc=0xFF):<br />
&nbsp; &nbsp; &nbsp; &nbsp; - warning C4305: 'initializing' : truncation from 'const int' to 'signed char'<br />
&nbsp; &nbsp; &nbsp; &nbsp; - warning C4309: 'initializing' : truncation of constant value<br />
&nbsp; &nbsp; */</span><br />
}</div>
</div>
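<p>
One way around the failing <code>sc==0xFF</code> test is to compare the 8-bit contents rather than the extended values. The sketch below is just one option; <code>naive_equal</code> and <code>bits_equal</code> are hypothetical helpers, and the pattern is passed as a parameter so both comparisons can be shown side by side.
</p>

```c
#include <assert.h>

/* Naive comparison: the signed byte is sign-extended to int first, so
 * a byte holding -1 is compared as -1, which never equals 0xFF (255). */
int naive_equal(signed char byte, int pattern)
{
    return byte == pattern;
}

/* Compare the 8-bit contents instead, by zero-extending the byte. */
int bits_equal(signed char byte, int pattern)
{
    return (unsigned char)byte == pattern;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_compare(void)
{
    signed char sc = -1;               /* same bits as 0xFF        */

    assert(!naive_equal(sc, 0xFF));    /* -1 != 255                */
    assert(naive_equal(sc, -1));       /* works when compared as -1 */
    assert(bits_equal(sc, 0xFF));      /* 255 == 255               */
}
```

<p>
Either compare against the signed value (&minus;1) or cast to the unsigned type first; mixing the two is what causes the surprise.
</p>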
<p><div>&nbsp;</div></p>
<p>
Sign- and zero-extension also play a role in right-shifts. When using<br />
shifts for arithmetic (shift-right is short-hand for a division by a<br />
power of two), you want the sign preserved. For example, when dividing<br />
&minus;16&nbsp;=&nbsp;0xFFFF:FFF0 by 16 (shift-right by 4), you want the result<br />
to be &minus;1 (=0xFFFF:FFFF), and not 268435455 (=0x0FFF:FFFF).<br />
The right-shift that preserves the sign is the<br />
<dfn>arithmetic right-shift</dfn>, and is used for signed numbers.<br />
For unsigned numbers, or if the variable is considered a set of bits<br />
instead of a single number, a <dfn>logical right-shift</dfn> is<br />
appropriate, since that uses zero-extension.
</p>
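<p>
The two shifts can be written out explicitly as below. Strictly speaking, the C standard leaves <code>&gt;&gt;</code> on a negative signed value implementation-defined (most compilers do an arithmetic shift), so this sketch builds the arithmetic version only from shifts of non-negative values. <code>lsr32</code> and <code>asr32</code> are hypothetical names borrowed from the ARM mnemonics, and the code assumes a two&#8217;s complement machine and a shift count of 0 to 31.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Logical right-shift: zeros come in at the top. Assumes 0 <= n <= 31. */
uint32_t lsr32(uint32_t x, unsigned n)
{
    return x >> n;
}

/* Arithmetic right-shift: copies of the sign bit come in at the top.
 * For negative x, ~x is non-negative, so shifting it is well-defined;
 * inverting again turns the shifted-in zeros into ones. */
int32_t asr32(int32_t x, unsigned n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_shifts(void)
{
    assert(asr32(-16, 4) == -1);                        /* sign preserved */
    assert(lsr32((uint32_t)-16, 4) == 0x0FFFFFFFu);     /* 268435455      */
    assert(asr32(16, 4) == 1);                          /* positives agree */
    assert(lsr32(16, 4) == 1);
}
```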
<p>
In assembly, arithmetic and logical right-shift are called<br />
<code>ASR</code> and <code>LSR</code>, respectively. In Java and other<br />
languages where the keyword <code>unsigned</code> does not exist<br />
the difference is indicated by <code>&gt;&gt;</code> (sign-extend)<br />
and <code>&gt;&gt;&gt;</code> (zero-extend). In C, however,<br />
both types use the same symbol: <code>&gt;&gt;</code>. As such,<br />
you cannot tell which type of extension is used from just the<br />
expression; you&#8217;d have to look at the signedness of the operands<br />
(including temporaries) to see if it&#8217;s a logical or arithmetic<br />
right-shift.
</p>
<div class=cblock>
<table border=0>
<tr>
<td>
<table id="tbl-shift" border=1 cellpadding=2 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;2</b>: Right-shifts for different languages.<br />
</caption>
<tr>
<th>Language</th>
<th>Signed</th>
<th>Unsigned</th>
</tr>
<tr>
<th>ARM asm</th>
<td>asr</td>
<td>lsr</td>
</tr>
<tr>
<th>C</th>
<td>&gt;&gt;</td>
<td>&gt;&gt;</td>
</tr>
<tr>
<th>Java(script)</th>
<td>&gt;&gt;</td>
<td>&gt;&gt;&gt;</td>
</tr>
</table>
</td>
<td width=32></td>
<td>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-rshift"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;4. </b>
</div>

</td>
</tr>
</table>
</div>
<p>
This ambiguity of shift symbols in C can be a major source of pain in<br />
fixed-point calculations. Since unsigned takes precedence over signed when<br />
the two are mixed, if you have an unsigned variable at <i>any</i> point of the calculation,<br />
all subsequent calculations are unsigned too and you can kiss negative<br />
numbers goodbye. If everything starts going wrong as soon as you move in<br />
another direction or if rotations aren&#8217;t calculated properly, this will<br />
be the cause.
</p>
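<p>
A small sketch of that rule in isolation: as soon as an <code>unsigned</code> operand meets an <code>int</code>, the <code>int</code> is converted and the result is unsigned. The function <code>mixed_product</code> is a made-up example, and the <code>_Generic</code> check assumes a C11 compiler.
</p>

```c
#include <assert.h>
#include <limits.h>

/* The int operand is converted to unsigned before the multiplication,
 * so the product (and anything computed from it) is unsigned as well. */
unsigned mixed_product(unsigned speed, int ux)
{
    return speed * ux;
}

/* Hypothetical self-check; the asserts document the expected values. */
void demo_mixed(void)
{
    /* The type of an unsigned-times-int expression is unsigned (C11 check). */
    assert(_Generic(1u * -1, unsigned: 1, default: 0));

    /* 1 * -1 is not -1 here, but the largest unsigned value ... */
    assert(mixed_product(1u, -1) == UINT_MAX);

    /* ... and shifting it right brings in zeros, not sign bits. */
    assert((mixed_product(1u, -1) >> 8) == (UINT_MAX >> 8));
}
```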
<p><div>&nbsp;</div></p>
<p>
The code below illustrates the problem in a very common situation. You<br />
have a position <b>p</b>, and a directional vector for movement,<br />
<b>u</b>. Since you want sub-pixel control of these, you use<br />
fixed-point notation for both (I&#8217;m assuming a non-FPU system<br />
here). The <b>u</b> vector is a unit vector<br />
(say, cos(&alpha;),&nbsp;sin(&alpha;)); to get to the full velocity vector,<br />
we have to multiply <b>u</b> by some speed. The procedure comes down to<br />
something like this:
</p>
<p><table class="eqtbl">
<tr>
<td class="eqnrcell"></td>
  <td class="eqcell"><br />
<b>p</b><sub>new</sub>&nbsp;=&nbsp;<b>p</b><sub>old</sub>&nbsp;+&nbsp;<i>speed</i>&middot;<b>u</b><br />
</td>
</tr>
</table></p>
<p>
In the example, I&#8217;m only considering the <i>x</i>-component for<br />
convenience. Now, because position and direction can have negative<br />
components, those would be signed. The speed, however, is a length<br />
and therefore always positive, so it makes sense to make it unsigned,<br />
right? Well, yes and no. As you can see from the result, mostly no.
</p>
<p>
With <i>speed</i>&nbsp;=&nbsp;+1 and <i>u</i><sub>x</sub>&nbsp;=&nbsp;&minus;1, the<br />
end result should be +1*&minus;1&nbsp;=&nbsp;&minus;1, which would be 0xFFFFFF00<br />
in Q8 fixed-point notation. However, it <i>isn&#8217;t</i>, thanks to the<br />
unsignedness of <code>speed</code>, which makes subsequent arithmetic<br />
unsigned so the right-shift does not sign-extend. So instead of the<br />
small step you intended, you get a giant leap into no man&#8217;s land.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">void</span> test_right_shift()<br />
{<br />
&nbsp; &nbsp; <span class="co1">// Assume movement for 2 directions, with Q8 for everything.</span><br />
&nbsp; &nbsp; <span class="co1">// a = look direction. &nbsp;</span><br />
&nbsp; &nbsp; <span class="co1">// p = (px, py) = position.</span><br />
&nbsp; &nbsp; <span class="co1">// u = (ux, uy) = ( cos(a), sin(a) )</span></p>
<p>&nbsp; &nbsp; <span class="kw1">int</span> &nbsp;px= <span class="nu0">0</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Starting position.</span><br />
&nbsp; &nbsp; <span class="kw1">int</span> &nbsp;ux= -<span class="nu0">1</span>&lt;&lt;<span class="nu0">8</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Moving backwards.</span><br />
&nbsp; &nbsp; uint speed= <span class="nu0">1</span>&lt;&lt;<span class="nu0">8</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unsigned as speed&#8217;s always &gt;= 0, right?</span></p>
<p>&nbsp; &nbsp; px = px + (speed*ux&gt;&gt;<span class="nu0">8</span>);&nbsp; &nbsp; <span class="co1">// Fixed point motion. Result should be -1&lt;&lt;8.</span></p>
<p>&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot;px : %d=%08X<span class="es1">\n</span>&quot;</span>, px, px);</p>
<p>&nbsp; &nbsp; <span class="coMULTI">/* Result: <br />
&nbsp; &nbsp; &nbsp; &nbsp; px : 16776960=00FFFF00</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; In other words: NOT the -1&lt;&lt;8 = 0xFFFFFF00 you were after.<br />
&nbsp; &nbsp; */</span><br />
}</div>
</div>
<p>
This mistake is depressingly easy to make, even for those who generally<br />
think about which datatype to use. <i>Especially</i> those people, as<br />
they&#8217;re prone to optimize prematurely and automatically pick unsigned<br />
for a variable that will never be negative. The danger is that one<br />
unsigned operand makes the whole expression unsigned, which can screw<br />
up later right-shifts.
</p>
<p>
Bottom line: variables used in fixed-point calculations should be<br />
signed. Always.
</p>
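<p>
As a minimal sketch of the difference, here is the same Q8 step written twice.
The function names are hypothetical, and I&#8217;m assuming 32-bit ints and a
compiler that sign-extends on a signed right-shift (implementation-defined in
C, but what GCC does).
</p>

```c
#include <stdint.h>

/* Q8 fixed-point step with every operand signed.
   Assumes an arithmetic (sign-extending) right shift for negative values. */
int32_t step_q8_signed(int32_t px, int32_t ux, int32_t speed)
{
    return px + ((speed * ux) >> 8);
}

/* Same step with an unsigned speed. The cast on ux spells out the
   conversion the compiler would otherwise perform silently: the product
   is unsigned, so the shift is logical and the sign bits are lost. */
int32_t step_q8_unsigned(int32_t px, int32_t ux, uint32_t speed)
{
    return px + (int32_t)((speed * (uint32_t)ux) >> 8);
}
```

<p>
With <code>px</code>&nbsp;=&nbsp;0, <code>ux</code>&nbsp;=&nbsp;&minus;256 and
<code>speed</code>&nbsp;=&nbsp;256, the signed version returns &minus;256
(0xFFFFFF00) and the unsigned version returns 16776960 (0x00FFFF00).
</p>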
<p><h3 id="ssec-prob-div">2.2
Division
</h3>
</p>
<p>
This isn&#8217;t really a signed-vs-unsigned item per se, but integer<br />
division behaves in a peculiar way for negative numbers. It becomes<br />
one, however, when you throw right-shift into the fray, which doesn&#8217;t<br />
quite work like a division equivalent anymore for negative numbers.<br />
To discriminate between integer and normal division, I will use<br />
&lsquo;\ &rsquo; for integer division in this section. Note also the<br />
modulo operation is intimately tied to division, so this section applies<br />
to that as well.
</p>
<p>
What integer division comes down to is taking a normal division and throwing<br />
away the remaining fraction. For example, 7&nbsp;/&nbsp;4&nbsp;=&nbsp;1&frac34;. The<br />
integer division is just 1. This is also true for negative numbers:<br />
&minus;7&nbsp;/&nbsp;4&nbsp;=&nbsp;&minus;1&frac34;, so &minus;7&nbsp;\&nbsp;4&nbsp;=&nbsp;&minus;1. In short,<br />
integer division rounds towards zero. With bit-shifting, however, you get<br />
something slightly different. Theoretically, <i>x</i>&gt;&gt;<i>n</i> is<br />
equivalent to <i>x</i>&nbsp;\&nbsp;2<sup>n</sup>. For positive numbers, this is<br />
true: 7&gt;&gt;2 in binary is<br />
<code><b>000001</b>11</code>&gt;&gt;2&nbsp;=&nbsp;<code>00<b>000001</b></code>.<br />
But with &minus;7&gt;&gt;2 you get<br />
<code>11111001</code>&gt;&gt;2&nbsp;=&nbsp;<code>11<b>111110</b></code>&nbsp;=&nbsp;&minus;2.<br />
Division-by-right-shift always rounds towards negative infinity.
</p>
<p>
The upshot of this difference is that for negative numbers, the results<br />
of <i>x</i>&nbsp;\&nbsp;2<sup>n</sup> and <i>x</i>&gt;&gt;<i>n</i> will be out<br />
of sync, as Table&nbsp;3 illustrates. They still give<br />
identical results for positive numbers though.
</p>
<div class=cblock>
<table id="tbl-div-shift" border=1 cellpadding=2 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;3</b>: integer and by-shift division by four.<br />
</caption>
<tbody align="right">
<tr>
<th>x (dec)</th>
<th>x \ 4</th>
<th>x&gt;&gt;2 (dec)</th>
<th rowspan=20>&nbsp;</th>
<th>x (bin)</th>
<th>x&gt;&gt;2 (bin)</th>
</tr>
<tr>
<td>-9</td>
<td class=bg0>-2</td>
<td class=bg1>-3</td>
<td>11110111</td>
<td>11111101</td>
</tr>
<tr>
<td>-8</td>
<td class=bg0>-2</td>
<td class=bg0>-2</td>
<td>11111000</td>
<td>11111110</td>
</tr>
<tr>
<td>-7</td>
<td class=bg1>-1</td>
<td class=bg0>-2</td>
<td>11111001</td>
<td>11111110</td>
</tr>
<tr>
<td>-6</td>
<td class=bg1>-1</td>
<td class=bg0>-2</td>
<td>11111010</td>
<td>11111110</td>
</tr>
<tr>
<td>-5</td>
<td class=bg1>-1</td>
<td class=bg0>-2</td>
<td>11111011</td>
<td>11111110</td>
</tr>
<tr>
<td>-4</td>
<td class=bg1>-1</td>
<td class=bg1>-1</td>
<td>11111100</td>
<td>11111111</td>
</tr>
<tr>
<td>-3</td>
<td class=bg0> 0</td>
<td class=bg1>-1</td>
<td>11111101</td>
<td>11111111</td>
</tr>
<tr>
<td>-2</td>
<td class=bg0> 0</td>
<td class=bg1>-1</td>
<td>11111110</td>
<td>11111111</td>
</tr>
<tr>
<td>-1</td>
<td class=bg0> 0</td>
<td class=bg1>-1</td>
<td>11111111</td>
<td>11111111</td>
</tr>
<tr>
<td> 0</td>
<td class=bg0> 0</td>
<td class=bg0> 0</td>
<td>00000000</td>
<td>00000000</td>
</tr>
<tr>
<td> 1</td>
<td class=bg0> 0</td>
<td class=bg0> 0</td>
<td>00000001</td>
<td>00000000</td>
</tr>
<tr>
<td> 2</td>
<td class=bg0> 0</td>
<td class=bg0> 0</td>
<td>00000010</td>
<td>00000000</td>
</tr>
<tr>
<td> 3</td>
<td class=bg0> 0</td>
<td class=bg0> 0</td>
<td>00000011</td>
<td>00000000</td>
</tr>
<tr>
<td> 4</td>
<td class=bg1> 1</td>
<td class=bg1> 1</td>
<td>00000100</td>
<td>00000001</td>
</tr>
<tr>
<td> 5</td>
<td class=bg1> 1</td>
<td class=bg1> 1</td>
<td>00000101</td>
<td>00000001</td>
</tr>
<tr>
<td> 6</td>
<td class=bg1> 1</td>
<td class=bg1> 1</td>
<td>00000110</td>
<td>00000001</td>
</tr>
<tr>
<td> 7</td>
<td class=bg1> 1</td>
<td class=bg1> 1</td>
<td>00000111</td>
<td>00000001</td>
</tr>
<tr>
<td> 8</td>
<td class=bg0> 2</td>
<td class=bg0> 2</td>
<td>00001000</td>
<td>00000010</td>
</tr>
<tr>
<td> 9</td>
<td class=bg0> 2</td>
<td class=bg0> 2</td>
<td>00001001</td>
<td>00000010</td>
</tr>
</tbody>
</table>
</div>
<p>
There are some other consequences besides the obvious difference in<br />
results. First, there&#8217;s how compilers deal with it. Compilers are very<br />
well aware that a bit-shift is faster than division and one of the<br />
optimizations they perform is replacing divisions by shifts<br />
where appropriate<span class="fnote"><a href="#ft-nr3" title="And please let the compiler do its job in this
regard: the low operator-precedence of shifts makes their use awkward and
error-prone. If you mean division, then use division.">(3)</a></span>. For<br />
unsigned numbers the division will be replaced by a single shift.<br />
However, for signed variables some extra instructions have to be added to<br />
correct the difference in rounding.
</p>
<p>
Second, note that the standard integer division does not give an equal<br />
distribution of results: there are more results in the zero-bin.<br />
Shift-division spreads the results around evenly. In some cases, you<br />
will want to use the shift version for that reason. One clear example<br />
of this would be tiling: using the &lsquo;proper&rsquo; integer<br />
division would give you odd-looking results.
</p>
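<p>
The tiling case can be sketched as follows, assuming 8-pixel tiles and an
arithmetic right-shift; the function names are made up for the example.
</p>

```c
/* Tile column for a pixel x-coordinate, using 8-pixel-wide tiles. */

/* Integer division truncates towards zero (C99 and later). */
int tile_by_div(int x)
{
    return x / 8;
}

/* Right-shift rounds towards negative infinity
   (assumes an arithmetic shift for negative x). */
int tile_by_shift(int x)
{
    return x >> 3;
}
```

<p>
With the division, every <code>x</code> from &minus;7 to +7 lands in tile 0,
so that tile is 15 pixels wide. With the shift, tile 0 covers 0 to 7 and
tile &minus;1 covers &minus;8 to &minus;1, so every tile is 8 pixels wide.
</p>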
<p><div class=note id="nt-div-shift">
<div  class=nh>Negative number division / right-shift equivalents</div>
</p>
<p>
Table&nbsp;3 shows that for negative numbers, integer<br />
division and right-shift don&#8217;t give the same results. If you do want<br />
the same results, the following equations can be used. Given<br />
<i>x</i>&nbsp;&lt;&nbsp;0 and <i>N</i>&nbsp;=&nbsp;2<sup>n</sup>, then
</p>
<p><table class="eqtbl">
<tr>
<td class="eqnrcell"></td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20x%20%5Cbackslash%20N%20%26%3D%26%20%28x%20%2B%20%28N-1%29%29%20%3E%3E%20n%20%5C%5C%20%5C%5C%20x%3E%3En%20%26%3D%26%20%28x%20-%20%28N-1%29%29%20%5Cbackslash%20N%20%5Cend%7Beqnarray%7D'
	title="\begin{eqnarray} x \backslash N &amp;=&amp; (x + (N-1)) &gt;&gt; n \\ \\ x&gt;&gt;n &amp;=&amp; (x - (N-1)) \backslash N \end{eqnarray}"
	alt="\begin{eqnarray} x \backslash N &amp;=&amp; (x + (N-1)) &gt;&gt; n \\ \\ x&gt;&gt;n &amp;=&amp; (x - (N-1)) \backslash N \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>GCC will use the <i>x</i>\<i>N</i> equivalence to produce<br />
signed integer division if possible.</p>
<p></div>
</p>
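<p>
The two equivalences from the note can be written out as below. This is only
a sketch with hypothetical function names; it is meant for
<i>x</i>&nbsp;&lt;&nbsp;0 and again assumes an arithmetic right-shift.
</p>

```c
/* Both helpers are meant for x < 0 and N = 2^n. */

/* x \ N (truncating division), computed with a shift. */
int div_via_shift(int x, int n)
{
    int N = 1 << n;
    return (x + (N - 1)) >> n;
}

/* x >> n (flooring shift), computed with a truncating division. */
int shift_via_div(int x, int n)
{
    int N = 1 << n;
    return (x - (N - 1)) / N;
}
```

<p>
For <i>x</i>&nbsp;=&nbsp;&minus;7 and <i>n</i>&nbsp;=&nbsp;2 these give
(&minus;7+3)&gt;&gt;2&nbsp;=&nbsp;&minus;1 and
(&minus;7&minus;3)&nbsp;\&nbsp;4&nbsp;=&nbsp;&minus;2, matching the
&minus;7 row of Table&nbsp;3.
</p>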
<p><h3 id="ssec-prob-cmp">2.3
Comparisons
</h3>
</p>
<p>
The last area where signedness can be a factor is comparisons. The<br />
next bit of code is from my implementation of a filled circle renderer<br />
with boundary clipping. The circle is centered on<br />
(<i>x</i><sub>0</sub>,&nbsp;<i>y</i><sub>0</sub>). Variables<br />
<code>x</code> and <code>y</code> are local variables that<br />
keep track of where we are on the circle; because these<br />
can be negative, they must be signed. Variables <code>dstW</code><br />
and <code>dstH</code> are the destination image&#8217;s width and<br />
height. Since width and height are unsigned by definition,<br />
it&#8217;d make sense to make these unsigned, right? Right?
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">//# Part of a clipped filled circle renderer that didn&#8217;t quite work.</span></p>
<p>&nbsp; &nbsp; <span class="kw1">int</span> dstP= srf-&gt;pitch/<span class="nu0">2</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// used in arithmetic, so signed.</span><br />
&nbsp; &nbsp; uint dstW= srf-&gt;width, dstH= srf-&gt;height; &nbsp; <span class="co1">// Unsigned by definition.</span><br />
&nbsp; &nbsp; u16 *dstD= ((u16*)srf-&gt;data)+(y0*dstP);<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">int</span> x=<span class="nu0">0</span>, y= rad, d= <span class="nu0">1</span>-rad, left, right;</p>
<p>&nbsp; &nbsp; &#8230;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Side octants</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; left= x0-y;<br />
&nbsp; &nbsp; &nbsp; &nbsp; right= x0+y;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <b><span class="kw1">if</span>(right&gt;=<span class="nu0">0</span> &amp;&amp; left&lt;=dstW)</b> &nbsp; &nbsp; &nbsp; <span class="co1">// Fully out of bounds</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(left&lt;<span class="nu0">0</span>)&nbsp; &nbsp; &nbsp; left= <span class="nu0">0</span>;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Clip left</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(right&gt;=dstW) right= dstW-<span class="nu0">1</span>;&nbsp; &nbsp; &nbsp; <span class="co1">// Clip right</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Render at scanlines y0-x and y0+x</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(inRange(y0-x, <span class="nu0">0</span>, dstH))<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; armset16(color, &amp;dstD[-x*dstP+left], <span class="nu0">2</span>*(right-left+<span class="nu0">1</span>));<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(inRange(y0+x, <span class="nu0">0</span>, dstH))<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; armset16(color, &amp;dstD[+x*dstP+left], <span class="nu0">2</span>*(right-left+<span class="nu0">1</span>));<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &#8230;</div>
</div>
<p>
Well, apparently not. When I tested this, right and bottom edge<br />
clipping went fine, but when the circle went over the top or<br />
left edge, it disappeared completely.
</p>
<p>
The problem lies with the line in bold, which does the trivial rejection<br />
test. Variables <code>left</code> and <code>right</code> are the left and<br />
right-most edges of the scanline of the circle. If this is completely<br />
to the left of the screen (<code>right</code>&nbsp;&lt;&nbsp;0) or to<br />
the right of the screen (<code>left</code>&nbsp;&ge;&nbsp;<code>dstW</code>)<br />
then there&#8217;s nothing to do. </p>
<p>Technically, the tests on that line are correct, so the code<br />
<i>should</i> work.<br />
The reason it doesn&#8217;t lies a few lines earlier: the<br />
definition of <code>dstW</code> as an unsigned variable. Because of<br />
this, the second condition is an unsigned comparison. Now think of<br />
what happens when <code>left</code> moves past the left edge of the<br />
screen. <code>left</code> becomes a (small) negative number,<br />
which is converted to a positive number for the comparison.<br />
A <i>large positive</i> number for that matter &ndash; one that&#8217;s<br />
quite a bit larger than the width of the image and as a result<br />
the routine thinks the circle is out of bounds.
</p>
<p>
So again, a routine went all wonky because I assumed that, since<br />
a width is always positive, using an unsigned variable would be<br />
a good idea.
</p>
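<p>
Reduced to just the rejection test, the problem and one possible fix look
like this. The function names are hypothetical and the
<code>uint</code> typedef is assumed to match the one used in the renderer;
the cast in the first function only makes explicit what the compiler does
implicitly.
</p>

```c
typedef unsigned int uint;

/* The test as the compiler evaluates it when dstW is unsigned:
   left is converted to unsigned, so a small negative value becomes huge. */
int span_visible_buggy(int left, int right, uint dstW)
{
    return right >= 0 && (uint)left <= dstW;
}

/* One possible fix: convert the width to a signed int before comparing. */
int span_visible_fixed(int left, int right, uint dstW)
{
    int w = (int)dstW;
    return right >= 0 && left <= w;
}
```

<p>
For a scanline running from &minus;5 to 10 on a 240-pixel-wide image, the
first version reports it as invisible and the second reports it as visible.
</p>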
<p><div>&nbsp;</div></p>
<p>
The worst part of this particular bit, however, is that I should have<br />
known this. The compiler actually issues a warning for this type of<br />
thing:
</p>
<p><blockquote>
<br />
warning: comparison between signed and unsigned integer expressions<br />

</blockquote>
</p>
<p>
Or at least it <i>would</i> have if I hadn&#8217;t disabled the warning<br />
because the message was cropping up everywhere in my normal and sign-safe<br />
for-loops. Let this be a lesson: disable warnings at your own risk<br />
and for Offler&#8217;s sake do <i>not</i> ignore them.
</p>
<p><h3 id="ssec-prob-duh">2.4
Well, duh
</h3>
</p>
<p>
The problems covered above are the subtle ones, where you have to be<br />
aware of some of the details that go into the C language itself. There<br />
are also a few issues where the programmer really should have known<br />
they were going to be a problem from the start.
</p>
<p><div>&nbsp;</div></p>
<p>
The first example is, again, one that can occur when optimizing<br />
prematurely. You may have heard that loops work better when you count<br />
down instead of count up, because in machine code a subtraction is an<br />
automatic comparison to zero. So, a clever programmer may turn this:
</p>
<div class="cpp">
<div class="cpp proglist" style=" ">uint i; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unsigned, since it&#8217;s always positive.</span><br />
<span class="kw1">for</span>(i=<span class="nu0">0</span>; i&lt;size; i++)<br />
{<br />
&nbsp; &nbsp; <span class="co1">// Do whatever</span><br />
}</div>
</div>
<p>into this:</p>
<div class="cpp">
<div class="cpp proglist" style=" ">uint i; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unsigned, since it&#8217;s always positive. Right?</span><br />
<span class="kw1">for</span>(i=size-<span class="nu0">1</span>; i&gt;=<span class="nu0">0</span>; i--)<br />
{<br />
&nbsp; &nbsp; <span class="co1">// Do whatever</span><br />
}</div>
</div>
<p>
There are two problems with this code. First, the change probably will<br />
not matter with modern compilers because they are aware of the<br />
equivalence and can do this conversion themselves<span class="fnote"><a href="#ft-nr4" title="Although they
may well do it
incorrectly: turning the decrementing loop into an incrementing one.
Point is, the compiler may not follow exactly what you&#8217;re doing
anyway.">(4)</a></span>, so there&#8217;s nothing to gain from this.
</p>
<p>
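The real problem, however, is the terminating condition:<br />
&lsquo;<code>i&gt;=0</code>&rsquo;. Since <code>i</code> is unsigned, it can<br />
never be negative, and therefore the condition is always true: decrementing<br />
past zero just wraps around to the largest unsigned value and the loop<br />
never ends.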
</p>
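<p>
If you really do want an unsigned countdown, one common idiom is to test
before decrementing. A minimal sketch, with a hypothetical function that
copies an array in reverse so the visiting order is visible:
</p>

```c
typedef unsigned int uint;

/* Copies src[size-1] .. src[0] into dst[0] .. dst[size-1],
   counting down with an unsigned index. */
void copy_reversed(int *dst, const int *src, uint size)
{
    uint i, j = 0;

    /* The test uses the old value of i, then decrements. When i is 0 the
       test fails and the loop ends; it also handles size == 0. */
    for (i = size; i-- > 0; )
        dst[j++] = src[i];
}
```

<p>
Simply making <code>i</code> a signed <code>int</code> is the less clever
and more readable alternative.
</p>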
<p><div>&nbsp;</div></p>
<p>
The second example involves bitfields. As it happens, bitfields can be<br />
signed or unsigned as well. For the most part, handling this is like<br />
handling normal signedness, but there is one situation where you have<br />
to be careful.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">void</span> test_bitfield()<br />
{<br />
&nbsp; &nbsp; <span class="kw1">struct</span> Foo {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">int</span> &nbsp; &nbsp; s7 : <span class="nu0">7</span>; &nbsp; &nbsp; <span class="co1">// 7-bit signed</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; uint&nbsp; &nbsp; u7 : <span class="nu0">7</span>; &nbsp; &nbsp; <span class="co1">// 7-bit unsigned</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">int</span> &nbsp; &nbsp; s1 : <span class="nu0">1</span>; &nbsp; &nbsp; <span class="co1">// 1-bit signed</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; uint&nbsp; &nbsp; u1 : <span class="nu0">1</span>; &nbsp; &nbsp; <span class="co1">// 1-bit unsigned</span><br />
&nbsp; &nbsp; };</p>
<p>&nbsp; &nbsp; Foo f= { -<span class="nu0">1</span>, -<span class="nu0">1</span>, <span class="nu0">1</span>, <span class="nu0">1</span> };</p>
<p>&nbsp; &nbsp; <span class="kw3">printf</span>(<span class="st0">&quot;s7: %3d<span class="es1">\n</span>u7: %3d<span class="es1">\n</span>s1: %3d<span class="es1">\n</span>u1: %3d<span class="es1">\n</span><span class="es1">\n</span>&quot;</span>, f.s7, f.u7, f.s1, f.u1);<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="coMULTI">/*&nbsp; Results:<br />
&nbsp; &nbsp; &nbsp; &nbsp; s7: &nbsp;-1 &nbsp; &nbsp; // Inited to -1<br />
&nbsp; &nbsp; &nbsp; &nbsp; u7: 127 &nbsp; &nbsp; // Inited to -1<br />
&nbsp; &nbsp; &nbsp; &nbsp; <b>s1: &nbsp;-1&nbsp; &nbsp; &nbsp;// Inited to &nbsp;1</b><br />
&nbsp; &nbsp; &nbsp; &nbsp; u1: &nbsp; 1 &nbsp; &nbsp; // Inited to &nbsp;1<br />
&nbsp; &nbsp; */</span><br />
}</div>
</div>
<p>
In the code above I&#8217;ve created a bit-fielded struct with both<br />
signed and unsigned members. There are two 7-bit fields and two<br />
1-bit fields, and these are initialized to &minus;1 and +1,<br />
respectively. The values are then printed.
</p>
<p>
The 7-bit fields work as you might expect. <code>f.s7</code> is<br />
&minus;1, as it&#8217;s signed, and <code>f.u7</code> is 127, which is the<br />
7-bit equivalent of &minus;1. The interesting case is for<br />
<code>f.s1</code>. This is initialized to 1, but comes out as<br />
&minus;1, because for a single signed bit the possibilities<br />
are 0 and &minus;1, and <i>not</i> 0 and +1! Without this knowledge,<br />
a later test like &lsquo;<code>f.s1==1</code>&rsquo; might give unexpected results.
</p>
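<p>
A sign-safe way to handle single-bit fields is to compare against zero
instead of one. The struct and helper names below are hypothetical, and the
field is declared <code>signed int</code> explicitly because the signedness
of a plain <code>int</code> bitfield is up to the compiler.
</p>

```c
struct Flags {
    signed int   s1 : 1;    /* can only hold 0 and -1 */
    unsigned int u1 : 1;    /* can only hold 0 and 1  */
};

/* Store a value the signed field can actually represent. */
void set_s1(struct Flags *f, int on)
{
    f->s1 = on ? -1 : 0;
}

/* Comparing against zero works for both signed and unsigned fields. */
int s1_is_set(const struct Flags *f) { return f->s1 != 0; }
int u1_is_set(const struct Flags *f) { return f->u1 != 0; }
```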
<p><h2 id="sec-summary">3
Summary
</h2>
</p>
<p>
So, summarizing:
</p>
<ul>
<li>
    Unsigned variables only represent non-negative numbers; signed ones<br />
	can have positive or negative values.<br />
    Negative numbers are usually represented via two&#8217;s complement,<br />
	which is based on the cyclical nature of counters when you have<br />
	a limited number of digits.
  </li>
<li>
    In C, integers are signed unless specified otherwise, except<br />
	for <code>char</code>, whose signedness is compiler dependent.
  </li>
<li>
    Careless use of signed and unsigned types can result in subtle<br />
	runtime bugs with not-so-subtle results. Usually, what happens<br />
	is that a negative number is reinterpreted as a very large<br />
	positive number and everything goes banana-shaped.
  </li>
<li>
    <b>Unsigned wins over signed in mixed expressions</b>. If<br />
	one of the operands is unsigned and at least as wide as the other<br />
	(after promotion to <code>int</code>), the operation will use unsigned<br />
	arithmetic. This can cause problems for divisions, modulos,<br />
	right-shifts <i>and</i> comparisons.
  </li>
<li>
    For negative numbers, division/modulo by 2<sup>n</sup> is not<br />
	quite the same as right-shifts/ANDs. Analyse which is best for<br />
    your situation, then act accordingly.
  </li>
<li>
    Ignore compiler warnings at your own peril.
  </li>
<li>
    The place where a bug manifests is not always the place where it<br />
	originates. The declaration of variables matters! Do not forget<br />
	this when debugging or when asking for assistance.
  </li>
</ul>
<p><div>&nbsp;</div></p>
<p>
There isn&#8217;t really a hard rule on when to use which signedness, but<br />
here are a few guidelines nonetheless.
</p>
<ul>
<li>
    If a variable can, in principle, have negative values, make it<br />
	signed. If it represents a physical quantity (position, velocity,<br />
	mass, etc), make it signed.
  </li>
<li>
    A variable that represents logical values (bools, pixels, colors,<br />
	raw data) should probably be unsigned.
  </li>
<li>
    And now the big one: just because a variable will always be<br />
	positive doesn&#8217;t mean it should be unsigned. Yes, you may waste<br />
	half the range, but using signed variables is usually safer. If<br />
	you must have the larger range (for the smaller datatypes, for<br />
	example), consider defining the storage variables unsigned, but<br />
	convert them to local signed ints when you&#8217;re really going to<br />
	use them.
  </li>
<li>
	If mathematical symbols were gods, the minus sign would be<br />
	
<a href="http://en.wikipedia.org/wiki/Loki">Loki</a>. Be extra careful when you encounter one.<br />
	If there are minus signs <i>anywhere</i> in the algorithm, or even<br />
	the potential for negative numbers, <i>everything</i> should be<br />
	done with signed numbers.</p>
</li>
</ul>
<!--
<ul>
<li>
    A computer will do what you <i>tell</i> it to do. Make sure this<br />
	corresponds with what you <i>want</i> it to do.
  </li>
</ul>
-->
<hr /><div class="footnotes">
<h5>Notes:</h5>
<ol>
<li id="ft-nr1"> 
  Or any 10&#8217;s complement, really.
</li>
<li id="ft-nr2"> 
  One could say that zero-extension is just a<br />
form of sign-extension; it&#8217;s just that the sign for an unsigned number<br />
is always positive.
</li>
<li id="ft-nr3"> 
  And please let the compiler do its job in this<br />
regard: the low operator-precedence of shifts makes their use awkward and<br />
error-prone. If you mean division, then use division.
</li>
<li id="ft-nr4"> 
  Although they<br />
may well do it<br />
incorrectly: turning the decrementing loop into an incrementing one.<br />
Point is, the compiler may not follow exactly what you&#8217;re doing<br />
anyway.
</li>
</ol>
</div>
<hr />
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/08/signs-from-hell/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Another fast fixed-point sine approximation</title>
		<link>http://www.coranac.com/2009/07/sines/</link>
		<comments>http://www.coranac.com/2009/07/sines/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 20:19:05 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[fixed point]]></category>
		<category><![CDATA[nds]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[sine]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=87</guid>
		<description><![CDATA[


Gaddammit!

&#160;

So here I am, looking forward to a nice quiet weekend; hang
back, watch some telly and maybe read a bit &#8211; but
NNnnneeeEEEEEUUUuuuuuuuu!! Someone had to write an interesting
article about sine approximation.
With a challenge at the end. And using an inefficient kind
of approximation. And so now, instead of just relaxing, I have to spend
my entire weekend [...]]]></description>
			<content:encoded><![CDATA[
<p>
Gad<i>dammit</i>!
</p>
<p><div>&nbsp;</div></p>
<p>
So here I am, looking forward to a nice quiet weekend; hang<br />
back, watch some telly and maybe read a bit &ndash; but<br />
<i>NNnnneeeEEEEEUUUuuuuuuuu!!</i> <i>Someone</i> had to write an interesting<br />
<a href="http://www.console-dev.de/2009/07/06/sine-approximation-with-fixed-point-math/" rel="pingback">article about sine approximation</a>.<br />
With a <i>challenge</i> at the end. <i>And</i> using an inefficient kind<br />
of approximation. And so now, instead of just relaxing, I have to spend<br />
my entire weekend <i>and</i> most of the week figuring out a better way<br />
of doing it. I hate it when this happens <kbd>&gt;_&lt;</kbd>.
</p>
<p><div>&nbsp;</div></p>
<p>
Okay, maybe not.
</p>
<p><div>&nbsp;</div></p>
<p>
Sarcasm aside, it is an interesting read. While the standard way of<br />
calculating a sine &ndash; via a look-up table &ndash; works and<br />
works well, there&#8217;s just something unsatisfying about it. The<br />
LUT-based approach is just &hellip; dull.<br />
Uninspired. Cowardly. <i>Inelegant</i>.<br />
In contrast, finding a suitable algorithm for it requires effort and a<br />
modicum of creativity, so something like that always piques my interest.
</p>
<p>
In this case it&#8217;s sine approximation. I&#8217;d been wondering about that<br />
when I did my <a href="http://www.coranac.com/documents/arctangent">arctan article</a>,<br />
but figured it would require too many terms to really be worth<br />
the effort. But looking at Mr Schraut&#8217;s post (whose site you should be<br />
visiting from time to time too; there&#8217;s good stuff there) it seems<br />
you can get a decent version quite rapidly. The article centers around<br />
the work found at<br />
<a href="http://www.devmaster.net/forums/showthread.php?t=5784">devmaster thread<br />
5784</a>, which derived the following two equations:
</p>
<p><table class="eqtbl" id="eq-lab">
<tr>
<td class="eqnrcell">(1)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_2%28x%29%20%26%3D%26%20%5Cfrac4%5Cpi%20x%20-%20%5Cfrac4%7B%5Cpi%5E2%7D%20x%5E2%20%5C%5C%20%5C%5C%20S_%7B4d%7D%28x%29%20%26%3D%26%20%281-P%29S_2%28x%29%20%2B%20P%20S_2%5E2%28x%29%20%5Cend%7Beqnarray%7D'
	title="\begin{eqnarray} S_2(x) &amp;=&amp; \frac4\pi x - \frac4{\pi^2} x^2 \\ \\ S_{4d}(x) &amp;=&amp; (1-P)S_2(x) + P S_2^2(x) \end{eqnarray}"
	alt="\begin{eqnarray} S_2(x) &amp;=&amp; \frac4\pi x - \frac4{\pi^2} x^2 \\ \\ S_{4d}(x) &amp;=&amp; (1-P)S_2(x) + P S_2^2(x) \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
These approximations work quite well, but I feel that they actually<br />
use the wrong starting point. There are alternative approximations<br />
that give more accurate results at nearly no extra cost in<br />
complexity. In this post, I&#8217;ll derive higher-order alternatives for<br />
both. In passing, I&#8217;ll also talk about a few of the tools that can<br />
help analyse functions and, of course, provide some source code and<br />
do some comparisons.
</p>
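<p>
For reference, Eq&nbsp;1 can be sketched in plain floating point as below.
The function names are hypothetical, the formulas are only meant for
0&nbsp;&le;&nbsp;<i>x</i>&nbsp;&le;&nbsp;&pi;, and <i>P</i> is left as a
parameter; <i>P</i>&nbsp;=&nbsp;0.225 is assumed in the example afterwards
and is not taken from the equations above.
</p>

```c
/* Floating-point sketch of Eq 1, intended for 0 <= x <= pi. */
static const double PI = 3.14159265358979323846;

/* S2: parabola through (0,0), (pi/2,1) and (pi,0). */
double sine_s2(double x)
{
    return (4.0 / PI) * x - (4.0 / (PI * PI)) * x * x;
}

/* S4d: weighted mix of S2 and its square; P is the blend weight. */
double sine_s4d(double x, double P)
{
    double s2 = sine_s2(x);
    return (1.0 - P) * s2 + P * s2 * s2;
}
```

<p>
Both functions return 0 at <i>x</i>&nbsp;=&nbsp;0 and 1 at
<i>x</i>&nbsp;=&nbsp;&pi;/2 regardless of <i>P</i>. At
<i>x</i>&nbsp;=&nbsp;&pi;/6, <code>sine_s2</code> gives
5/9&nbsp;&asymp;&nbsp;0.556, while <code>sine_s4d</code> with
<i>P</i>&nbsp;=&nbsp;0.225 gives 0.5, the true value.
</p>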
<p><span id="more-87"></span></p>
<p><ul>
  <li> <a href="#sec-theory">1
Theory
</a> </li>
  <li> <a href="#sec-prod">2
Derivations and implementations
</a> </li>
  <li> <a href="#sec-test">3
Testing
</a> </li>
  <li> <a href="#sec-summary">4
Summary and final thoughts
</a> </li>
</ul>
</p>
<p><h2 id="sec-theory">1
Theory
</h2>
</p>
<p><h3 id="ssec-try-symmetry">1.1
Symmetry
</h3>
</p>
<p>
The first analytical tool is symmetry. Symmetry is actually one of the<br />
most powerful concepts ever conceived. Symmetry of time leads to the<br />
conservation of energy; symmetry of space leads to conservation of<br />
momentum; in a 3D world, symmetry of direction gives rise to the<br />
inverse square law. In many cases, symmetry basically defines the kinds<br />
of functions you&#8217;re looking for.
</p>
<p>
One kind of symmetry is parity, and functions can have parity as well.<br />
Take any function <i>f</i>(<i>x</i>). A function is <dfn>even</dfn> if<br />
<i>f</i>(&minus;<i>x</i>)&nbsp;=&nbsp;<i>f</i>(<i>x</i>); it is <dfn>odd</dfn><br />
if <i>f</i>(&minus;<i>x</i>)&nbsp;=&nbsp;&minus;<i>f</i>(<i>x</i>).
</p>
<p>
This may not sound impressive, but a function&#8217;s parity can be a great<br />
source of information and a way of error checking. For example, the<br />
product of two odd or two even functions is an even function, and an<br />
odd-even product is odd (compare positive/negative number products).<br />
If in a calculation you notice this doesn&#8217;t hold true, then you know<br />
there&#8217;s an error somewhere.
</p>
<p>
Symmetry can also significantly reduce the amount of work you need<br />
to do. Take the next integral, for example.
</p>
<p><table class="eqtbl" id="eq-sym">
<tr>
<td class="eqnrcell">(2)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?y%20%3D%20%5Cint_%7B-N%7D%5EN%20sin%5E7%28x%5E3%29%20%2B%20%5Cfrac%7Bx%5E5%7D%7Bx%5E2%2B1%7D%20-%20x%20e%5E%7B%5Cfrac%7Bx%5E2%7D%7B2%5Csigma%5E2%7D%7D%20dx'
	title="y = \int_{-N}^N sin^7(x^3) + \frac{x^5}{x^2+1} - x e^{\frac{x^2}{2\sigma^2}} dx"
	alt="y = \int_{-N}^N sin^7(x^3) + \frac{x^5}{x^2+1} - x e^{\frac{x^2}{2\sigma^2}} dx" /><br />
</td>
</tr>
</table></p>
<p>
If you find something like this in the wild or on a test, your first<br />
thought might be &ldquo;WTF?!?&rdquo; (assuming you don&#8217;t run away<br />
screaming). As it happens, <i>y</i>&nbsp;=&nbsp;0, for reasons of symmetry. The<br />
function is odd, so the parts left and right of <i>x</i>&nbsp;=&nbsp;0 cancel out.<br />
Instead of actually trying to do the whole calculation, you can just<br />
write down the answer in one line: &ldquo;0, cuz of symmetry&rdquo;.
</p>
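<p>
A quick numerical sanity check of the cancellation argument, as a minimal sketch. To keep it free of any math library it only uses the rational middle term of Eq&nbsp;2 (the other two terms are odd as well, so they behave the same way), plus an even function for contrast. The function names are made up for this example.
</p>

```c
/* Midpoint-rule integral of f over [-N, N] using `steps` equal slices. */
double integrate_symmetric(double (*f)(double), double N, int steps)
{
    double h = 2.0 * N / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; i++) {
        double x = -N + (i + 0.5) * h;   /* centre of slice i */
        sum += f(x);
    }
    return sum * h;
}

/* Odd: the rational term of Eq 2, x^5 / (x^2 + 1). */
double odd_term(double x)  { return x * x * x * x * x / (x * x + 1.0); }

/* Even, for contrast: x^4 / (x^2 + 1). */
double even_term(double x) { return x * x * x * x / (x * x + 1.0); }
```

<p>
Integrating <code>odd_term</code> over a symmetric interval gives zero up to rounding noise, while <code>even_term</code> gives twice the value of the right half.
</p>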
<p>
Another property of symmetrical functions is that, if you break them<br />
down into series expansions, odd functions will only have odd terms,<br />
and even functions only have even terms. This becomes important in<br />
the next subsection.
</p>
<p><h3 id="ssec-try-series">1.2
Polynomial and Taylor expansions
</h3>
</p>
<p>
Every function can be broken down into a sum of more manageable<br />
functions. One fairly obvious choice for these sub-functions is<br />
increasing powers of <i>x</i>: polynomials. The most common of<br />
these is the 
<a href="http://en.wikipedia.org/wiki/Taylor%20series">Taylor series</a>, which uses<br />
a reference point (<i>a</i>,&nbsp;<i>f</i>(<i>a</i>)) and extrapolates<br />
to another point some distance <i>h</i> away by using the<br />
derivatives of <i>f</i> at the reference point. In equation form,<br />
it looks like this:
</p>
<p><table class="eqtbl" id="eq-taylor-def">
<tr>
<td class="eqnrcell">(3)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20f%28a%2Bh%29%20%26%3D%26%20f%28a%29%20%2B%20f%27%28a%29%20h%20%2B%20%5Cfrac%7Bf%27%27%28a%29%7D%7B2%7Dh%5E2%20%2B%20%5Cfrac%7Bf%27%27%27%28a%29%7D%7B6%7D%20h%5E3%20%2B%20...%20%5C%5C%20%5C%5C%20%5C%5C%20%26%3D%26%20%5Csum_%7Bn%3D0%7D%20%5Cfrac%7Bf%5E%7B%28n%29%7D%28a%29%7D%7Bn%21%7Dh%5En%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} f(a+h) &amp;=&amp; f(a) + f&#039;(a) h + \frac{f&#039;&#039;(a)}{2}h^2 + \frac{f&#039;&#039;&#039;(a)}{6} h^3 + ... \\ \\ \\ &amp;=&amp; \sum_{n=0} \frac{f^{(n)}(a)}{n!}h^n \end{eqnarray}"<br />
	alt="\begin{eqnarray} f(a+h) &amp;=&amp; f(a) + f&#039;(a) h + \frac{f&#039;&#039;(a)}{2}h^2 + \frac{f&#039;&#039;&#039;(a)}{6} h^3 + ... \\ \\ \\ &amp;=&amp; \sum_{n=0} \frac{f^{(n)}(a)}{n!}h^n \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
Chances are you&#8217;ve actually used part of the Taylor series in game<br />
programming. On implementing movement with acceleration, you&#8217;ll<br />
often see something like Eq&nbsp;4. These are the<br />
first three terms of the Taylor expansion.
</p>
<p><table class="eqtbl" id="eq-taylor-xva">
<tr>
<td class="eqnrcell">(4)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?x_%7Bnew%7D%20%3D%20x_%7Bold%7D%20%5C%3A%2B%5C%3A%20v%20%5CDelta%20t%20%5C%3A%2B%5C%3A%20%5Cfrac12%20a%20%28%5CDelta%20t%29%5E2'<br />
	title="x_{new} = x_{old} \:+\: v \Delta t \:+\: \frac12 a (\Delta t)^2"<br />
	alt="x_{new} = x_{old} \:+\: v \Delta t \:+\: \frac12 a (\Delta t)^2" /><br />
</td>
</tr>
</table></p>
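<p>
In code, Eq&nbsp;4 usually ends up looking something like the sketch below. The <code>Body</code> struct and <code>step_motion()</code> are hypothetical names for illustration; the acceleration is assumed to be constant over the step.
</p>

```c
/* Position and velocity of a moving object (names made up for this sketch). */
typedef struct { double x, v; } Body;

/* One time step of motion under constant acceleration a (Eq 4). */
Body step_motion(Body b, double a, double dt)
{
    b.x += b.v * dt + 0.5 * a * dt * dt;  /* first three Taylor terms of x(t) */
    b.v += a * dt;                        /* first two Taylor terms of v(t)   */
    return b;
}
```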
<p>
If the step-size (<i>h</i> in Eq&nbsp;3 and<br />
&Delta;<i>t</i> in Eq&nbsp;4) is small, the<br />
higher-order terms will have less effect on the end result. This<br />
allows you to cut the expansion short at some point. This leaves<br />
you with a short equation that you do the calculations with and<br />
some sort of error term, composed of the part you have removed.<br />
The error term is usually linked to the order you&#8217;ve truncated<br />
the series at; the higher the order, the more accurate the<br />
approximation.
</p>
<p><table class="eqtbl" id="eq-taylor-error">
<tr>
<td class="eqnrcell">(5)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?f%28a%2Bh%29%20%3D%20f%28a%29%20%2B%20f%27%28a%29%20h%20%2B%20%5Cfrac%7Bf%27%27%28a%29%7D%7B2%7Dh%5E2%20%2B%20%5Cfrac%7Bf%27%27%27%28a%29%7D%7B6%7D%20h%5E3%20%2B%20O%28h%5E4%29'<br />
	title="f(a+h) = f(a) + f&#039;(a) h + \frac{f&#039;&#039;(a)}{2}h^2 + \frac{f&#039;&#039;&#039;(a)}{6} h^3 + O(h^4)"<br />
	alt="f(a+h) = f(a) + f&#039;(a) h + \frac{f&#039;&#039;(a)}{2}h^2 + \frac{f&#039;&#039;&#039;(a)}{6} h^3 + O(h^4)" /><br />
</td>
</tr>
</table><div>&nbsp;</div></p>
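<p>
The <i>O</i>(<i>h</i><sup>4</sup>) in Eq&nbsp;5 can be seen at work with a small experiment. This sketch uses <i>f</i>(<i>x</i>)&nbsp;=&nbsp;1/(1&minus;<i>x</i>) around <i>a</i>&nbsp;=&nbsp;0 as a stand-in test function (my choice, not something from the derivation above) because its derivatives are easy and no math library is needed.
</p>

```c
/* Cubic Taylor polynomial of f(x) = 1/(1-x) around a = 0.
 * f(0) = 1, f'(0) = 1, f''(0) = 2, f'''(0) = 6, so Eq 5 gives
 * 1 + h + (2/2) h^2 + (6/6) h^3. */
double taylor3_geom(double h)
{
    return 1.0 + h * (1.0 + h * (1.0 + h));
}

/* Truncation error: exact value minus the cubic approximation. */
double taylor3_geom_error(double h)
{
    return 1.0 / (1.0 - h) - taylor3_geom(h);
}
```

<p>
Halving the step from 0.1 to 0.05 shrinks the error by a factor of roughly 2<sup>4</sup>&nbsp;=&nbsp;16, which is what a fourth-order error term predicts.
</p>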
<p>
If you work out the math for a sine Taylor series, with <i>a</i>&nbsp;=&nbsp;0<br />
as the reference point, you end up with Eq&nbsp;6.
</p>
<p><table class="eqtbl" id="eq-taylor-sine">
<tr>
<td class="eqnrcell">(6)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?sin%28h%29%20%3D%20h%20%5C%2C-%5C%2C%20%5Cfrac16%20h%5E3%20%5C%2C%2B%5C%2C%20%5Cfrac1%7B5%21%7D%20h%5E5%20%5C%2C-%5C%2C%20%5Cfrac1%7B7%21%7D%20h%5E7%20%5C%2C%2B%5C%2C%20...'<br />
	title="sin(h) = h \,-\, \frac16 h^3 \,+\, \frac1{5!} h^5 \,-\, \frac1{7!} h^7 \,+\, ..."<br />
	alt="sin(h) = h \,-\, \frac16 h^3 \,+\, \frac1{5!} h^5 \,-\, \frac1{7!} h^7 \,+\, ..." /><br />
</td>
</tr>
</table></p>
<p>
Note that all the even powers are conspicuously absent. This is what<br />
I meant by symmetry being useful: a sine function is odd, therefore<br />
only odd terms are needed in the expansion. But there&#8217;s more to it<br />
than that. The accuracy is given by the highest order in the<br />
approximating polynomial. This shows that there&#8217;s just no point in even<br />
starting with any even-powered polynomial, because you can get one extra<br />
order basically for free!
</p>
<p>
This is why using a quadratic approximation for a sine is somewhat<br />
useless; a cubic will have two terms as well, and be more accurate to<br />
boot. Just because it&#8217;s curved doesn&#8217;t mean a parabola is the most<br />
suitable approximation.
</p>
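<p>
For reference, here is a small sketch of the truncated series of Eq&nbsp;6. The function name and the <code>terms</code> parameter are assumptions for this example; it is meant to show the structure of the series, not to be a fast sine.
</p>

```c
/* Truncated Taylor series of sin(h) around 0 (Eq 6), keeping `terms`
 * odd-power terms: h - h^3/3! + h^5/5! - ... */
double sin_taylor(double h, int terms)
{
    double term = h;      /* current term, starts at h^1/1! */
    double sum  = 0.0;
    for (int k = 0; k < terms; k++) {
        sum += term;
        /* next odd term: multiply by -h^2 / ((2k+2)(2k+3)) */
        term *= -h * h / ((2.0 * k + 2.0) * (2.0 * k + 3.0));
    }
    return sum;
}
```

<p>
At <i>h</i>&nbsp;=&nbsp;&frac12;&pi;, where the true value is 1, two terms (cubic) give about 0.925 and three terms (fifth order) about 1.0045.
</p>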
<p><h3 id="ssec-try-fit">1.3
Curve fitting (and a 3rd order example)
</h3>
</p>
<p>
Using the Taylor series as a basis for a sine approximation is nice,<br />
but it also has a problem. The series is meant to have an infinite<br />
number of terms and when you truncate the series, you will lose<br />
some accuracy. Of course, this was to be expected, but this isn&#8217;t<br />
the real problem; the real problem is that if your function<br />
has some crucial points it <i>must</i> pass through (which is<br />
certainly true for trigonometry functions), the truncation will<br />
move the curve away from those points.
</p>
<p>
To fix this, you need to use a polynomial with as-yet unknown<br />
coefficients (that is, multipliers to the powers) and a set of<br />
conditions that need to be satisfied. These conditions will determine<br />
the exact value of the coefficients. The Taylor expansion can serve<br />
as the basis for your initial approximation, and the final terms<br />
should be pretty close to the Taylor coefficients.
</p>
<p><div>&nbsp;</div></p>
<p>
Let&#8217;s try this for a third-order (cubic) sine approximation.<br />
Technically, a third-order polynomial means four unknowns, <i>but</i>,<br />
since the sine is odd, all the coefficients for the even powers<br />
are zero. That takes care of half the coefficients already. I told<br />
you symmetry was useful <kbd>:)</kbd>. The starting polynomial is<br />
reduced to Eq&nbsp;7, which has two coefficients<br />
<i>a</i> and <i>b</i> that have to be determined. For good measure<br />
I&#8217;ve also added the derivative, as that&#8217;s often useful to have as<br />
well.
</p>
<p><table class="eqtbl" id="eq-s3-base">
<tr>
<td class="eqnrcell">(7)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_3%28x%29%20%26%3D%26%20ax%20-%20b%20x%5E3%20%26%3D%26%20x%20%28a%20-%20b%20x%5E2%29%20%5C%5C%20S_3%27%28x%29%20%26%3D%26%20a%20-%203bx%5E2%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} S_3(x) &amp;=&amp; ax - b x^3 &amp;=&amp; x (a - b x^2) \\ S_3&#039;(x) &amp;=&amp; a - 3bx^2 \end{eqnarray}"<br />
	alt="\begin{eqnarray} S_3(x) &amp;=&amp; ax - b x^3 &amp;=&amp; x (a - b x^2) \\ S_3&#039;(x) &amp;=&amp; a - 3bx^2 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
Two unknowns means we need two conditions to solve the system.<br />
The most useful conditions are usually the behaviour at the<br />
boundaries. In the case of a sine, that means look at <i>x</i>&nbsp;=&nbsp;0<br />
and/or <i>x</i>&nbsp;=&nbsp;&frac12;&pi;. The latter happens to be more<br />
useful here, so let&#8217;s look at that. First, sin(&frac12;&pi;)&nbsp;=&nbsp;1,<br />
so that&#8217;s a good one. Also, we know that at &frac12;&pi; a sine is<br />
flat (a derivative of 0). This is the second condition.
</p>
<p>
The conditions are listed in Eq&nbsp;8. Solving this<br />
system is rather straightforward and will give you values for<br />
<i>a</i> and <i>b</i>, which are also given in Eq&nbsp;8.<br />
Notice that the values are roughly 5% and 30% away from the pure<br />
Taylor coefficients.
</p>
<p><table class="eqtbl" id="eq-s3-cnd">
<tr>
<td class="eqnrcell">(8)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cleft.%20%5Cbegin%7Beqnarray%7D%20S_3%28%5Cfrac%7B%5Cpi%7D2%29%20%26%3D%26%201%20%26%3D%26%20%5Cfrac%7B%5Cpi%7D2%20a%20-%20%28%5Cfrac%7B%5Cpi%7D2%29%5E3%20b%20%5C%5C%20S_3%27%28%5Cfrac%7B%5Cpi%7D2%29%20%26%3D%26%200%20%26%3D%26%20a%20-%203%28%5Cfrac%7B%5Cpi%7D2%29%5E2%20b%20%5Cend%7Beqnarray%7D%20%5C%3B%20%5C%3B%20%5Crightarrow%20%5C%3B%20%5C%3B%20%5Cbegin%7Beqnarray%7D%20a%20%26%3D%26%20%5Cfrac3%5Cpi%20%26%5Capprox%26%200.955%20%5C%5C%20%5C%5C%20b%20%26%3D%26%20%5Cfrac4%7B%5Cpi%5E3%7D%20%26%5Capprox%26%200.129%20%5Cend%7Beqnarray%7D'<br />
	title="\left. \begin{eqnarray} S_3(\frac{\pi}2) &amp;=&amp; 1 &amp;=&amp; \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3&#039;(\frac{\pi}2) &amp;=&amp; 0 &amp;=&amp; a - 3(\frac{\pi}2)^2 b \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &amp;=&amp; \frac3\pi &amp;\approx&amp; 0.955 \\ \\ b &amp;=&amp; \frac4{\pi^3} &amp;\approx&amp; 0.129 \end{eqnarray}"<br />
	alt="\left. \begin{eqnarray} S_3(\frac{\pi}2) &amp;=&amp; 1 &amp;=&amp; \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3&#039;(\frac{\pi}2) &amp;=&amp; 0 &amp;=&amp; a - 3(\frac{\pi}2)^2 b \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &amp;=&amp; \frac3\pi &amp;\approx&amp; 0.955 \\ \\ b &amp;=&amp; \frac4{\pi^3} &amp;\approx&amp; 0.129 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
The final equation is then:
</p>
<p><table class="eqtbl" id="eq-s3">
<tr>
<td class="eqnrcell">(9)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?S_3%28x%29%20%3D%20%5Cfrac3%5Cpi%20x%20-%20%5Cfrac4%7B%5Cpi%5E3%7D%20x%5E3'<br />
	title="S_3(x) = \frac3\pi x - \frac4{\pi^3} x^3"<br />
	alt="S_3(x) = \frac3\pi x - \frac4{\pi^3} x^3" /><br />
</td>
</tr>
</table></p>
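<p>
If you would rather let the computer do the algebra, the two conditions of Eq&nbsp;8 form a 2&times;2 linear system. The sketch below solves it with Cramer&#8217;s rule and then evaluates Eq&nbsp;9; <code>Coeffs</code>, <code>solve_s3()</code> and <code>s3()</code> are names invented for this example.
</p>

```c
#define PI 3.14159265358979323846

typedef struct { double a, b; } Coeffs;

/* Solve Eq 8 for a and b with Cramer's rule.
 *   (pi/2) a -  (pi/2)^3 b = 1     (value at pi/2)
 *        a   - 3(pi/2)^2 b = 0     (slope at pi/2)   */
Coeffs solve_s3(void)
{
    double q = PI / 2.0;
    double m11 = q,   m12 = -q * q * q,   r1 = 1.0;
    double m21 = 1.0, m22 = -3.0 * q * q, r2 = 0.0;
    double det = m11 * m22 - m12 * m21;
    Coeffs c = { (r1 * m22 - m12 * r2) / det,
                 (m11 * r2 - r1 * m21) / det };
    return c;
}

/* Eq 9 in floating point, meant for x in [-pi/2, pi/2]. */
double s3(double x)
{
    Coeffs c = solve_s3();
    return x * (c.a - c.b * x * x);
}
```

<p>
The solver reproduces <i>a</i>&nbsp;&asymp;&nbsp;0.955 and <i>b</i>&nbsp;&asymp;&nbsp;0.129, and <code>s3(PI/2)</code> comes out as 1 to within rounding.
</p>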
<p>
In Fig&nbsp;1 you can see a number of different<br />
approximations to the sine. Note that I&#8217;ve done a little coordinate<br />
transformation for the <i>x</i>-axis: <i>z</i>&nbsp;=&nbsp;<i>x</i>/(&frac12;&pi;),<br />
so <i>z</i>&nbsp;=&nbsp;1 means <i>x</i>&nbsp;=&nbsp;&frac12;&pi;. The benefit of this<br />
will become clear later.
</p>
<p>
As you can see, the third-order Taylor expansion starts out all right,<br />
but veers off course near the end. In contrast, the third-order fit<br />
matches the sine at both end points. There is also the second-order<br />
fit from the devmaster site. As you can see, the third-order approximation<br />
is closer.
</p>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-sine-t23"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;1. </b>
</div>

</div>
<p><div>&nbsp;</div></p>
<p>
Now, please remember that the coefficients from Eq&nbsp;8<br />
are not the only ones you can use. The conditions define what the<br />
values will be; different conditions lead to different values. For<br />
example, instead of using the derivative at &frac12;&pi;, I could have<br />
used it at <i>x</i>&nbsp;=&nbsp;0. This forms the<br />
set of equations of Eq&nbsp;10 and, as you can see,<br />
the coefficients are now different. This set is actually more accurate<br />
(a 0.6% average error instead of 1.1%), but it also has the rather<br />
unsavoury characteristic of having a maximum that&#8217;s not at<br />
&frac12;&pi; and that goes over 1.0; this can be <i>really</i> unsettling<br />
if you intend to use the sine in something like rotation.
</p>
<p><table class="eqtbl" id="eq-s3-cnd-alt">
<tr>
<td class="eqnrcell">(10)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cleft.%20%5Cbegin%7Beqnarray%7D%20S_3%28%5Cfrac%7B%5Cpi%7D2%29%20%26%3D%26%201%20%26%3D%26%20%5Cfrac%7B%5Cpi%7D2%20a%20-%20%28%5Cfrac%7B%5Cpi%7D2%29%5E3%20b%20%5C%5C%20S_3%27%280%29%20%26%3D%26%201%20%26%3D%26%20a%20%5Cend%7Beqnarray%7D%20%5C%3B%20%5C%3B%20%5Crightarrow%20%5C%3B%20%5C%3B%20%5Cbegin%7Beqnarray%7D%20a%20%26%3D%26%201%20%5C%5C%20%5C%5C%20b%20%26%3D%26%20%5Cfrac4%7B%5Cpi%5E2%7D%281-%5Cfrac2%5Cpi%29%20%5Capprox%200.147%20%5Cend%7Beqnarray%7D'<br />
	title="\left. \begin{eqnarray} S_3(\frac{\pi}2) &amp;=&amp; 1 &amp;=&amp; \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3&#039;(0) &amp;=&amp; 1 &amp;=&amp; a \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &amp;=&amp; 1 \\ \\ b &amp;=&amp; \frac4{\pi^2}(1-\frac2\pi) \approx 0.147 \end{eqnarray}"<br />
	alt="\left. \begin{eqnarray} S_3(\frac{\pi}2) &amp;=&amp; 1 &amp;=&amp; \frac{\pi}2 a - (\frac{\pi}2)^3 b \\ S_3&#039;(0) &amp;=&amp; 1 &amp;=&amp; a \end{eqnarray} \; \; \rightarrow \; \; \begin{eqnarray} a &amp;=&amp; 1 \\ \\ b &amp;=&amp; \frac4{\pi^2}(1-\frac2\pi) \approx 0.147 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
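<p>
The overshoot of the Eq&nbsp;10 variant is easy to verify numerically. This is only a sketch: it brute-force scans the first quadrant instead of solving for the maximum analytically, and the function names are made up.
</p>

```c
#define PI 3.14159265358979323846

/* Cubic from Eq 10: a = 1, b = (4/pi^2)(1 - 2/pi). */
double s3_alt(double x)
{
    const double b = 4.0 / (PI * PI) * (1.0 - 2.0 / PI);
    return x * (1.0 - b * x * x);
}

/* Scan [0, pi/2] for the largest value; its location goes to *x_peak. */
double s3_alt_peak(double *x_peak)
{
    double best = 0.0;
    *x_peak = 0.0;
    for (int i = 0; i <= 100000; i++) {
        double x = (PI / 2.0) * i / 100000.0;
        double y = s3_alt(x);
        if (y > best) { best = y; *x_peak = x; }
    }
    return best;
}
```

<p>
The scan finds a peak of roughly 1.003 near <i>x</i>&nbsp;&asymp;&nbsp;1.504, a little before &frac12;&pi;&nbsp;&asymp;&nbsp;1.571.
</p>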
<p><h3 id="ssec-try-dimless">1.4
Dimensionless variables and coordinate transformations
</h3>
</p>
<p>
For higher accuracy, a higher-order polynomial should be used. Before<br />
doing that, though, I&#8217;d like to mention one more trick that can make your<br />
mathematical analysis considerably easier: dimensionless variables.
</p>
<p><div>&nbsp;</div></p>
<p>
The problem with most quantities and equations is units. Metres, feet, litres,<br />
gallons; those kinds of units. Units suck. For one, there are different<br />
units for identical quantities which can be a total pain to convert<br />
and can sometimes lead to disaster.<br />

<a href="http://en.wikipedia.org/wiki/Gimli_Glider">Literally</a>.</p>
<p>Then there&#8217;s the fact that the unit sizes are basically picked at random<br />
and have nothing to do with the physical situation they&#8217;re used for.<br />
So you have weird values for constants like <i>G</i> in<br />

<a href="http://en.wikipedia.org/wiki/Newton%27s%20law%20of%20universal%20gravitation">Newton&#8217;s law of universal gravitation</a>, the speed of<br />
light <i>c</i> and the 
<a href="http://en.wikipedia.org/wiki/Planck%20constant">Planck constant</a>, <i>h</i>. Keeping<br />
track of these things in equations is annoying, especially since they<br />
tend to pile up and everybody would rather that they&#8217;d just <i>go<br />
away</i>!
</p>
<p>
Enter dimensionless variables. The idea here is that instead of using<br />
standard units, you express quantities as ratios to some meaningful<br />
size. For example, in relativity you often get <i>v</i>/<i>c</i> :<br />
velocity over speed of light. Equations become much simpler if you<br />
just denote velocities as fractions of the speed of light:<br />
&beta;&nbsp;=&nbsp;<i>v</i>/<i>c</i>. Using &beta; in the equations simplifies<br />
them immensely and has the bonus that you&#8217;re not tied to any<br />
specific speed-unit anymore.
</p>
<p>
The dimensionless variable is a type of coordinate transformation.<br />
In particular, it&#8217;s a scaling of the original variable into something<br />
more useful. Another useful transformation is translation: moving<br />
the variable to a more suitable position. We will come across this<br />
later; but first: an example of dimensionless variables.
</p>
<p><div>&nbsp;</div></p>
<p>
A sine wave has lots of symmetry lines, all revolving around the<br />
quarter-circles. Because of this, the term that keeps showing up<br />
everywhere is &frac12;&pi;. This is the characteristic size of<br />
the wave. By using <i>z</i>&nbsp;=&nbsp;<i>x</i>/(&frac12;&pi;), all those<br />
important points are now at integral <i>z</i> values. Having ones<br />
in your equations is generally a good thing because they tend to<br />
disappear in multiplications. Look at what Eq&nbsp;9<br />
becomes when expressed in terms of <i>z</i>:
</p>
<p><table class="eqtbl" id="eq-s3-dimless">
<tr>
<td class="eqnrcell">(11)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_3%28x%29%20%26%3D%26%20%5Cfrac3%5Cpi%20x%20-%20%5Cfrac4%7B%5Cpi%5E3%7D%20x%5E3%20%5C%5C%20%5C%5C%20%26%3D%26%20%5Cfrac32%20%5Cfrac%7B2x%7D%5Cpi%20-%20%5Cfrac12%20%28%5Cfrac%7B2x%7D%5Cpi%29%5E3%20%5C%5C%20%5C%5C%20%26%3D%26%20%5Cfrac32%20z%20-%20%5Cfrac12%20z%5E3%20%5C%5C%20%5C%5C%20S_3%28z%29%20%26%3D%26%20%5Cfrac12z%20%283%20-%20z%5E2%29%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} S_3(x) &amp;=&amp; \frac3\pi x - \frac4{\pi^3} x^3 \\ \\ &amp;=&amp; \frac32 \frac{2x}\pi - \frac12 (\frac{2x}\pi)^3 \\ \\ &amp;=&amp; \frac32 z - \frac12 z^3 \\ \\ S_3(z) &amp;=&amp; \frac12z (3 - z^2) \end{eqnarray}"<br />
	alt="\begin{eqnarray} S_3(x) &amp;=&amp; \frac3\pi x - \frac4{\pi^3} x^3 \\ \\ &amp;=&amp; \frac32 \frac{2x}\pi - \frac12 (\frac{2x}\pi)^3 \\ \\ &amp;=&amp; \frac32 z - \frac12 z^3 \\ \\ S_3(z) &amp;=&amp; \frac12z (3 - z^2) \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
Doesn&#8217;t that look a lot nicer? It goes deeper than that though.<br />
With dimensionless units, the units your measurements are in simply<br />
cease to matter! For angles, this means that whether you&#8217;re working in<br />
radians, degrees or brads, they&#8217;ll all result in the same circle-fraction,<br />
<i>z</i>. This makes converting algorithms to fixed-point notation<br />
considerably easier.
</p>
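<p>
To make the unit-independence concrete, here is a small floating-point sketch. Each helper only divides by the size of a quarter circle in its own unit; after that, everything shares the same <i>S</i><sub>3</sub>(<i>z</i>) from Eq&nbsp;11. The helper names are invented, and the brad helper assumes 256 brads per circle, which is one common convention but not the only one.
</p>

```c
/* Convert an angle to quarter-turns z. Only the size of a quarter
 * circle in the caller's unit differs between these helpers. */
double z_from_degrees(double deg) { return deg / 90.0; }
double z_from_radians(double rad) { return rad / 1.57079632679489662; }
double z_from_brads(double brad)  { return brad / 64.0; }  /* assumes 256 brads per circle */

/* Eq 11: S3(z) = z (3 - z^2) / 2, meant for z in [-1, 1]. */
double s3_z(double z) { return 0.5 * z * (3.0 - z * z); }
```

<p>
So 45 degrees and 32 brads both become <i>z</i>&nbsp;=&nbsp;0.5 and give the same result, 0.6875 (the true sine is about 0.7071).
</p>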
<p><h2 id="sec-prod">2
Derivations and implementations
</h2>
</p>
<p>
In the section above, I discussed the tools used for analysis and<br />
gave an example of a cubic approximation. In this section I&#8217;ll also<br />
derive high-accuracy fourth and fifth order approximations and<br />
show some implementations. Before that, though, there&#8217;s some<br />
terminology to go through.
</p>
<p>
Since multiple different approximations will be covered, there needs<br />
to be a way to separate all of them. In principle, the sine<br />
approximation will be named <i>S</i><sub>n</sub>, where <i>n</i> is<br />
the order of the polynomial. So that&#8217;ll give <i>S</i><sub>2</sub> to<br />
<i>S</i><sub>5</sub>. I will also use <i>S</i><sub>4d</sub> for the<br />
fourth-order approximation from devmaster. In the derivation of my<br />
own fourth-order function, I&#8217;ll use <i>C</i><sub>n</sub>, because<br />
what will actually be derived is a cosine.
</p>
<p><h3>
Third-order implementation
</h3>
</p>
<p>
Let&#8217;s start with finishing up the story of the third-order<br />
approximation. The main equation for this is Eq&nbsp;11.<br />
Because this equation is still rather simple, I&#8217;ll make this a fixed-point<br />
implementation. The main problem with turning a floating-point function<br />
into a fixed-point one is keeping track of the fixed-point during the<br />
calculations, always making sure there&#8217;s no overflow, but no underflow<br />
either. This is one of the reasons why I wrote Eq&nbsp;11<br />
like it is: by using nested parentheses you can maximize the accuracy<br />
of intermediate calculations and possibly minimize the number of<br />
operations to boot.
</p>
<p>
To correctly account for the fixed-point positions, you need to<br />
be aware of the following factors:
</p>
<ul>
<li>
    The scale of the outcome (i.e., the amplitude): 2<sup>A</sup>
  </li>
<li>
    The scale inside the parentheses: 2<sup>p</sup>. This is<br />
    necessary to keep the multiplications from overflowing.
  </li>
<li>
    The angle-scale: 2<sup>n</sup>. This is basically the value of<br />
	&frac12;&pi; in the fixed-point system. Using <i>x</i> for the<br />
	angle, you have <i>z</i>&nbsp;=&nbsp;<i>x</i>/2<sup>n</sup>.
  </li>
</ul>
<p>
Filling this into Eq&nbsp;11 will give the following:
</p>
<p><table class="eqtbl" id="eq-s3-fp">
<tr>
<td class="eqnrcell">(12)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_3%28x%29%20%26%3D%26%20%5Cfrac12%20z%20%283%20-%20z%5E2%29%202%5EA%20%5C%5C%20%26%3D%26%20z%20%283%5Ccdot2%5Ep%20-%20z%5E2%202%5Ep%29%202%5E%7BA-p-1%7D%20%5C%5C%20%26%3D%26%20x%20%283%5Ccdot2%5Ep%20-%20x%5E2%202%5E%7Bp-2n%7D%29%202%5E%7BA-p-n-1%7D%20%5C%5C%20%26%3D%26%20x%20%283%5Ccdot2%5Ep%20-%20x%5E2%20%2F%202%5Er%20%29%20%5Cmiddle%2F%202%5Es%2C%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} S_3(x) &amp;=&amp; \frac12 z (3 - z^2) 2^A \\ &amp;=&amp; z (3\cdot2^p - z^2 2^p) 2^{A-p-1} \\ &amp;=&amp; x (3\cdot2^p - x^2 2^{p-2n}) 2^{A-p-n-1} \\ &amp;=&amp; x (3\cdot2^p - x^2 / 2^r ) \middle/ 2^s, \end{eqnarray}"<br />
	alt="\begin{eqnarray} S_3(x) &amp;=&amp; \frac12 z (3 - z^2) 2^A \\ &amp;=&amp; z (3\cdot2^p - z^2 2^p) 2^{A-p-1} \\ &amp;=&amp; x (3\cdot2^p - x^2 2^{p-2n}) 2^{A-p-n-1} \\ &amp;=&amp; x (3\cdot2^p - x^2 / 2^r ) \middle/ 2^s, \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
with <i>r</i>&nbsp;=&nbsp;2<i>n</i>&minus;<i>p</i> and<br />
<i>s</i>&nbsp;=&nbsp;<i>n</i>+<i>p</i>+1&minus;<i>A</i>. These represent the<br />
fixed-point shifts you need to apply to keep everything on the level.<br />
Taking <i>p</i> as high as the multiplication with <i>x</i> will allow, and using the<br />
standard libnds units, leads to the following numbers.
</p>
<div class=lblock>
<table border=1 cellpadding=2 cellspacing=0>
<tr>
<th> A</th>
<th> n</th>
<th> p</th>
<th> r</th>
<th> s</th>
</tr>
<tr>
<td>12</td>
<td>13</td>
<td>15</td>
<td>11</td>
<td>17</td>
</tr>
</table>
</div>
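<p>
The shift values in the table follow mechanically from <i>A</i>, <i>n</i> and <i>p</i>, and it is worth checking that the largest intermediate product still fits in 32 bits. The sketch below does both; the constant and function names are made up, and the overflow check is done in 64-bit arithmetic so that the check itself is safe.
</p>

```c
#include <stdint.h>

enum {
    Q_A = 12,                    /* output scale: 1.0 == 1<<12            */
    Q_N = 13,                    /* angle scale: quarter circle == 1<<13  */
    Q_P = 15,                    /* scale inside the parentheses          */
    Q_R = 2 * Q_N - Q_P,         /* r = 2n - p                            */
    Q_S = Q_N + Q_P + 1 - Q_A    /* s = n + p + 1 - A                     */
};

/* Largest value of x * ((3<<p) - (x*x>>r)) over the first quadrant,
 * computed in 64 bits so the check cannot overflow. */
int64_t s3_max_intermediate(void)
{
    int64_t worst = 0;
    for (int64_t x = 0; x <= (1 << Q_N); x++) {
        int64_t v = x * ((3 << Q_P) - (x * x >> Q_R));
        if (v > worst) worst = v;
    }
    return worst;
}
```

<p>
With these numbers the largest product is 2<sup>29</sup>, reached at <i>x</i>&nbsp;=&nbsp;2<sup>13</sup>, so a signed 32-bit multiply is enough.
</p>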
<p><div class="cptfr" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-quadrants"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;2. </b>
</div>
</p>
<p>
That&#8217;s the calculation necessary for the first quadrant, but the domain<br />
of a sine is infinite. To get the rest of the domain, you can use<br />
the symmetries of the sine: the 2&pi; periodicity and the<br />
&frac12;&pi; mirror symmetries. The first is taken care of by doing<br />
<i>z</i>&nbsp;%&nbsp;4. This reduces the domain to the four quadrants of a<br />
circle. The next part is somewhat tricky, so pay attention.
</p>
<p>
Look at Fig&nbsp;2. <i>S</i><sub>3</sub> works for<br />
quadrant 0. Because it&#8217;s antisymmetric, it will also correctly<br />
calculate quadrant 3, which is equivalent to quadrant &minus;1.<br />
Quadrants 1 and 2 are the problem. As you can see in<br />
Fig&nbsp;2, what needs to happen is for those<br />
quadrants to mirror onto quadrants 0 and &minus;1. A reflection<br />
of <i>x</i> at <i>D</i> is defined by Eq&nbsp;13.<br />
In this case, that means that <i>z</i>&nbsp;=&nbsp;2&nbsp;&minus;&nbsp;<i>z</i>
</p>
<p><table class="eqtbl" id="eq-reflect">
<tr>
<td class="eqnrcell">(13)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?x%20%3D%20D%20-%20%28x-D%29%20%3D%202D-x'<br />
	title="x = D - (x-D) = 2D-x"<br />
	alt="x = D - (x-D) = 2D-x" /><br />
</td>
</tr>
</table></p>
<p>
Some tests need to be done to see when the reflection should take<br />
place. The quadrant numbers in binary are 00, 01, 10, 11. If you<br />
build a truth-table around that, you&#8217;ll see that an XOR of the<br />
two bits will do the trick. If you really want to show off,<br />
you can combine the periodicity modulo and the quadrant test by<br />
doing the arithmetic in the top bits. The implementation is<br />
now complete.
</p>
<div class="cpp">
<div class="cpp proglist" style=" ">s32 isin_S3(s32 x)<br />
{<br />
&nbsp; &nbsp; <span class="co1">// S(x) = x * ( (3&lt;&lt;p) - (x*x&gt;&gt;r) ) &gt;&gt; s</span><br />
&nbsp; &nbsp; <span class="co1">// n : Q-pos for quarter circle &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 13</span><br />
&nbsp; &nbsp; <span class="co1">// A : Q-pos for output &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 12</span><br />
&nbsp; &nbsp; <span class="co1">// p : Q-pos for parentheses intermediate &nbsp; 15</span><br />
&nbsp; &nbsp; <span class="co1">// r = 2n-p &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 11</span><br />
&nbsp; &nbsp; <span class="co1">// s = n+p+1-A&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 17</span></p>
<p>&nbsp; &nbsp; <span class="kw1">static</span> <span class="kw1">const</span> <span class="kw1">int</span> qN = <span class="nu0">13</span>, qA= <span class="nu0">12</span>, qP= <span class="nu0">15</span>, qR= <span class="nu0">2</span>*qN-qP, qS= qN+qP+<span class="nu0">1</span>-qA;</p>
<p>&nbsp; &nbsp; x= x&lt;&lt;(<span class="nu0">30</span>-qN);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// shift to full s32 range (Q13-&gt;Q30)</span></p>
<p>&nbsp; &nbsp; <span class="kw1">if</span>( (x^(x&lt;&lt;<span class="nu0">1</span>)) &lt; <span class="nu0">0</span>) &nbsp; &nbsp; <span class="co1">// test for quadrant 1 or 2</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; x= (<span class="nu0">1</span>&lt;&lt;<span class="nu0">31</span>) - x;</p>
<p>&nbsp; &nbsp; x= x&gt;&gt;(<span class="nu0">30</span>-qN);</p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> x * ( (<span class="nu0">3</span>&lt;&lt;qP) - (x*x&gt;&gt;qR) ) &gt;&gt; qS;<br />
}</div>
</div>
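<p>
For testing on a PC, it can help to have a variant that does the quadrant folding in unsigned arithmetic, so the top-bit tricks do not depend on signed overflow. The sketch below is my restatement of the same algorithm with <code>stdint.h</code> types, not a replacement for the routine above; it still assumes a two&#8217;s-complement target where <code>&gt;&gt;</code> on a negative value is an arithmetic shift (true for GCC and Clang, but implementation-defined in C).
</p>

```c
#include <stdint.h>

/* Third-order fixed-point sine: input has 2^13 per quarter circle,
 * output is Q12 (4096 == 1.0). Name is hypothetical. */
int32_t isin_S3_check(int32_t x)
{
    enum { qN = 13, qA = 12, qP = 15, qR = 2*qN - qP, qS = qN + qP + 1 - qA };

    uint32_t u = (uint32_t)x << (30 - qN);   /* quarter circle -> bit 30 */
    if ((u ^ (u << 1)) & 0x80000000u)        /* top two bits differ: quadrant 1 or 2 */
        u = 0x80000000u - u;                 /* reflect: z = 2 - z */
    x = (int32_t)u >> (30 - qN);             /* back to Q13, sign restored */

    return x * ((3 << qP) - (x * x >> qR)) >> qS;
}
```

<p>
A few spot values: 0 gives 0, a quarter circle (8192) gives 4096, and an eighth of a circle (4096) gives 2816, where the exact Q12 sine would be about 2896.
</p>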
<p>
And, of course, there&#8217;s an assembly version as well. It&#8217;s only ten<br />
instructions, which I think is actually shorter than a LUT+lerp<br />
implementation.
</p>
<div class="gccarm">
<div class="gccarm proglist" style=" "><span class="co1">@ ARM assembly version, using n=13, p=15, A=12</span><br />
<span class="co1">@ Input: gamma in Q13</span><br />
&nbsp; &nbsp; <span class="kw4">.arm</span><br />
&nbsp; &nbsp; <span class="kw4">.align</span><br />
&nbsp; &nbsp; <span class="kw4">.global</span> isin_S3a<br />
isin_S3a:<br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">lsl</span> #(<span class="nu0">30</span>-<span class="nu0">13</span>)<br />
&nbsp; &nbsp; <span class="re1">teq</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">lsl</span> #<span class="nu0">1</span><br />
&nbsp; &nbsp; <span class="re1">rsbmi</span> &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, #<span class="nu0">1</span>&lt;&lt;<span class="nu0">31</span><br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">asr</span> #(<span class="nu0">30</span>-<span class="nu0">13</span>)<br />
&nbsp; &nbsp; <span class="re1">mul</span> &nbsp; &nbsp; <span class="kw2">r1</span>, <span class="kw2">r0</span>, <span class="kw2">r0</span><br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r1</span>, <span class="kw2">r1</span>, <span class="kw1">asr</span> #<span class="nu0">11</span><br />
&nbsp; &nbsp; <span class="re1">rsb</span> &nbsp; &nbsp; <span class="kw2">r1</span>, <span class="kw2">r1</span>, #<span class="nu0">3</span>&lt;&lt;<span class="nu0">15</span><br />
&nbsp; &nbsp; <span class="re1">mul</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r1</span>, <span class="kw2">r0</span><br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">asr</span> #<span class="nu0">17</span><br />
&nbsp; &nbsp; <span class="re2">bx</span>&nbsp; &nbsp; &nbsp; <span class="kw2">lr</span></div>
</div>
<h4>Radians?</h4>
<p>
Oh wait, the requirement was for the input to be in Q12 radians,<br />
right? Weeell, that&#8217;s no biggy. You just have to do the<br />
<i>x</i>&nbsp;&rarr;&nbsp;<i>z</i> conversion yourself. Take, say,<br />
2<sup>20</sup>/(2&pi;). Multiplying <i>x</i> by this gives <i>z</i><br />
as a Q30 number; exactly what the first line in the C code resulted in.<br />
This means that all you have to do is change the first line to<br />
<code>x *= 166886;</code>.
</p>
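<p>
Where the magic number comes from, as a small sketch with made-up function names. The conversion uses unsigned arithmetic so that the wrap-around past a full circle, which is exactly what provides the periodicity, is well defined in C.
</p>

```c
#include <stdint.h>

/* Scale factor 2^20 / (2*pi), rounded to the nearest integer. */
uint32_t rad_q12_scale(void)
{
    const double two_pi = 6.28318530717958647692;
    return (uint32_t)(1048576.0 / two_pi + 0.5);
}

/* Convert a Q12 radian angle into the Q30 quarter-turn value that the
 * first line of isin_S3() produces (2^30 == one quarter circle). */
uint32_t rad_q12_to_z_q30(int32_t x_q12)
{
    return (uint32_t)x_q12 * rad_q12_scale();
}
```

<p>
As a check: &frac12;&pi; in Q12 is 6434, and 6434&nbsp;&times;&nbsp;166886 lands within a few thousand counts of 2<sup>30</sup>.
</p>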
<h4>NDS special</h4>
<p>
The assembly version given above uses standard ARM instructions, but<br />
one of the interesting things is that the NDS&#8217; ARM9 core has special<br />
multiplication instructions. In particular, there is the<br />
<code>SMULWx</code> instruction, which does a word*halfword<br />
multiplication, where the halfword can be either the top or bottom<br />
halfword of operand 2. The main result is 32&times;16&rarr;48 bits<br />
long, of which only the top 32 bits are put in the destination<br />
register. Effectively it&#8217;s like <i>a</i>*<i>b</i>&gt;&gt;16 without<br />
overflow problems. As a bonus, it&#8217;s also slightly faster than the<br />
standard <code>MUL</code>. By slightly changing the parameters,<br />
the down-shift factors <i>r</i> and <i>s</i> can be made 16, fitting<br />
perfectly with this instruction, although the internal accuracy is<br />
made slightly worse. Additionally, careful placement of each<br />
instruction can avoid the interlock cycle that happens for<br />
multiplications.
</p>
<p>
The alternate version, <code>isin_S3a9()</code>, becomes:
</p>
<div class="gccarm">
<div class="gccarm proglist" style=" "><span class="co1">@ Special ARM assembly version, using n=13 and lots of Q14</span><br />
<span class="co1">@ Input: gamma in Q13</span><br />
&nbsp; &nbsp; <span class="kw4">.arm</span><br />
&nbsp; &nbsp; <span class="kw4">.align</span><br />
&nbsp; &nbsp; <span class="kw4">.global</span> isin_S3a9<br />
isin_S3a9:<br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">lsl</span> #(<span class="nu0">30</span>-<span class="nu0">13</span>)&nbsp; &nbsp; <span class="co1">@ x &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ; Q30</span><br />
&nbsp; &nbsp; <span class="re1">teq</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">lsl</span> #<span class="nu0">1</span><br />
&nbsp; &nbsp; <span class="re1">rsbmi</span> &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, #<span class="nu0">1</span>&lt;&lt;<span class="nu0">31</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="re2">smulwt</span>&nbsp; <span class="kw2">r1</span>, <span class="kw2">r0</span>, <span class="kw2">r0</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ y=x*x &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ; Q30*Q14/Q16 = Q28</span><br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r2</span>, #<span class="nu0">3</span>&lt;&lt;<span class="nu0">13</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ B_14=3/2</span><br />
&nbsp; &nbsp; <span class="re1">sub</span> &nbsp; &nbsp; <span class="kw2">r1</span>, <span class="kw2">r2</span>, <span class="kw2">r1</span>, <span class="kw1">asr</span> #<span class="nu0">15</span> &nbsp; &nbsp; <span class="co1">@ 3/2-y/2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ; Q14+Q28/Q14/2</span><br />
&nbsp; &nbsp; <span class="re2">smulwt</span>&nbsp; <span class="kw2">r0</span>, <span class="kw2">r1</span>, <span class="kw2">r0</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ; Q14*Q14/Q16 = Q12</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="re2">bx</span>&nbsp; &nbsp; &nbsp; <span class="kw2">lr</span></div>
</div>
<p>
Technically it&#8217;s only two instructions shorter, but it is quite a bit<br />
faster due to the difference in speed between <code>MUL</code><br />
and <code>SMULWx</code>.
</p>
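<p>
If you want to check the Q-format bookkeeping of the <code>SMULWT</code> version without an ARM9 at hand, a C model can help. This is a sketch under assumptions: <code>smulwt_model()</code> is my emulation of the instruction (32-bit value times the signed top halfword of the second operand, keeping the top 32 bits of the 48-bit product), and it relies on arithmetic right shifts of negative values, as GCC and Clang provide.
</p>

```c
#include <stdint.h>

/* Model of SMULWT: a * (top halfword of b), then drop the low 16 bits. */
int32_t smulwt_model(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int16_t)(b >> 16)) >> 16);
}

/* C model of isin_S3a9: input has 2^13 per quarter circle, output is Q12. */
int32_t isin_S3a9_model(int32_t x)
{
    uint32_t u = (uint32_t)x << (30 - 13);   /* Q30, quarter circle at bit 30 */
    if ((u ^ (u << 1)) & 0x80000000u)        /* quadrant 1 or 2: reflect */
        u = 0x80000000u - u;
    int32_t z = (int32_t)u;

    int32_t y = smulwt_model(z, z);          /* z*z      : Q30*Q14/Q16 = Q28 */
    int32_t t = (3 << 13) - (y >> 15);       /* 3/2 - y/2: Q14               */
    return smulwt_model(t, z);               /* result   : Q14*Q14/Q16 = Q12 */
}
```

<p>
For the spot values I tried (0, an eighth, a quarter and three quarters of a circle) the model gives 0, 2816, 4096 and &minus;4096, the same as the plain C version.
</p>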
<p><h3 id="ssec-prod-s5">2.1
High-precision, fifth order
</h3>
</p>
<p>
The third order approximation actually still has a substantial error,<br />
so it may be useful to use an additional term. This would be<br />
the fifth-order approximation, <i>S</i><sub>5</sub>. It and its<br />
derivative are given in Eq&nbsp;14.
</p>
<p><table class="eqtbl" id="eq-s5-base">
<tr>
<td class="eqnrcell">(14)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_5%28x%29%20%26%3D%26%20ax%20-%20b%20x%5E3%20%2B%20c%20x%5E5%20%5C%5C%20%5C%5C%20%5C%5C%20S_5%27%28x%29%20%26%3D%26%20a%20-%203b%20x%5E2%20%2B%205c%20x%5E4%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} S_5(x) &amp;=&amp; ax - b x^3 + c x^5 \\ \\ \\ S_5&#039;(x) &amp;=&amp; a - 3b x^2 + 5c x^4 \end{eqnarray}"<br />
	alt="\begin{eqnarray} S_5(x) &amp;=&amp; ax - b x^3 + c x^5 \\ \\ \\ S_5&#039;(x) &amp;=&amp; a - 3b x^2 + 5c x^4 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
To find the terms, I will again use <i>z</i> instead of <i>x</i>.<br />
The conditions of note are the position and derivative at <i>z</i>&nbsp;=&nbsp;1<br />
and the derivative at 0. With these conditions the approximation<br />
should behave amicably at both edges.
</p>
<p><table class="eqtbl" id="eq-s5-cnd">
<tr>
<td class="eqnrcell">(15)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20S_5%28z%3D1%29%20%26%3D%26%201%20%26%3D%26%20a%20%26-%26%20b%20%26%2B%26%20c%20%5C%5C%20%5C%5C%20%5C%5C%20S%27_5%28z%3D1%29%20%26%3D%26%200%20%26%3D%26%20a%20%26-%26%203b%20%26%2B%26%205c%20%5C%5C%20%5C%5C%20%5C%5C%20S%27_5%28z%3D0%29%20%26%3D%26%20%5Cfrac%5Cpi2%20%26%3D%26%20a%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} S_5(z=1) &amp;=&amp; 1 &amp;=&amp; a &amp;-&amp; b &amp;+&amp; c \\ \\ \\ S&#039;_5(z=1) &amp;=&amp; 0 &amp;=&amp; a &amp;-&amp; 3b &amp;+&amp; 5c \\ \\ \\ S&#039;_5(z=0) &amp;=&amp; \frac\pi2 &amp;=&amp; a \end{eqnarray}"<br />
	alt="\begin{eqnarray} S_5(z=1) &amp;=&amp; 1 &amp;=&amp; a &amp;-&amp; b &amp;+&amp; c \\ \\ \\ S&#039;_5(z=1) &amp;=&amp; 0 &amp;=&amp; a &amp;-&amp; 3b &amp;+&amp; 5c \\ \\ \\ S&#039;_5(z=0) &amp;=&amp; \frac\pi2 &amp;=&amp; a \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
Notice that these equations are linear with respect to <i>a</i>,<br />
<i>b</i> and <i>c</i>, which means that the system can be solved via matrices.<br />
Technically this system of equations forms a 3&times;3 matrix, but since<br />
<i>a</i> is already immediately known it can be reduced to a<br />
2&times;2 system. I&#8217;ll spare you the details, but it leads to the<br />
coefficients of Eq&nbsp;16. Note the complete absence of<br />
any horrid &pi;<sup>5</sup> terms that would have appeared if you had<br />
decided <i>not</i> to use dimensionless terms.
</p>
<p><table class="eqtbl" id="eq-s5-coef">
<tr>
<td class="eqnrcell">(16)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20a%20%26%3D%26%20%5Cpi%2F2%20%5C%5C%20%5C%5C%20b%20%26%3D%26%20%5Cpi%20-%205%2F2%20%5C%5C%20%5C%5C%20c%20%26%3D%26%20%5Cpi%2F2%20-%203%2F2%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} a &amp;=&amp; \pi/2 \\ \\ b &amp;=&amp; \pi - 5/2 \\ \\ c &amp;=&amp; \pi/2 - 3/2 \end{eqnarray}"<br />
	alt="\begin{eqnarray} a &amp;=&amp; \pi/2 \\ \\ b &amp;=&amp; \pi - 5/2 \\ \\ c &amp;=&amp; \pi/2 - 3/2 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
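<p>
For the curious, the 2&times;2 reduction can be sketched in a few lines of C. This is only an illustration under assumptions of my own: the helper name <code>solve_s5</code> and the choice of Cramer&#8217;s rule are not part of the original derivation. With <i>a</i> known, the first two conditions of Eq&nbsp;15 become <i>b</i>&nbsp;&minus;&nbsp;<i>c</i>&nbsp;=&nbsp;<i>a</i>&nbsp;&minus;&nbsp;1 and 3<i>b</i>&nbsp;&minus;&nbsp;5<i>c</i>&nbsp;=&nbsp;<i>a</i>.
</p>

```c
#include <assert.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Hypothetical helper: solve the Eq 15 conditions for b and c, given a.
 * With a known, the conditions reduce to
 *     b -  c = a - 1      (value 1 at z=1)
 *    3b - 5c = a          (derivative 0 at z=1)
 * which is a 2x2 system that Cramer's rule handles directly. */
static void solve_s5(double a, double *b, double *c)
{
    const double m00 = 1.0, m01 = -1.0, r0 = a - 1.0;
    const double m10 = 3.0, m11 = -5.0, r1 = a;
    const double det = m00*m11 - m01*m10;   /* = -2 */

    assert(det != 0.0);
    *b = (r0*m11 - m01*r1) / det;
    *c = (m00*r1 - m10*r0) / det;
}
```

<p>
Calling <code>solve_s5(PI/2, &amp;b, &amp;c)</code> should give back the <i>b</i> and <i>c</i> of Eq&nbsp;16 to within rounding.
</p>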
<p><table class="eqtbl" id="eq-s5-final">
<tr>
<td class="eqnrcell">(17)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?S_5%28z%29%20%3D%20%5Cfrac12%20z%20%28%5Cpi%20-%20z%5E2%20%5B%20%282%5Cpi-5%29%20-%20z%5E2%20%28%5Cpi%20-%203%29%20%5D%20%29'<br />
	title="S_5(z) = \frac12 z (\pi - z^2 [ (2\pi-5) - z^2 (\pi - 3) ] )"<br />
	alt="S_5(z) = \frac12 z (\pi - z^2 [ (2\pi-5) - z^2 (\pi - 3) ] )" /><br />
</td>
</tr>
</table></p>
<p>
Eq&nbsp;17 is the final quintic approximation in the<br />
form that&#8217;s most accurate and easiest to implement. The implementation<br />
is basically an extension of the <i>S</i><sub>3</sub> function<br />
and left as an exercise for the reader.
</p>
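<p>
As a baseline to check a fixed-point version against, here is a minimal floating-point sketch of Eq&nbsp;17. The name <code>sin_S5f</code> is hypothetical, and <i>z</i> is assumed to be the angle in quarter-circles, limited to [&minus;1,&nbsp;1]; the handling of the other quadrants is left out.
</p>

```c
#include <assert.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Floating-point sketch of Eq 17; z is the angle in quarter-circles,
 * so z=1 corresponds to pi/2. Only meant for z in [-1, 1]. */
static double sin_S5f(double z)
{
    double z2 = z*z;
    assert(z2 <= 1.0);
    return 0.5*z*(PI - z2*((2*PI - 5) - z2*(PI - 3)));
}
```

<p>
Because the polynomial is odd, the negative half of the range comes for free: <code>sin_S5f(-z)</code> is simply the negative of <code>sin_S5f(z)</code>.
</p>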
<p><h3 id="ssec-prod-s4">2.2
High-precision, fourth order
</h3>
</p>
<p><div class="cptfr" style="width:192px;">
  <a href="" target="_blank">  <img src="" id="img-sincos"
    alt="" width="192" /></a><br />
  <b>Fig&nbsp;3. </b>
</div>
</p>
<p>
Lastly, a fourth-order approximation. Normally, I wouldn&#8217;t even consider<br />
this for a sine (odd function == odd power series and all that), but<br />
since the devmaster post uses them and they even seem to work, there<br />
seems to be something to them after all.
</p>
<p>
The reason those approximations work is simple: they don&#8217;t actually<br />
approximate a sine at all; they approximate a <b>co</b>sine. And,<br />
because of all the symmetries and parallels with sines and cosines,<br />
one can be used to implement the other.
</p>
<p><table class="eqtbl" id="eq-sincos">
<tr>
<td class="eqnrcell">(18)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20sin%28x%29%20%26%3D%26%20cos%28x%20-%20%5Cpi%2F2%29%20%5C%5C%20%5C%5C%20sin%28z%29%20%26%3D%26%20cos%28z%20-%201%29%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} sin(x) &amp;=&amp; cos(x - \pi/2) \\ \\ sin(z) &amp;=&amp; cos(z - 1) \end{eqnarray}"<br />
	alt="\begin{eqnarray} sin(x) &amp;=&amp; cos(x - \pi/2) \\ \\ sin(z) &amp;=&amp; cos(z - 1) \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
Eq&nbsp;18 is<br />
the transformation you need to perform to turn a cosine into a sine<br />
wave. This can easily be done at the start of an algorithm.<br />
What&#8217;s left is to derive a cosine approximation. Because a cosine<br />
is even, only even powers will be needed. The base form and its<br />
derivative are given in Eq&nbsp;19.
</p>
<p><table class="eqtbl" id="eq-c4-base">
<tr>
<td class="eqnrcell">(19)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20C_4%20%28x%29%20%26%3D%26%20a%20-%20b%20x%5E2%20%2B%20c%20x%5E4%20%5C%5C%20%5C%5C%20%5C%5C%20C_4%27%28x%29%20%26%3D%26%20-%202b%20x%20%2B%204c%20x%5E3%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} C_4 (x) &amp;=&amp; a - b x^2 + c x^4 \\ \\ \\ C_4&#039;(x) &amp;=&amp; - 2b x + 4c x^3 \end{eqnarray}"<br />
	alt="\begin{eqnarray} C_4 (x) &amp;=&amp; a - b x^2 + c x^4 \\ \\ \\ C_4&#039;(x) &amp;=&amp; - 2b x + 4c x^3 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
For the conditions, we once again look at <i>z</i>&nbsp;=&nbsp;0 and <i>z</i>&nbsp;=&nbsp;1,<br />
which comes down to the set of equations in Eq&nbsp;20.<br />
One of the interesting things about even functions is that the<br />
derivative at 0 is zero, so that&#8217;s a freebie. A very important<br />
freebie, as it means that one of the required symmetries happens<br />
automatically.
</p>
<p><table class="eqtbl" id="eq-c4-cnd">
<tr>
<td class="eqnrcell">(20)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20C_4%28z%3D0%29%20%26%3D%26%201%20%26%3D%26%20a%20%5C%5C%20%5C%5C%20%5C%5C%20C_4%28z%3D1%29%20%26%3D%26%200%20%26%3D%26%20a%20%26-%26%20b%20%26%2B%26%20c%20%5C%5C%20%5C%5C%20%5C%5C%20C%27_4%28z%3D1%29%20%26%3D%26%20-%5Cfrac%5Cpi2%20%26%3D%26%20%26-%26%202b%20%26%2B%26%204c%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} C_4(z=0) &amp;=&amp; 1 &amp;=&amp; a \\ \\ \\ C_4(z=1) &amp;=&amp; 0 &amp;=&amp; a &amp;-&amp; b &amp;+&amp; c \\ \\ \\ C&#039;_4(z=1) &amp;=&amp; -\frac\pi2 &amp;=&amp; &amp;-&amp; 2b &amp;+&amp; 4c \end{eqnarray}"<br />
	alt="\begin{eqnarray} C_4(z=0) &amp;=&amp; 1 &amp;=&amp; a \\ \\ \\ C_4(z=1) &amp;=&amp; 0 &amp;=&amp; a &amp;-&amp; b &amp;+&amp; c \\ \\ \\ C&#039;_4(z=1) &amp;=&amp; -\frac\pi2 &amp;=&amp; &amp;-&amp; 2b &amp;+&amp; 4c \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
The resulting set of coefficients is listed in Eq&nbsp;21.<br />
Note that <i>b</i>&nbsp;=&nbsp;<i>c</i>+1, which may be of use later. The final<br />
equation for the fourth order cosine approximation is<br />
Eq&nbsp;22. Only three MULs and two SUBs; nice.
</p>
<p><table class="eqtbl" id="eq-c4-coef">
<tr>
<td class="eqnrcell">(21)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20a%20%26%3D%26%201%20%5C%5C%20%5C%5C%20b%20%26%3D%26%202%20-%20%5Cpi%2F4%20%5C%5C%20%5C%5C%20c%20%26%3D%26%201%20-%20%5Cpi%2F4%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} a &amp;=&amp; 1 \\ \\ b &amp;=&amp; 2 - \pi/4 \\ \\ c &amp;=&amp; 1 - \pi/4 \end{eqnarray}"<br />
	alt="\begin{eqnarray} a &amp;=&amp; 1 \\ \\ b &amp;=&amp; 2 - \pi/4 \\ \\ c &amp;=&amp; 1 - \pi/4 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p><table class="eqtbl" id="eq-c4-final">
<tr>
<td class="eqnrcell">(22)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?C_4%28z%29%20%3D%201%20-%20z%5E2%20%5B%20%282-%5Cpi%2F4%29%20-%20z%5E2%20%281-%5Cpi%2F4%29%20%5D'<br />
	title="C_4(z) = 1 - z^2 [ (2-\pi/4) - z^2 (1-\pi/4) ]"<br />
	alt="C_4(z) = 1 - z^2 [ (2-\pi/4) - z^2 (1-\pi/4) ]" /><br />
</td>
</tr>
</table></p>
<h4>Implementation</h4>
<p>
The floating-point implementation of Eq&nbsp;22 is<br />
again too easy to mention here, so I&#8217;ll focus on fixed-point<br />
variations. Like with <i>S</i><sub>3</sub>, you can mix and match<br />
fixed-point positions until you get something you like. In this<br />
case I&#8217;ll stick to Q14 for almost everything to keep things simple.
</p>
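<p>
Easy or not, a floating-point reference is handy when checking the fixed-point results. A minimal sketch of Eq&nbsp;22 and the shift from Eq&nbsp;18 follows; the names <code>cos_C4f</code> and <code>sin_S4f</code> are hypothetical, and <i>z</i> is assumed to be in quarter-circles.
</p>

```c
#include <assert.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Eq 22: fourth-order cosine approximation, z in [-1, 1] quarter-circles. */
static double cos_C4f(double z)
{
    const double c = 1.0 - PI/4, b = c + 1.0;   /* Eq 21, using b = c+1 */
    double z2 = z*z;
    assert(z2 <= 1.0);
    return 1.0 - z2*(b - z2*c);
}

/* Eq 18: a sine for z in [0, 2] (the upper semi-circle) via the cosine. */
static double sin_S4f(double z)
{
    return cos_C4f(z - 1.0);
}
```

<p>
This only covers the upper semi-circle; the sign flip for the lower one is what the fixed-point code has to take care of.
</p>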
<p>
The real trick here is to find out what you need to do about all the<br />
other quadrants. Cutting down to four quadrants is, again, easy.<br />
For the rest, remember that the cosine approximation calculates the top<br />
quadrants and you need to flip the sign for the bottom quadrants.<br />
If you think in terms of the parameter that a sine gets, you see that<br />
only for odd semi-circles the sign needs to change. Tracing this<br />
can be done with a single bitwise AND or a clever shift.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">//! A sine approximation via &nbsp;a fourth-order cosine approx.</span><br />
s32 isin_S4(s32 x)<br />
{<br />
&nbsp; &nbsp; <span class="kw1">int</span> c, x2, y;<br />
&nbsp; &nbsp; <span class="kw1">static</span> <span class="kw1">const</span> <span class="kw1">int</span> qN= <span class="nu0">13</span>, qA= <span class="nu0">12</span>, B=<span class="nu0">19900</span>, C=<span class="nu0">3516</span>;</p>
<p>&nbsp; &nbsp; c= x&lt;&lt;(<span class="nu0">30</span>-qN);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Semi-circle info into carry.</span><br />
&nbsp; &nbsp; x -= <span class="nu0">1</span>&lt;&lt;qN; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// sine -&gt; cosine calc</span></p>
<p>&nbsp; &nbsp; x= x&lt;&lt;(<span class="nu0">31</span>-qN);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Mask with PI</span><br />
&nbsp; &nbsp; x= x&gt;&gt;(<span class="nu0">31</span>-qN);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Note: SIGNED shift! (to qN)</span><br />
&nbsp; &nbsp; x= x*x&gt;&gt;(<span class="nu0">2</span>*qN-<span class="nu0">14</span>);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// x=x^2 To Q14</span></p>
<p>&nbsp; &nbsp; y= B &#8211; (x*C&gt;&gt;<span class="nu0">14</span>); &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// B &#8211; x^2*C</span><br />
&nbsp; &nbsp; y= (<span class="nu0">1</span>&lt;&lt;qA)-(x*y&gt;&gt;<span class="nu0">16</span>); &nbsp; &nbsp; &nbsp; <span class="co1">// A &#8211; x^2*(B-x^2*C)</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> c&gt;=<span class="nu0">0</span> ? y : -y;<br />
}</div>
</div>
<p>
And an ARM9 assembly version too. As it happens, it&#8217;s only two<br />
instructions longer than <code>isin_S3a9()</code>.
</p>
<div class="gccarm">
<div class="gccarm proglist" style=" "><span class="co1">@ ARM assembly version of S4 = C4(gamma-1), using n=13, A=12 and &#8230; miscellaneous.</span><br />
<span class="co1">@ Input: gamma in Q13</span><br />
&nbsp; &nbsp; <span class="kw4">.arm</span><br />
&nbsp; &nbsp; <span class="kw4">.align</span><br />
&nbsp; &nbsp; <span class="kw4">.global</span> isin_S4a9<br />
isin_S4a9:<br />
&nbsp; &nbsp; <span class="re1">movs</span>&nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw1">lsl</span> #(<span class="nu0">31</span>-<span class="nu0">13</span>)&nbsp; &nbsp; <span class="co1">@ r0=x%2 &lt;&lt;31 &nbsp; &nbsp; &nbsp; ; carry=x/2</span><br />
&nbsp; &nbsp; <span class="re1">sub</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, #<span class="nu0">1</span>&lt;&lt;<span class="nu0">31</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ r0 -= 1.0 &nbsp; &nbsp; &nbsp; &nbsp; ; sin &lt;-&gt; cos</span><br />
&nbsp; &nbsp; <span class="re2">smulwt</span>&nbsp; <span class="kw2">r1</span>, <span class="kw2">r0</span>, <span class="kw2">r0</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ r1 = x*x&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ; Q31*Q15/Q16=Q30</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="re2">ldr</span> &nbsp; &nbsp; <span class="kw2">r2</span>,=<span class="nu0">14016</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ C = (1-pi/4)&lt;&lt;16</span><br />
&nbsp; &nbsp; <span class="re2">smulwt</span>&nbsp; <span class="kw2">r0</span>, <span class="kw2">r2</span>, <span class="kw2">r1</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ C*x^2&gt;&gt;16 &nbsp; &nbsp; &nbsp; &nbsp; ; Q16*Q14/Q16 = Q14</span><br />
&nbsp; &nbsp; <span class="re1">add</span> &nbsp; &nbsp; <span class="kw2">r2</span>, <span class="kw2">r2</span>, #<span class="nu0">1</span>&lt;&lt;<span class="nu0">16</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ B = C+1</span><br />
&nbsp; &nbsp; <span class="re1">rsb</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, <span class="kw2">r2</span>, <span class="kw1">asr</span> #<span class="nu0">2</span>&nbsp; &nbsp; &nbsp; <span class="co1">@ B &#8211; C*x^2 &nbsp; &nbsp; &nbsp; &nbsp; ; Q14</span><br />
&nbsp; &nbsp; <span class="re2">smulwb</span>&nbsp; <span class="kw2">r0</span>, <span class="kw2">r1</span>, <span class="kw2">r0</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ x^2 * (B-C*x^2) &nbsp; ; Q30*Q14/Q16 = Q28</span><br />
&nbsp; &nbsp; <span class="re1">mov</span> &nbsp; &nbsp; <span class="kw2">r1</span>, #<span class="nu0">1</span>&lt;&lt;<span class="nu0">12</span><br />
&nbsp; &nbsp; <span class="re1">sub</span> &nbsp; &nbsp; <span class="kw2">r0</span>, <span class="kw2">r1</span>, <span class="kw2">r0</span>, <span class="kw1">asr</span> #<span class="nu0">16</span> &nbsp; &nbsp; <span class="co1">@ 1 &#8211; x^2 * (B-C*x^2)</span><br />
&nbsp; &nbsp; <span class="re1">rsbcs</span> &nbsp; <span class="kw2">r0</span>, <span class="kw2">r0</span>, #<span class="nu0">0</span>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">@ Flip sign for odd semi-circles.</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="re2">bx</span>&nbsp; &nbsp; &nbsp; <span class="kw2">lr</span></div>
</div>
<p><h2 id="sec-test">3
Testing
</h2>
</p>
<p>
Deriving approximations is nice and all, but there&#8217;s really no point<br />
unless you do some sort of test to see how well they perform. I&#8217;ll<br />
look at two things: accuracy and some speed-tests. For the speed-test,<br />
I&#8217;ll only consider the functions given here along with some traditional<br />
ones. The accuracy test is done only for the first quadrant and in<br />
floating-point, but the results should carry over well to a fixed-point<br />
case. Finally, I&#8217;ll show how you can optimize the functions for accuracy.
</p>
<p><h3 id="ssec-test-speed">3.1
Third and fourth-order speed
</h3>
</p>
<p>
For the speed test I calculated the sine at 256 points for<br />
<i>x</i>&nbsp;&isin;&nbsp;[0,&nbsp;2&pi;). There will be some loop-overhead<br />
in the numbers, but it should be small. Tests were performed on the<br />
NDS.
</p>
<p>
Functions under investigation are the three <i>S</i><sub>3</sub> and<br />
two <i>S</i><sub>4</sub> functions given earlier. I&#8217;ve also tested<br />
the standard floating-point <code>sin()</code> library function,<br />
the libnds <code>sinLerp()</code> and my own <code>isin()</code><br />
function that you can find in<br />
<a href="http://www.coranac.com/documents/arctangent#ssec-atan-sin">arctan:sine</a>.<br />
The cumulative and average times can be found in<br />
Table&nbsp;1.
</p>
<div class=lblock>
<table id="tbl-isin-speed"
  border=1 cellpadding=3 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;1</b>: sine cycle-times (roughly).<br />
</caption>
<tr>
<th>Function (Thumb/ARM) </th>
<th>Total cycles</th>
<th>Average cycles</th>
</tr>
<tr class=rnum>
<th>sin (F)</th>
<td>300321</td>
<td>1175.1</td>
</tr>
<tr class=rnum>
<th>sinLerp (T)</th>
<td>10051</td>
<td>39.2</td>
</tr>
<tr class=rnum>
<th>isin (T)</th>
<td>7401</td>
<td>28.9</td>
</tr>
<tr class=rnum>
<th>isin_S3 (T)</th>
<td>5267</td>
<td>20.5</td>
</tr>
<tr class=rnum>
<th>isin_S4 (T)</th>
<td>6456</td>
<td>25.2</td>
</tr>
<tr class=rnum>
<th>isin_S3a (A)</th>
<td>3438</td>
<td>13.4</td>
</tr>
<tr class=rnum>
<th>isin_S3a9 (A)</th>
<td>2591</td>
<td>10.1</td>
</tr>
<tr class=rnum>
<th>isin_S4a9 (A)</th>
<td>3123</td>
<td>12.1</td>
</tr>
</table>
</div>
<p>
The first thing that should be clear is just why we don&#8217;t use the<br />
floating-point sine. I mean, seriously. There is also a clear difference<br />
between the Thumb-compiled and ARM assembly versions, the latter being<br />
significantly faster.
</p>
<p>
Within the compiled versions, I find it interesting to see that the<br />
algorithmic calculations are actually faster than the LUT+lerp-based<br />
implementations. I guess loading all those numbers from memory<br />
really does suck.
</p>
<p>
And <i>then</i> there are the assembly versions. Wow. Compared to the<br />
compiled versions they&#8217;re twice as fast, and up to four times as fast<br />
as the LUT-based functions.
</p>
<p><div class=note>
<div  class=nhcare>NDS timers measure half-cycles</div>
</p>
<p>
The cycle-times from Table&nbsp;1 do not make sense<br />
if you count instruction cycles. For example, for <code>isin_S3a</code><br />
the function overhead alone should already be around 10 cycles. The<br />
thing here is that the numbers are taken from the hardware timers,<br />
which use the bus-frequency (33 MHz) rather than the ARM9 cpu (66 MHz).<br />
As such, they measure in half-cycles. For details, see<br />
<a href="http://nocash.emubase.de/gbatek.htm/#dsmemorytimings">gbatek:nds-timings</a>.
</p>
<p></div>
</p>
<p><h3 id="ssec-test-acc">3.2
Accuracy
</h3>
</p>
<p>
Fig&nbsp;4 shows all the approximations in one graph.<br />
It only shows one quadrant because the rest can be retrieved by<br />
symmetry. I&#8217;ve also scaled the sine and its approximations by<br />
2<sup>12</sup> because that&#8217;s the usual fixed-point scale<br />
right now. And to be sure, yes, this is a different chart than<br />
Fig&nbsp;1; it&#8217;s just hard to tell because the<br />
fourth and fifth order functions are virtually identical to the<br />
real sine line.
</p>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-sine-all"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;4. </b>
</div>

</div>
<p>
For the high-accuracy approximations, it&#8217;s better to look at<br />
Fig&nbsp;5, which shows the errors. Here you can<br />
clearly see a difference between <i>S</i><sub>4d</sub> and<br />
<i>S</i><sub>5</sub>; the latter is roughly 3 times better.
</p>
<p>
There&#8217;s also a large difference between the devmaster fourth-order<br />
sine and my own. The reason behind this is a difference in conditions.<br />
In my case, I&#8217;ve fixed the derivatives at both end-points, which<br />
always results in an over- or underestimate. The devmaster&#8217;s<br />
<i>S</i><sub>4d</sub> let go of those conditions and minimized the<br />
error. I&#8217;ll also do this in the next sub-section.
</p>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-sine-all-err"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;5. </b>
</div>

</div>
<p><div>&nbsp;</div></p>
<p>
Table&nbsp;2 and Table&nbsp;3<br />
list some interesting statistics about<br />
the various approximations, namely the minimum, average and maximum<br />
errors. They also contain a 
<a href="http://en.wikipedia.org/wiki/Root%20Mean%20Square%20Deviation">Root Mean Square Deviation</a> (RMSD), which is a<br />
special kind of distance. If you consider the data-points as a<br />
vector, the RMSD is the average Pythagorean length for each point.<br />
Table&nbsp;2 is normed to 2<sup>12</sup>, whereas<br />
Table&nbsp;3 is the table for the traditional floating-point<br />
sine scale.
</p>
<p>
The RMSD values are probably the most useful to look at. From them<br />
you can see that there is a huge gap between the low-accuracy and<br />
high-accuracy functions of about a factor 60. And if you do your<br />
math right, all it costs is one multiplication and one addition,<br />
and maybe some extra shifts in the fixed-point case. That&#8217;s quite<br />
a bargain. Compared to that, the difference between the odd and<br />
even functions is somewhat meager: only a factor three or so.<br />
Still, it is something.
</p>
<p>
If you look at the fixed-point table, you can see that the error<br />
you make with <i>S</i><sub>4d</sub> and<br />
<i>S</i><sub>5</sub> is in the single digits. This means<br />
that this is probably accurate enough for practical purposes.<br />
Combined with the fact that even fifth order polynomials can be<br />
made pretty fast, this makes them worth considering over LUTs.
</p>
<div class=cblock>
<table width=80%>
<tr>
<td>
<table id="tbl-stats" border=1 cellpadding=3 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;2</b>: error statistics for 2<sup>12</sup>sin(x) approx.<br />
</caption>
<tr class=top>
<th></th>
<th>min</th>
<th>avg</th>
<th>max</th>
<th>rms</th>
</tr>
<tr class=rnum>
<th>Taylor3</th>
<td>-302.1</td>
<td>-51.5</td>
<td>0</td>
<td>92.7</td>
</tr>
<tr class=rnum>
<th>S2</th>
<td>0</td>
<td>123.1</td>
<td>229.4</td>
<td>146.8</td>
</tr>
<tr class=rnum>
<th>S3</th>
<td>-82.0</td>
<td>-47.6</td>
<td>0</td>
<td>55.0</td>
</tr>
<tr class=rnum>
<th>S4d</th>
<td>-4.47</td>
<td>0.19</td>
<td>3.11</td>
<td>2.44</td>
</tr>
<tr class=rnum>
<th>S4</th>
<td>0</td>
<td>5.87</td>
<td>11.4</td>
<td>7.11</td>
</tr>
<tr class=rnum>
<th>S5</th>
<td>0</td>
<td>0.74</td>
<td>1.62</td>
<td>0.94</td>
</tr>
</table>
</td>
<td>
<table id="tbl-statsp" border=1 cellpadding=3 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;3</b>: error statistics in percentages.<br />
</caption>
<tr class=top>
<th></th>
<th>min%</th>
<th>avg%</th>
<th>max%</th>
<th>rms%</th>
</tr>
<tr class=rnum>
<th>Taylor3</th>
<td>-7.37</td>
<td>-1.26</td>
<td>0</td>
<td>2.26</td>
</tr>
<tr class=rnum>
<th>S2</th>
<td>0</td>
<td>3</td>
<td>5.6</td>
<td>3.58</td>
</tr>
<tr class=rnum>
<th>S3</th>
<td>-2</td>
<td>-1.16</td>
<td>0</td>
<td>1.34</td>
</tr>
<tr class=rnum>
<th>S4d</th>
<td>-0.11</td>
<td>0.0047</td>
<td>0.076</td>
<td>0.06</td>
</tr>
<tr class=rnum>
<th>S4</th>
<td>0</td>
<td>0.143</td>
<td>0.278</td>
<td>0.174</td>
</tr>
<tr class=rnum>
<th>S5</th>
<td>0</td>
<td>0.018</td>
<td>0.039</td>
<td>0.023</td>
</tr>
</table>
</td>
</tr>
</table>
</div>
<p><h3 id="ssec-test-opt">3.3
Optimizing higher-order approximations
</h3>
</p>
<p>
From the charts, you can see that <i>S</i><sub>4</sub> and<br />
<i>S</i><sub>5</sub> both err on the same side of the sine line. You<br />
can increase the accuracy of the approximation by tweaking the<br />
coefficients in such a way that the errors are redistributed in<br />
a preferable way. Two methods are possible here: shoot for a zero<br />
error average, or minimize the RMSD. Technically minimizing the<br />
RMSD is standard (it comes down to least-squares optimization), but<br />
because a zero-average allows for an analytical solution, I&#8217;ll use<br />
that. In any case, the differences in outcomes will be small.
</p>
<p><div>&nbsp;</div></p>
<p>
First, think of what an average of a function means. The average<br />
of a set of numbers is the sum divided by the size of the set. For<br />
functions, it&#8217;s the integral of that function divided by the<br />
interval. When you want a zero-average for an approximation, the<br />
integral of the function and that of the approximation should<br />
be equal. With a polynomial approximation to a sine, we get:
</p>
<p><table class="eqtbl" id="eq-cnd-avg0">
<tr>
<td class="eqnrcell">(23)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20%5Cint_0%5E1%20%5Csum_n%20a_n%20x%5En%20dx%20%26%3D%26%20%5Cint_0%5E1%20sin%28x%5Cpi%2F2%29%20dx%20%5C%3B%5C%3B%20%5Crightarrow%20%5C%5C%20%5C%5C%20%5Csum_n%20%5Cfrac%7Ba_n%7D%7Bn%2B1%7D%20%26%3D%26%202%2F%5Cpi%20%2C%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} \int_0^1 \sum_n a_n x^n dx &amp;=&amp; \int_0^1 sin(x\pi/2) dx \;\; \rightarrow \\ \\ \sum_n \frac{a_n}{n+1} &amp;=&amp; 2/\pi , \end{eqnarray}"<br />
	alt="\begin{eqnarray} \int_0^1 \sum_n a_n x^n dx &amp;=&amp; \int_0^1 sin(x\pi/2) dx \;\; \rightarrow \\ \\ \sum_n \frac{a_n}{n+1} &amp;=&amp; 2/\pi , \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p>
with <i>a</i><sub>n</sub> reducing to the coefficients of the<br />
polynomials we had before. This can be used as an alternate condition<br />
to the derivative at 0. For <i>S</i><sub>4</sub> and<br />
<i>S</i><sub>5</sub>, you&#8217;ll end up with the following coefficients.
</p>
<p><!--</p>
<p>
The simplest way to do this is to just dump the data into Excel<br />
and let the Solver do its magic on all the coefficients. A better<br />
way is first see what actually needs to change. We still have some<br />
conditions that need to be satisfied: 0 and 1 at the boundaries<br />
and a maximum at <i>z</i>&nbsp;=&nbsp;1. Only the derivative at 0 is flexible,<br />
and this leads to some restrictions in the search. In fact, it<br />
turns out that only one coefficient needs to be minimized for, and<br />
the rest follow from its value.
</p>
<p>
Taking all that into account, you get the following<br />
coefficients for <i>S</i><sub>4</sub> and<br />
<i>S</i><sub>5</sub>.
</p>
<p>--></p>
<p><table class="eqtbl" id="eq-s4-opt-coef">
<tr>
<td class="eqnrcell">(24)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20a_4%20%26%3D%26%201%20%5C%5C%20b_4%20%26%3D%26%20c_4%2B1%20%5C%5C%20c_4%20%26%3D%26%205%281-%5Cfrac3%5Cpi%29%20%5Capprox%200.225351707%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} a_4 &amp;=&amp; 1 \\ b_4 &amp;=&amp; c_4+1 \\ c_4 &amp;=&amp; 5(1-\frac3\pi) \approx 0.225351707 \end{eqnarray}"<br />
	alt="\begin{eqnarray} a_4 &amp;=&amp; 1 \\ b_4 &amp;=&amp; c_4+1 \\ c_4 &amp;=&amp; 5(1-\frac3\pi) \approx 0.225351707 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
<p><table class="eqtbl" id="eq-s5-opt-coef">
<tr>
<td class="eqnrcell">(25)</td>
  <td class="eqcell"><br />
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Cbegin%7Beqnarray%7D%20a_5%20%26%3D%26%204%28%5Cfrac3%5Cpi%20-%20%5Cfrac9%7B16%7D%29%20%5Capprox%201.569718634%20%5C%5C%20b_5%20%26%3D%26%202%20a_5%20-%205%2F2%20%5C%5C%20c_5%20%26%3D%26%20a_5%20-%203%2F2%20%5Cend%7Beqnarray%7D'<br />
	title="\begin{eqnarray} a_5 &amp;=&amp; 4(\frac3\pi - \frac9{16}) \approx 1.569718634 \\ b_5 &amp;=&amp; 2 a_5 - 5/2 \\ c_5 &amp;=&amp; a_5 - 3/2 \end{eqnarray}"<br />
	alt="\begin{eqnarray} a_5 &amp;=&amp; 4(\frac3\pi - \frac9{16}) \approx 1.569718634 \\ b_5 &amp;=&amp; 2 a_5 - 5/2 \\ c_5 &amp;=&amp; a_5 - 3/2 \end{eqnarray}" /><br />
</td>
</tr>
</table></p>
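<p>
Eq&nbsp;24 and Eq&nbsp;25 are easy to check numerically against the zero-average condition of Eq&nbsp;23. In this sketch the type <code>Coefs</code> and the three function names are hypothetical. For <i>S</i><sub>4</sub> the integral is taken over the cosine form <i>C</i><sub>4</sub>, which covers the same area as the shifted sine.
</p>

```c
#include <assert.h>
#include <math.h>

#define PI 3.14159265358979323846

typedef struct { double a, b, c; } Coefs;

/* Eq 24: zero-average coefficients for the fourth-order version. */
static Coefs s4o_coefs(void)
{
    Coefs k;
    k.c = 5.0*(1.0 - 3.0/PI);
    k.b = k.c + 1.0;
    k.a = 1.0;
    return k;
}

/* Eq 25: zero-average coefficients for the fifth-order version. */
static Coefs s5o_coefs(void)
{
    Coefs k;
    k.a = 4.0*(3.0/PI - 9.0/16.0);
    k.b = 2.0*k.a - 2.5;
    k.c = k.a - 1.5;
    return k;
}

/* Left-hand side of Eq 23: the integral over [0,1] of
 * a*z^p - b*z^(p+2) + c*z^(p+4), with p=0 for C4 and p=1 for S5. */
static double poly_mean(Coefs k, int p)
{
    assert(p == 0 || p == 1);
    return k.a/(p + 1) - k.b/(p + 3) + k.c/(p + 5);
}
```

<p>
Both <code>poly_mean(s4o_coefs(), 0)</code> and <code>poly_mean(s5o_coefs(), 1)</code> should come out at 2/&pi;, which is exactly what a zero error average requires.
</p>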
<p>
If you&#8217;re still awake and remember the devmaster <i>S</i><sub>4d</sub><br />
coefficients, there should be something familiar about<br />
<i>c</i><sub>4</sub>. Yes, they&#8217;re practically identical. If you<br />
optimize <i>S</i><sub>4</sub> for the RMSD, you actually get the exact<br />
same function as <i>S</i><sub>4d</sub>.
</p>
<p>
Table&nbsp;4 shows the statistics for the original<br />
approximations and the new optimized versions, <i>S</i><sub>4o</sub><br />
and <i>S</i><sub>5o</sub>. The numbers for <i>S</i><sub>4o</sub><br />
are basically those from <i>S</i><sub>4d</sub> seen earlier. More<br />
interesting are the details for <i>S</i><sub>5o</sub>. The maximum<br />
and minimum errors are now within &plusmn;1. That is to say,<br />
this approximation gives values that are at most 1 off from the<br />
proper Q12 sine. This is about as good as any Q12<br />
approximation is able to get.
</p>
<div class=lblock>
<table id="tbl-stats-opt" border=1 cellpadding=3 cellspacing=0>
<caption align=bottom>
  <b>Table&nbsp;4</b>: Optimized Q12 <i>S</i><sub>4</sub> and <i>S</i><sub>5</sub>.<br />
</caption>
<tr>
<th class=top></th>
<th class=top>min</th>
<th class=top>avg</th>
<th class=top>max</th>
<th class=top>rmsd</th>
</tr>
<tr class=rnum>
<th>S4</th>
<td>0</td>
<td>5.87</td>
<td>11.4</td>
<td>7.11</td>
</tr>
<tr class=rnum>
<th>S5</th>
<td>0</td>
<td>0.74</td>
<td>1.616</td>
<td>0.94</td>
</tr>
<tr class=rnum>
<th>S4o</th>
<td>-4.72</td>
<td>0</td>
<td>2.89</td>
<td>2.47</td>
</tr>
<tr class=rnum>
<th>S5o</th>
<td>-0.73</td>
<td>0</td>
<td>0.79</td>
<td>0.52</td>
</tr>
</table>
</div>
<div class=cblock>
<div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" id="img-sine-err45-opt"
    alt="" width="" /></a><br />
  <b>Fig&nbsp;6. </b>
</div>

</div>
<p><h2 id="sec-summary">4
Summary and final thoughts
</h2>
</p>
<p>
Here are a few things to take from all this.
</p>
<ul>
<li>Symmetry is your friend.</li>
<li>
    When constructing a polynomial approximation, more terms mean<br />
    higher accuracy. Symmetry properties of the function approximated<br />
    allow you to remove terms from consideration, simplifying the<br />
    equation.
  </li>
<li>
    Coordinate transformations are your friends too. Sometimes<br />
    it&#8217;s much easier to work on a scaled or moved version of the<br />
    original problem.<br />
    If your situation has a characteristic length (or time,<br />
	velocity, whatever) consider using dimensionless variables:<br />
	expressing parameters as ratios of the characteristic length.<br />
	This makes the initial units pretty much irrelevant. For<br />
	angles, think circle-fractions.
  </li>
<li>
    Zero and one (0 and 1) are the best values to have in your<br />
	equations, as they tend to vanish so easily.
  </li>
<li>
    Any approximation formula will have coefficients to be determined.<br />
    In general, the Taylor series terms are <i>not</i> the best<br />
    set; values slightly offset from these terms will be better as<br />
    they can correct for the truncation. To determine the values of<br />
    the coefficients, define some conditions that need to be<br />
	satisfied. Examples of conditions are values of the function and<br />
	its derivative at the boundaries, or its integrals. Or you can<br />
	wuss out and just dump the thing in the Excel Solver.
  </li>
<li>
    When converting to fixed-point, accuracy and overflow come into<br />
    the fray. If you know the domain of the function beforehand, you can<br />
    optimize for accuracy. Also, it helps if you construct the<br />
	algorithm in a sort of recursive form instead of a pure<br />
	polynomial: not <i>a</i><i>x</i>&nbsp;+&nbsp;<i>b</i><i>x</i><sup>2</sup><br />
	but	<i>x</i>(<i>a</i>&nbsp;+&nbsp;<i>x</i><i>b</i>). Ordered like this,<br />
	each new additional term only requires one multiplication and<br />
	one addition extra.
  </li>
<li>
    For fixed-point work, <code>SMULWx</code> is teh awesome.
  </li>
<li>
    Even a fourth order (and presumably fifth order as well) polynomial<br />
    implementation in C is faster than the LUT-based sines on the NDS.<br />
	And specialized assembly versions are considerably faster still.
  </li>
<li>
    The difference in accuracy of<br />
    <i>S</i><sub>4</sub> vs <i>S</i><sub>2</sub> or<br />
    <i>S</i><sub>5</sub> vs <i>S</i><sub>3</sub> is huge: a factor of<br />
    60. Going from an even to the next odd approximation only gains you<br />
	a factor of 3. Shame; I&#8217;d hoped it&#8217;d be more.
  </li>
<li>
    Contrary to what I initially thought, the even-powered polynomials work<br />
	out quite well. This is because they&#8217;re actually modified cosine<br />
	approximations.
  </li>
</ul>
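<p>
As a small illustration of the recursive form and of picking coefficients from boundary conditions, here is a rough Python sketch (not the NDS code from this article). It assumes a third-order odd polynomial <i>S</i><sub>3</sub>(<i>z</i>)&nbsp;=&nbsp;<i>a</i><i>z</i>&nbsp;&minus;&nbsp;<i>b</i><i>z</i><sup>3</sup> with the conditions <i>S</i><sub>3</sub>(1)&nbsp;=&nbsp;1 and <i>S</i><sub>3</sub>'(1)&nbsp;=&nbsp;0, which give <i>a</i>&nbsp;=&nbsp;3/2 and <i>b</i>&nbsp;=&nbsp;1/2. The Q12 format and the names <code>fx_mul</code>, <code>horner</code> and <code>sine3</code> are made up for the example.
</p>

```python
FRAC_BITS = 12          # Q12 fixed point: 1.0 is stored as 4096 (an assumed format)
ONE = 1 << FRAC_BITS

def fx_mul(a, b):
    """Multiply two Q12 numbers; the shift drops the extra fraction bits."""
    return (a * b) >> FRAC_BITS

def horner(coeffs, x):
    """Evaluate c0 + x*(c1 + x*(c2 + ...)) for Q12 coeffs, lowest order first."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = c + fx_mul(acc, x)    # one multiply and one add per extra term
    return acc

def sine3(z):
    """S3(z) = z*(3/2 - z^2/2), with z = 1 at a quarter circle, all in Q12."""
    z2 = fx_mul(z, z)
    return fx_mul(z, horner([3 * ONE // 2, -ONE // 2], z2))

for z in (0, ONE // 2, ONE):
    print(z / ONE, sine3(z) / ONE)
```

<p>
With <i>z</i>&nbsp;=&nbsp;&frac12; (2048 in Q12) this gives 2816, or 0.6875, against sin(&frac14;&pi;)&nbsp;&asymp;&nbsp;0.7071. Because the polynomial is odd, the Horner step runs over <i>z</i><sup>2</sup> and the leftover factor <i>z</i> is multiplied in at the end.
</p>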
<h4>Exercises for the reader</h4>
<ol>
<li>
    Express the parabolic approximation <i>S</i><sub>2</sub>(<i>x</i>)<br />
	of Eq&nbsp;1 in terms of <i>z</i>. It&#8217;s not hard, I promise.
  </li>
<li>
    Implement the fixed-point version of the fifth-order sine<br />
	approximation, <i>S</i><sub>5</sub>(<i>x</i>).
  </li>
<li>
    For the masochists: derive the coefficients for <i>S</i><sub>5</sub>(<i>x</i>)<br />
	<i>without</i> dimensionless variables. That is to say, with<br />
	the conditions at <i>x</i>&nbsp;=&nbsp;&frac12;&pi; instead of <i>z</i>&nbsp;=&nbsp;1.
  </li>
<li>
    Solve Eq&nbsp;24 and Eq&nbsp;25 for<br />
    minimal RMDS. Also, try to derive an analytical form for minimal RMDS;<br />
    I think it exists, but it may be tricky to come up with the right form.
  </li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/07/sines/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Bughunting</title>
		<link>http://www.coranac.com/2009/06/bughunting/</link>
		<comments>http://www.coranac.com/2009/06/bughunting/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 16:56:47 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tainment]]></category>
		<category><![CDATA[alien]]></category>
		<category><![CDATA[avp]]></category>
		<category><![CDATA[game]]></category>
		<category><![CDATA[predator]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=76</guid>
		<description><![CDATA[


Yes!
 YES!! 
 OH GOD, YES!!! 

&#160;
I mean &#8230; uhm &#8230;
&#8230;

While browsing through the E3 reports, I was moderately pleased to see
the Aliens versus Predator series (of games, not movies) is
getting another sequel.

&#160;

I&#8217;ve always had a soft spot for 
xenomorphs.
This extends to the 
Aliens versus Predator games
that have been released on the PC. The first [...]]]></description>
			<content:encoded><![CDATA[
<p></p>
<p>
Yes!<br />
<span style="font-size:200%"> <b>YES!!</b> </span><br />
<span style="font-size:400%; font-style:oblique;"> <b>OH GOD, YES!!!</b> </span>
</p>
<p><div>&nbsp;</div></p>
<p>I mean &hellip; uhm &hellip;<br />
&hellip;</p>
<p>
While browsing through the E3 reports, I was moderately pleased to see<br />
the Aliens versus Predator series (of <i>games</i>, not movies) is<br />
getting another sequel.
</p>
<p><div>&nbsp;</div></p>
<p>
I&#8217;ve always had a soft spot for 
<a href="http://en.wikipedia.org/wiki/Xenomorph_%28Alien%29">xenomorphs</a>.<br />
This extends to the 
<a href="http://en.wikipedia.org/wiki/Aliens_versus_Predator_%28computer_game%29">Aliens versus Predator games</a><br />
that have been released on the PC. The first AVP came out in 1999<br />
and I think this was one of the first PC games I ever bought. While<br />
not really as popular or as rich in storyline as, say,<br />
Half-Life, I still think it has many redeeming qualities even today.<br />
For example &hellip;
</p>
<p></p>
<p>
You can play as Human, Alien, or Predator, each with very different<br />
styles of play. This was pretty unique back then for FPSs. Actually, I<br />
think it still is. The Alien in particular was unusual: <i>very</i><br />
quick, able to walk on walls and ceilings and a strange fish-eye lens<br />
point of view. It also had no ranged attack, which meant you had to get<br />
up close and personal to attack. Moreover, the alien did not have much<br />
in the way of hitpoints, which effectively meant that you had to not<br />
only get close, but get close <i>undetected</i>. You had to hide in dark<br />
corners and on ceilings waiting for people to walk by and then bite their<br />
heads off.
</p>
<p>
Most games will put you as the Hero Marine against bug-like critters<br />
to be slaughtered en-masse. This game gives you the opportunity to see<br />
what it looks like from the other side, which is definitely an educational experience. One thing that becomes very clear,<br />
for example, is <i>why</i> fire and flamethrowers are not your friend.<br />
Bullets could often be avoided (except from<br />
turrets), but flamethrowers put up an entire wall of fire and one<br />
hit would keep burning for quite some time which, for creatures with<br />
few hitpoints, would fit nicely into the bad things category.
</p>
<p></p>
<p>
Playing against the aliens was also a different-than-usual experience.<br />
Able to hide anywhere (how many FPSs require you to check the ceilings?),<br />
nearly invisible against the background, fast and very, very deadly.
</p>
<p>
And &hellip; oh yeah! They bleed acid. Yeah.
</p>
<p>
They also had a very peculiar<br />
reaction to being shot: exploding and scattering themselves over a wide<br />
area. All while bleeding acid. If you remember your physics classes,<br />
you should be aware of this thing called inertia: things in motion will<br />
continue in the same direction. You know which direction the Aliens will<br />
usually be moving towards when you shoot them? You. You know in which<br />
direction all the parts and blood will be moving? You!
</p>
<p>
In other words: even when you kill them, there&#8217;s a good chance they&#8217;ll<br />
kill you right back.
</p>
<p>
The game&#8217;s also quite hard. I&#8217;d almost say<br />
<a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/NintendoHard">Nintendo<br />
Hard</a>. AVP 1 had no in-level saving. I don&#8217;t think there&#8217;s <i>ever</i><br />
been a PC FPS that didn&#8217;t allow you to save at will. Combined with the<br />
fact that the characters were realistically weak (a 
<a href="http://en.wikipedia.org/wiki/rocket%20jump">rocket jump</a><br />
would only get <i>parts</i> of you to far-off distances or heights; on<br />
average you remain in the same spot), the lack of saving increased the tension<br />
considerably.
</p>
<p>
Of course a sizable group of gamers, cowardly pussies that they are,<br />
complained and a save feature was eventually added. Shame, really;<br />
the levels are short enough for it to work, and it&#8217;s actually way more<br />
fun to play when you&#8217;re running for dear life.
</p>
<p></p>
<p>
And then there&#8217;s the motion tracker. If you&#8217;ve seen the movie Aliens,<br />
you&#8217;ll know what I&#8217;m talking about. Basically, it&#8217;s a device that<br />
measures how deep the shit you&#8217;re in is. If it emits a low<br />
<i>bup</i>, you&#8217;re safe; if it starts giving off a high-pitched<br />
<i>beep</i> (or worse, multiple beeps), you&#8217;re in trouble.
</p>
<p>
This truly is the stuff of nightmares. There&#8217;s actually something worse<br />
than darkness: darkness and having a reminder that you&#8217;re probably going to<br />
die in the next few steps if you&#8217;re not careful. This feeling was enhanced<br />
by the night vision goggles which turn off the tracker. So now you only<br />
know where something <i>was</i>, roughly, and you have to find it again.<br />
The motion tracker is without a doubt the single most evil and mind-screwing<br />
feature ever put in a game. For those who want to argue in favor of, say,<br />
Resident Evil or Silent Hill or other horror games: No! You&#8217;re wrong!<br />
It really is that simple.
</p>
<p></p>
<p>
On second thought, there is something worse. It&#8217;s called a<br />
facehugger. The spiderlike critters from the Alien series that jump you<br />
from out of nowhere and basically rape your face. The game has those too.<br />
To show just how bad these are, here&#8217;s a little anecdote about my first<br />
encounter with one.
</p>
<blockquote>
<h4>Why facehuggers are evil</h4>
<p>
It&#8217;s the third Marine mission: Invasion. You have to get to the top of<br />
the tower for a rescue. Up to this point I had done what I&#8217;d always<br />
done in an FPS: make slow but steady progress to avoid any nasty surprises.<br />
From this level onward that strategy doesn&#8217;t work anymore. At all.
</p>
<p>
The reason for that is that now the aliens start to respawn at<br />
semi-random locations. So not only am I a puny hooman faced with<br />
very quick aliens that can come from out of nowhere trying to take<br />
my head off, now there&#8217;s an endless supply of them too and only one<br />
of me. Oh and did I mention there&#8217;s no save? Not that it&#8217;d matter<br />
because they&#8217;d spawn at a different location anyway, but still.
</p>
<p>
In any case, at some point you&#8217;d clue in to the fact that the only<br />
way this is gonna work is to just make a blind run for it and<br />
<s>hope</s> <s>pray</s> curse for the best. Amazingly, this worked<br />
out pretty well. That is, until I took an elevator down to face this:
</p>
<div class=cblock><div class="cpt" style="width:px;">
  <a href="" target="_blank">  <img src="" 
    alt="" width="" /></a><br />
  
</div>
</div>
<p>
An open Alien egg. And I could hear the pitter-patter of tiny feet down the<br />
corridor to the right, which also lit up on the tracker. When turning<br />
round the corner, the sound became louder. But still, I could not<br />
actually see the little bastard yet. &ldquo;Well. Shit.&rdquo; is<br />
something of an understatement at this point.
</p>
<p>
But <i>then</i> I hear something above me as well: an Alien was<br />
climbing down to the room in the image. &ldquo;Well. Shit.&rdquo; has<br />
now become completely inappropriate and I headed back to the elevator<br />
room to kill it. I basically sprayed the whole room with fire, hoping<br />
I&#8217;d catch it at some point. And I did. I continued to dance around to<br />
avoid it until it burst. And then all was quiet again. I didn&#8217;t even<br />
hear the face hugger moving around anymore in the distance. Thinking all<br />
was safe, I turned round to hunt down the face hugger again an&hellip;
</p>
<div class=cblock style="color:red;">
<b><i><br />
<span style="font-size:200%">SSSSSSSSSSKKKKKKKKKKKKKKRRRRREEEEEEE</span><br />
<span style="font-size:180%">EEEEEEEEEEEEEEEAAAAAAAAAAAAAARRRRRRR</span><br />
<span style="font-size:145%">RRRGGGGGGGHHHHHHHHHHHHHHWITHTHEHURTING</span><br />
<span style="font-size:110%">ANDTHEPAINANDTHEDYINGOHBLOODYFUCKINGHELL</span><br />
<span style="font-size:80%">WHATWASTHAT?!?!?!!!ELEVENTYONE!!?!!</span><br />
</i></b>
</div>
<p>
Turns out the reason I didn&#8217;t hear the facehugger anymore wasn&#8217;t<br />
that I&#8217;d killed it or that it had moved too far away, but because it<br />
was already in mid-jump. Not only did it get me, it got me completely<br />
by surprise and the blood-curdling scream it emitted<br />
actually made me fall off my chair. Literally. It scared me so much that I<br />
actually leaped out of my chair. It took about a minute before I<br />
could even hold the mouse again because my hands were shaking so much. No<br />
game has <i>ever</i> had that strong of an effect before or since. This was<br />
just awful. And yet awesome at the same time.
</p>
</blockquote>
<p>
Facehuggers are just plain evil. Just hearing one moving in the area<br />
is enough to give me hives.
</p>
<p></p>
<p>
The only really bad thing about the game is that it<br />
won&#8217;t play on current computers &ndash; some graphics and sound glitches<br />
that crashed the game or made it unplayable. However, this has<br />
actually been remedied recently. People have been tinkering with the<br />
source code and fixed the most important issues. See<br />
<a href="http://forumplanet.gamespy.com/tech_support/b49029/1049364"><br />
forumplanet.gamespy.com/tech_support/b49029/1049364</a> for details<br />
and links to patches.
</p>
<p><div>&nbsp;</div></p>
<p>
Compared to AVP 1, its sequel was, well, ultimately something of a<br />
let-down. It&#8217;s still good playing, but I felt that it could have been<br />
so much more. Sure, it had better graphics. Well, more detailed models<br />
and textures anyway; unfortunately, the textures also looked really<br />
coarse and flat, and decals would often seem to be placed <i>over</i><br />
the polygons, rather than on them, which just looked awful. AVP 1 had<br />
destructible light sources &ndash; something the Alien could make use of<br />
very well &ndash; but AVP 2 didn&#8217;t. It also did not have adjustable gamma<br />
settings, which <i>really</i> hurt because often I literally could<br />
not see anything. And they took out the cheat modes and skirmish<br />
*sigh*.
</p>
<p>
The Aliens had also changed in some very bad ways. In the original,<br />
they were fast and furious, but in AVP 2 it often seemed that they were<br />
just hobbling along on their way too skinny legs. Instead of looking<br />
like the vicious and fast killing machines, they came off more as clumsy<br />
puppies. Okay, yes, puppies with <i>really</i> sharp claws and teeth,<br />
but not the terrors they&#8217;re supposed to be.
</p>
<p>
They also didn&#8217;t explode into parts with that <i>delightful</i> crackling<br />
sound anymore, or bleed over everything (mostly you). Mostly they just<br />
flopped down. Also, there seemed to be only one or two death poses and I<br />
think only a single animation timer for all critters. Often you&#8217;d find<br />
yourself in a field of dead aliens which lay down in exactly the same way.<br />
I know it doesn&#8217;t sound like a big deal, but little things like this can<br />
spoil the mood completely. Worst of all though was what they did with the<br />
facehuggers: they took away the scream when they kill you. This completely<br />
removed the scare factor <kbd>:(</kbd>
</p>
<p>
Having said that, it also did some things very right. There was one<br />
interconnected story line, with the threads of the three playmodes<br />
intersecting at several instances. Very nicely done. Also, as the<br />
Alien you actually played through the facehugger and chestburster<br />
stages. This was also fun.
</p>
<p>
And now there&#8217;s gonna be a third installment. Like in AVP 2, there<br />
will be an interwoven story, so that&#8217;s good. From the pictures I&#8217;ve<br />
seen, the graphics are going to be awesome. Textures are crisp, motion<br />
looks fluid &ndash; it just looks <i>right</i> again.<br />
It also looks like it&#8217;s not going to be for the faint of heart, with<br />
lovely gratuitous displays of blood and guts and trophy-taking and<br />
everything (see <a href="http://blip.tv/file/2202917">here</a><br />
for video).
</p>
<p>
I was somewhat surprised to see Sega as the publisher for the<br />
game. It seemed a little odd at first, but if you take into account that<br />
they&#8217;re a 
<a href="http://en.wikipedia.org/wiki/F-Zero_GX">bunch of</a><br />

<a href="http://en.wikipedia.org/wiki/Super_Monkey_Ball">fucking</a><br />

<a href="http://en.wikipedia.org/wiki/Super_Monkey_Ball_2">sadists</a>, I think it is actually<br />
quite fitting.
</p>
<p><div>&nbsp;</div></p>
<p>
So yeah, I&#8217;m looking forward to this one.
</p>
<p><!-- http://www.gamez.nl/xbox/special/27199/e3-special-aliens-vs-predator.html --><br />
<!-- http://www.cracked.com/article_15696_10-most-irritatingly-impossible-old-school-video-games.html --></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/06/bughunting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>new and improved geshi</title>
		<link>http://www.coranac.com/2009/06/new-geshi/</link>
		<comments>http://www.coranac.com/2009/06/new-geshi/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 19:13:08 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[blag]]></category>
		<category><![CDATA[geshi]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=69</guid>
		<description><![CDATA[
With Tonc I pretty much did all the syntax highlighting of code manually.
As you might expect, this experience was &#8211; well, the proper description is
something not suitable for anyone under the age of several thousand,
so let&#8217;s keep it at &#8220;somewhat less than pleasant&#8221;. So the first
thing I looked for when starting this whole blogging gig [...]]]></description>
			<content:encoded><![CDATA[<p>
With Tonc I pretty much did all the syntax highlighting of code manually.<br />
As you might expect, this experience was &ndash; well, the proper description is<br />
something not suitable for anyone under the age of several thousand,<br />
so let&#8217;s keep it at &ldquo;somewhat less than pleasant&rdquo;. So the first<br />
thing I looked for when starting this whole blogging gig was something<br />
that could do that automatically. In my case, that was<br />
<a href="http://wordpress.org/extend/plugins/codesnippet-20/">codesnippet</a>,<br />
which was built on the very awesome<br />
<a href="http://qbnz.com/highlighter/">Geshi</a>. There were some<br />
small problems with number formatting and whitespace handling, but<br />
overall it&#8217;s served me well.
</p>
<p>
The Geshi that came with it was &hellip; 1.0.7.20, I think. In any case, Geshi<br />
is now at 1.0.8.3, so I figured it was time for an upgrade. Most notable was<br />
that the way numbers were parsed has been greatly modified, with different<br />
types of representations now being parsed separately &ndash; and correctly<br />
to boot. Right now, it&#8217;s almost fully correct, as you can see from the list<br />
below:
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">// Regular int</span></p>
<p><span class="nu0">123</span><br />
<span class="nu0">123l</span><br />
<span class="nu0">123L</span><br />
123ll &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
123LL &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span></p>
<p>123u&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
123U&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// fail</span><br />
+<span class="nu0">123</span></p>
<p>-<span class="nu0">123</span></p>
<p><span class="co1">// Octal</span><br />
<span class="nu0">0123</span></p>
<p><span class="co1">// Hex</span><br />
<span class="nu0">0x12</span><br />
<span class="nu0">0x123</span><br />
0x123.4</p>
<p><span class="co1">// Float</span><br />
<span class="nu0">123.4</span><br />
<span class="nu0">123.4f</span><br />
<span class="nu0">123.4F</span><br />
+<span class="nu0">123.4</span><br />
-<span class="nu0">123.4</span><br />
<span class="nu0">1.2e3</span></p>
<p><span class="nu0">1.2E3</span><br />
<span class="nu0">1.2e+3</span><br />
<span class="nu0">1.2e-3</span></p>
<p><span class="co1">// Inner</span><br />
(<span class="nu0">1.23</span>)<br />
abc123de</div>
</div>
<p>Only some of the more special integer literals aren&#8217;t parsed correctly:<br />
specifically, the unsigned (<code>-U</code>) and long long<br />
(<code>-LL</code>) suffixes aren&#8217;t accepted. I don&#8217;t suppose hex floats will<br />
work either, but in C++ that&#8217;s a GCC extension anyway.
</p>
<p>
To fix this, you need to modify geshi a little; specifically the<br />
GESHI_NUMBER_INT_CSTYLE regular expression:
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; GESHI_NUMBER_INT_CSTYLE =&gt;<br />
&nbsp; &nbsp; <span class="st0">'(?&lt;![0-9a-z_<span class="es0">\.</span>%])(?&lt;![<span class="es0">\d</span><span class="es0">\.</span>]e[+<span class="es0">\-</span>])([1-9]<span class="es0">\d</span>*?|0)l(?![0-9a-z<span class="es0">\.</span>])'</span>,</div>
</div>
<p>
&hellip; yeah. I&#8217;m not sure why it&#8217;s formulated like that either. I&#8217;d have thought<br />
&#8216;<code>\b</code>&#8217; would have worked just as well, but alright. Anyway, notice the single &#8216;<code>l</code>&#8217; character in there? That needs to be extended to something<br />
that matches a potential single &#8216;<code>u</code>&#8217;, possibly followed by one or two<br />
&#8216;<code>l</code>&#8217;s. In other words: &#8216;<code>u?l{0,2}</code>&#8217;.
</p>
<div class="none">
<div class="none proglist" style=" ">&nbsp; GESHI_NUMBER_INT_CSTYLE =&gt;<br />
&nbsp; &nbsp; <span class="st0">'(?&lt;![0-9a-z_<span class="es0">\.</span>%])(?&lt;![<span class="es0">\d</span><span class="es0">\.</span>]e[+<span class="es0">\-</span>])([1-9]<span class="es0">\d</span>*?|0)<b>u?l{0,2}</b>(?![0-9a-z<span class="es0">\.</span>])'</span>,</div>
</div>
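<p>
To check that the extended suffix does what it should, the pattern can be tried outside of Geshi. Here&#8217;s a quick Python sketch. It assumes Python&#8217;s <code>re</code> module handles these lookarounds the same way PHP&#8217;s PCRE does, and that Geshi matches case-insensitively (which the <code>123L</code> in the listing above suggests), hence <code>re.IGNORECASE</code>. The names <code>INT_CSTYLE</code> and <code>find_ints</code> are mine.
</p>

```python
import re

# The modified GESHI_NUMBER_INT_CSTYLE pattern, with 'l' widened to 'u?l{0,2}'.
INT_CSTYLE = re.compile(
    r"(?<![0-9a-z_\.%])(?<![\d\.]e[+\-])([1-9]\d*?|0)u?l{0,2}(?![0-9a-z\.])",
    re.IGNORECASE,
)

def find_ints(code):
    """Return every substring the pattern would mark as an integer literal."""
    return [m.group(0) for m in INT_CSTYLE.finditer(code)]

samples = ["123", "123l", "123LL", "123u", "123UL", "abc123de", "1.2e+3", "123.4"]
for s in samples:
    print(s, find_ints(s))
```

<p>
The plain and suffixed integers should each come back whole, while the identifier, the exponent and the float should come back empty; those are left to Geshi&#8217;s other number patterns.
</p>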
<h4 id="sssec-html">HTML in code</h4>
<p>
An astute reader may have noticed the bold in the previous snippet. Normally,<br />
you can&#8217;t do that in Geshi. One of the things that Geshi does is translate<br />
special characters like &#8216;<code>&lt;</code>&#8217; into HTML entities like &#8220;<code>&amp;lt;</code>&#8221;<br />
so that it&#8217;ll turn up right on the resulting page. This, of course, is one of the<br />
things Geshi is expected to do. However, in this case it also makes it impossible<br />
to add HTML parts in the code snippet, which at times can be very useful.
</p>
<p>
So what do we do now? Well, we can use <i>escaped</i> HTML tags. Much like<br />
&#8220;<code>\n</code>&#8221; doesn&#8217;t actually mean backslash + &#8216;<code>n</code>&#8217; but a<br />
newline character, &#8220;<code>\&lt;</code>&#8221; can be used for the actual<br />
&#8216;<code>&lt;</code>&#8217;. And to <i>un</i>escape that, a double backslash can be used,<br />
much like it is in C.
</p>
<div class="none">
<div class="none proglist" style=" ">\\&lt;b\\&gt;BOLD\\&lt;/b\\&gt; &nbsp; &nbsp;becomes &nbsp; &nbsp; \&lt;b\&gt;BOLD\&lt;/b\&gt;</div>
</div>
<p>
There are several ways to implement this. One would be to modify it in the geshi<br />
code. I haven&#8217;t tried that route yet because I expect it could get messy. That&#8217;s<br />
arguably how it <i>should</i> be done, but it&#8217;s easier to do it after the fact:<br />
when all the conversions have been done. Basically, you need something like this:
</p>
<div class="php">
<div class="php proglist" style=" "><span class="co1">// Initialize geshi with the text to convert and language file to use.</span><br />
<span class="re0">$geshi</span> = <span class="kw2">new</span> GeSHi(<span class="re0">$text</span>, <span class="re0">$lang</span>, <span class="re0">$this</span>-&gt;geshi_path);</p>
<p><span class="co1">// This does the actual work.</span><br />
<span class="re0">$text</span>= <span class="re0">$geshi</span>-&gt;parse_code();</p>
<p><span class="co1">// Replace (un)escaped html entities.</span><br />
<span class="re0">$text</span>= <a href="http://www.php.net/str_replace"><span class="kw3">str_replace</span></a>(<br />
&nbsp; &nbsp; <a href="http://www.php.net/array"><span class="kw3">array</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Normal entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;lt;'</span>, <span class="st_h">'\\\&amp;gt;'</span>, <span class="st_h">'\\\&amp;amp;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// In-string escapes get crap added, gaddammittohell &gt;_&lt;.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&lt;&lt;/span&gt;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&gt;&lt;/span&gt;'</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;span class=&quot;es0&quot;&gt;&amp;&lt;/span&gt;'</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unescaped entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;'</span>, <span class="st_h">'\\\&lt;'</span>, <span class="st_h">'\\\&gt;'</span>), <br />
&nbsp; &nbsp; <a href="http://www.php.net/array"><span class="kw3">array</span></a>(<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;'</span> &nbsp; &nbsp; , <span class="st_h">'&gt;'</span> &nbsp; &nbsp; , <span class="st_h">'&amp;'</span>, &nbsp; &nbsp;&nbsp; &nbsp; <span class="co1">// Normal entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'&lt;'</span> &nbsp; &nbsp; , <span class="st_h">'&gt;'</span> &nbsp; &nbsp; , <span class="st_h">'&amp;'</span>, &nbsp; &nbsp;&nbsp; &nbsp; <span class="co1">// In-string entities.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st_h">'\\\&amp;amp;'</span>, <span class="st_h">'\\\&amp;lt;'</span>, <span class="st_h">'\\\&amp;gt;'</span>&nbsp; &nbsp; <span class="co1">// Unescaped entities</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; ), <br />
&nbsp; &nbsp; <span class="re0">$text</span>);</div>
</div>
<p>
There are three sets of items to search &amp; replace here. The first two are<br />
the basic escaped tag delimiters, so that they&#8217;ll actually result in HTML tags,<br />
and unescaped delimiters, so that you can print the combination itself. The<br />
third category is for HTML in string literals. Since the backslash has a<br />
specific meaning there as well, GeSHi puts some highlighting stuff around it<br />
that would make the standard search fail. So that whole thing needs<br />
to be searched for and <s>destroyed</s> replaced.
</p>
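<p>
To make the three categories a bit more concrete, here's a minimal stand-alone sketch. It is not the plugin's actual code: the function name <code>unescape_markup</code>, the doubled-backslash convention and the exact shape of the <code>es0</code>-wrapped sequences are assumptions for illustration. It also swaps the array search-and-replace for <code>strtr</code>, which tries the longest key first and doesn't rescan text it has already replaced.
</p>

```php
<?php
// Hypothetical post-filter for highlighter output. The marker conventions
// below are assumptions for this sketch, not GeSHi's documented behaviour.
function unescape_markup($html)
{
    $map = array();
    foreach (array('&lt;' => '<', '&gt;' => '>', '&amp;' => '&') as $entity => $char) {
        // 1. Escaped delimiter (one backslash): becomes a live HTML character.
        $map['\\' . $entity] = $char;
        // 2. Doubled backslash: print the backslash+entity combination itself.
        $map['\\\\' . $entity] = '\\' . $entity;
        // 3. In-string escape, assumed to arrive wrapped in an "es0" span.
        $map['<span class="es0">\\' . $entity . '</span>'] = $char;
    }
    // strtr() tries the longest keys first and never rescans replaced text,
    // so rule 2 wins over rule 1 wherever both could match.
    return strtr($html, $map);
}

// Input is: x \&lt;b\&gt;y\&lt;/b\&gt; \\&lt;
echo unescape_markup('x \\&lt;b\\&gt;y\\&lt;/b\\&gt; \\\\&lt;'), "\n";
// prints: x <b>y</b> \&lt;
```

<p>
With a purely sequential search, the single-backslash rule would also fire inside the doubled-backslash sequence, so the order of the search items would matter. Longest-match-first sidesteps that.
</p>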
<p>
It&#8217;s ugly, I know, but it seems to work. It&#8217;d be nicer if this could be done<br />
in the parser itself, but I have a feeling that&#8217;d take changes in multiple<br />
places. Since I don&#8217;t know the code that well yet, I&#8217;m not touching that<br />
one with a ten-foot pole.
</p>
<p>
Lastly, let&#8217;s test the ARM asm highlighter:
</p>
<div class="gccarm">
<div class="gccarm proglist" style=" "><span class="co2">// Regular int</span><br />
<span class="nu0">123</span><br />
<span class="nu0">123l</span><br />
<span class="nu0">123L</span><br />
<span class="nu0">123ll</span><br />
<span class="nu0">123LL</span> &nbsp; <br />
<span class="nu0">123u</span><br />
<span class="nu0">123U</span><br />
+<span class="nu0">123</span><br />
-<span class="nu0">123</span></p>
<p><span class="co2">// Binary</span><br />
<span class="nu0">0b01100110</span><br />
<span class="nu0">0B10101010</span></p>
<p><span class="co2">// Octal</span><br />
<span class="nu0">0123</span></p>
<p><span class="co2">// Hex</span><br />
<span class="nu0">0&#215;12</span><br />
<span class="nu0">0&#215;123</span><br />
0&#215;123.4</p>
<p><span class="co2">// Float</span><br />
<span class="nu0">123.4</span><br />
<span class="nu0">123.4f</span><br />
<span class="nu0">123.4F</span><br />
+<span class="nu0">123.4</span><br />
-<span class="nu0">123.4</span><br />
<span class="nu0">1.2e3</span><br />
<span class="nu0">1.2E3</span><br />
<span class="nu0">1.2e+3</span><br />
<span class="nu0">1.2e-3</span></p>
<p><span class="co2">// Inner</span><br />
(<span class="nu0">1.23</span>)<br />
abc123de</div>
</div>
<p>
Still works too. Bitchin&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/06/new-geshi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
