<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coranac &#187; tonc</title>
	<atom:link href="http://www.coranac.com/category/proj/tonc/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coranac.com</link>
	<description>my own little world</description>
	<lastBuildDate>Tue, 06 Jul 2010 21:30:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>mode 7 addendum</title>
		<link>http://www.coranac.com/2009/04/mode-7-addendum/</link>
		<comments>http://www.coranac.com/2009/04/mode-7-addendum/#comments</comments>
		<pubDate>Sun, 19 Apr 2009 16:32:53 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[tonc]]></category>
		<category><![CDATA[fix]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[mode7]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=67</guid>
		<description><![CDATA[

Okay. Apparently, I am an idiot who can&#8217;t do math.

&#160;

One of the longer chapters in Tonc is
Mode 7 part 2, which covers
pretty much all the hairy details of producing mode 7 effects on the
GBA. The money shot in terms of code is the following function,
which calculates the affine parameters of the background for each
scanline [...]]]></description>
			<content:encoded><![CDATA[
<p>
Okay. Apparently, I am an idiot who can&#8217;t do math.
</p>
<p><div>&nbsp;</div></p>
<p>
One of the longer chapters in Tonc is<br />
<a href="/tonc/text/mode7ex.htm">Mode 7 part 2</a>, which covers<br />
pretty much all the hairy details of producing mode 7 effects on the<br />
GBA. The money shot in terms of code is the following function,<br />
which calculates the affine parameters of the background for each<br />
scanline in section <a href="/tonc/text/mode7ex.htm#ssec-code-bg">21.7.3</a>.
</p>
<div class="cpp">
<div class="cpp proglist" style=" ">IWRAM_CODE <span class="kw1">void</span> m7_prep_affines(M7_LEVEL *level)<br />
{<br />
&nbsp; &nbsp; <span class="kw1">if</span>(level-&gt;horizon &gt;= SCREEN_HEIGHT)<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;</p>
<p>&nbsp; &nbsp; <span class="kw1">int</span> ii, ii0= (level-&gt;horizon&gt;=<span class="nu0">0</span> ? level-&gt;horizon : <span class="nu0">0</span>);</p>
<p>&nbsp; &nbsp; M7_CAM *cam= level-&gt;camera;<br />
&nbsp; &nbsp; FIXED xc= cam-&gt;pos.x, yc= cam-&gt;pos.y, zc=cam-&gt;pos.z;</p>
<p>&nbsp; &nbsp; BG_AFFINE *bga= &amp;level-&gt;bgaff[ii0];</p>
<p>&nbsp; &nbsp; FIXED yb, zb; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// b' = Rx(theta) * &nbsp;(L, ys, -D)</span><br />
&nbsp; &nbsp; FIXED cf, sf, ct, st; &nbsp; <span class="co1">// sines and cosines</span><br />
&nbsp; &nbsp; FIXED lam, lcf, lsf; &nbsp; &nbsp;<span class="co1">// scale and scaled (co)sine(phi)</span><br />
&nbsp; &nbsp; cf= cam-&gt;u.x; &nbsp; &nbsp; &nbsp;sf= cam-&gt;u.z;<br />
&nbsp; &nbsp; ct= cam-&gt;v.y; &nbsp; &nbsp; &nbsp;st= cam-&gt;w.y;<br />
&nbsp; &nbsp; <span class="kw1">for</span>(ii= ii0; ii&lt;SCREEN_HEIGHT; ii++)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; yb= (ii-M7_TOP)*ct + M7_D*st;<br />
&nbsp; &nbsp; &nbsp; &nbsp; lam= DivSafe( yc&lt;&lt;<span class="nu0">12</span>, &nbsp;yb); &nbsp; &nbsp; <span class="co1">// .12f &nbsp; &nbsp;&lt;- OI!!!</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; lcf= lam*cf&gt;&gt;<span class="nu0">8</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// .12f</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; lsf= lam*sf&gt;&gt;<span class="nu0">8</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// .12f</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; bga-&gt;pa= lcf&gt;&gt;<span class="nu0">4</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1">// .8f</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; bga-&gt;pc= lsf&gt;&gt;<span class="nu0">4</span>; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1">// .8f</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// lambda·Rx·b</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; zb= (ii-M7_TOP)*st - M7_D*ct; &nbsp; <span class="co1">// .8f</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; bga-&gt;dx= xc + (lcf&gt;&gt;<span class="nu0">4</span>)*M7_LEFT - (lsf*zb&gt;&gt;<span class="nu0">12</span>); &nbsp;<span class="co1">// .8f</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; bga-&gt;dy= zc + (lsf&gt;&gt;<span class="nu0">4</span>)*M7_LEFT + (lcf*zb&gt;&gt;<span class="nu0">12</span>); &nbsp;<span class="co1">// .8f</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// hack that I need for fog. pb and pd are unused anyway</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; bga-&gt;pb= lam;<br />
&nbsp; &nbsp; &nbsp; &nbsp; bga++;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; level-&gt;bgaff[SCREEN_HEIGHT]= level-&gt;bgaff[<span class="nu0">0</span>];<br />
}</div>
</div>
<p>
For details on what all the terms mean, go to the page in question.<br />
For now, just note the call to <code>DivSafe()</code> to calculate<br />
the scaling factor &lambda; and recall that division on the GBA is<br />
pretty slow. In <a href="/tonc/text/mode7.htm">Mode 7 part 1</a>,<br />
I used a LUT, but here I figured that you can&#8217;t do that, since the<br />
pitch lets the <code>yb</code> term be just about anything. After helping<br />
Ruben with his mode 7 demo, it turns out that you can.
</p>
<p><div>&nbsp;</div></p>
<p><div class="cpt" id="img-crd-c2p2">
  <b>Fig&nbsp;1. </b>Side view of the camera, pitched down by &theta;, with a ray cast from the back of the camera through the projection plane onto the floor (image missing).
</div>
</p>
<p>
Fig&nbsp;1 shows the situation. There is a camera<br />
(the black triangle) that is tilted down by pitch angle &theta;. I&#8217;ve<br />
put the origin at the back of the camera because it makes things<br />
easier to read. The<br />
front of the camera is the projection plane, which is essentially<br />
the screen. A ray is cast from the back of the camera on to the floor<br />
and this ray intersects the projection plane. The coordinates<br />
of this point are <b>x</b><sub>p</sub> =<br />
(<i>y</i><sub>p</sub>, <i>D</i>) in projection plane space, which<br />
corresponds to point (<i>y</i><sub>b</sub>, <i>z</i><sub>b</sub>) in<br />
world space. This is simply rotating point <b>x</b><sub>p</sub> by<br />
&theta;. The scaling factor is the ratio between the <i>y</i> or<br />
<i>z</i> coordinates of the points on the floor and on the projection<br />
plane, so that&#8217;s:
</p>
<p><table class="eqtbl">
<tr>
<td class="eqnrcell"></td>
  <td class="eqcell">
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?%5Clambda%20%3D%20y_c%20%2F%20y_b%2C'
	title="\lambda = y_c / y_b,"
	alt="\lambda = y_c / y_b," />
</td>
</tr>
</table></p>
<p>
and for <i>y</i><sub>b</sub> the rotation gives us:
</p>
<p><table class="eqtbl">
<tr>
<td class="eqnrcell"></td>
  <td class="eqcell">
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?y_b%20%3D%20y_p%20cos%20%5Ctheta%20%2B%20D%20sin%20%5Ctheta%2C'
	title="y_b = y_p cos \theta + D sin \theta,"
	alt="y_b = y_p cos \theta + D sin \theta," />
</td>
</tr>
</table></p>
<p>
where <i>y</i><sub>c</sub> is the camera height,<br />
<i>y</i><sub>p</sub> is a scanline offset (measured from the center of the screen) and <i>D</i> is the focal<br />
length.
</p>
<p>
Now, the point is that while <i>y</i><sub>b</sub> is variable<br />
and non-integral when &theta; &ne; 0, it is still bounded! What&#8217;s more,<br />
you can easily calculate its maximum value, since it&#8217;s simply the<br />
maximum length of <b>x</b><sub>p</sub>. Calling this factor <i>R</i>,<br />
we get:
</p>
<p><table class="eqtbl">
<tr>
<td class="eqnrcell"></td>
  <td class="eqcell">
<img src='http://www.coranac.com/cgi-bin/mimetex.cgi?R%20%3D%20%5Csqrt%7Bmax%28y_p%29%5E2%20%2B%20D%5E2%7D'
	title="R = \sqrt{max(y_p)^2 + D^2}"
	alt="R = \sqrt{max(y_p)^2 + D^2}" />
</td>
</tr>
</table></p>
<p>
This factor <i>R</i>, rounded up, is the size of the required LUT.<br />
In my particular case, I&#8217;ve used <i>y</i><sub>p</sub>= scanline&minus;80<br />
and <i>D</i> = 256, which gives<br />
<i>R</i>&nbsp;=&nbsp;sqrt((160&minus;80)&sup2;&nbsp;+&nbsp;256&sup2;)<br />
= 268.2. In other words, I need a division LUT with 269 entries. Using .16<br />
fixed point numbers for this LUT, the replacement code is essentially:
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">// The new division LUT. For 1/0 and 1/1, 0xFFFF is used.</span><br />
u16 m7_div_lut[<span class="nu0">270</span>]= <br />
{<br />
&nbsp; &nbsp; <span class="nu0">0xFFFF</span>, <span class="nu0">0xFFFF</span>, <span class="nu0">0x8000</span>, <span class="nu0">0x5556</span>, &#8230;<br />
};</p>
<p>
<span class="co1">// Inside the function</span><br />
&nbsp; &nbsp; <span class="kw1">for</span>(ii= ii0; ii&lt;SCREEN_HEIGHT; ii++)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; yb= (ii-M7_TOP)*ct + M7_D*st; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// .8</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; lam= (yc*m7_div_lut[yb&gt;&gt;<span class="nu0">8</span>])&gt;&gt;<span class="nu0">12</span>;&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// .8*.16/.12 = .12</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &#8230; <span class="co1">// business as usual</span><br />
&nbsp; &nbsp; }</div>
</div>
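<p>
For the curious, here is a minimal sketch of how such a table could be generated and used, written as plain host-side C. The helper names (<code>m7_div_lut_entry</code>, <code>m7_div_lut_fill</code>, <code>m7_lambda_from_lut</code>) are made up for this example. The rounding-up and the clamp to 0xFFFF are inferred from the four entries shown above, so treat them as assumptions rather than as the actual generator.
</p>

```c
#include <assert.h>
#include <stdint.h>

#define M7_DIV_LUT_SIZE 270   /* same size as the table declared above */

/* Entry n approximates 1/n as a .16 fixed-point number.
 * Assumption: values are rounded up and clamped to 0xFFFF for n < 2,
 * which reproduces the entries 0xFFFF, 0xFFFF, 0x8000, 0x5556. */
static uint16_t m7_div_lut_entry(uint32_t n)
{
    if (n < 2)
        return 0xFFFF;                          /* 1/0 and 1/1 do not fit in 16 bits */
    return (uint16_t)((0x10000u + n - 1) / n);  /* ceil(65536 / n) */
}

/* Fill a table of `count` entries (hypothetical helper). */
static void m7_div_lut_fill(uint16_t *lut, uint32_t count)
{
    for (uint32_t n = 0; n < count; n++)
        lut[n] = m7_div_lut_entry(n);
}

/* lambda = yc / yb through the table. yc and yb are .8, the result is .12.
 * Assumes yc >= 0, yb >= 0, yb>>8 inside the table, and that the
 * product yc * lut[...] still fits in 32 bits. */
static int32_t m7_lambda_from_lut(const uint16_t *lut, int32_t yc, int32_t yb)
{
    assert(yc >= 0 && yb >= 0);
    return (yc * (int32_t)lut[yb >> 8]) >> 12;  /* .8 * .16 >> 12 = .12 */
}
```

<p>
As a sanity check on the fixed-point bookkeeping: with <i>y</i><sub>c</sub> = 32.0 and <i>y</i><sub>b</sub> = 64.0 (both shifted left by 8), the look-up gives 2048, which is 0.5 in .12 fixed point.
</p>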
<p>
At this point, several questions may arise.
</p>
<ul>
<li>
    <b>What about negative <i>y</i><sub>b</sub>?</b> The beauty here<br />
    is that while <i>y</i><sub>b</sub> may be negative in principle,<br />
    such values would correspond to lines above the horizon and we don&#8217;t<br />
    calculate those anyway.
  </li>
<li>
    <b>Won&#8217;t non-integral <i>y</i><sub>b</sub> cause inaccurate look-ups?</b><br />
    True, <i>y</i><sub>b</sub> will have a fractional part that<br />
    is simply cut off during a simple look-up and some sort of<br />
    interpolation would be better. However, in testing there were no<br />
    noticeable differences between direct look-up, lerped look-up or<br />
    using <code>Div()</code>, so the simplest method suffices.
  </li>
<li>
    <b>Are .16 fixed point numbers enough?</b> Yes, apparently so.
  </li>
<li>
    <b>ZOMG OVERFLOW! Are .16 fixed point numbers too high?</b><br />
    Technically, yes, there is a risk of overflow when the camera height<br />
    gets too high. However, at high altitudes the map is going to look<br />
    like crap anyway due to the low resolution of the screen.<br />
    Furthermore, the hardware only uses 8.8 fixeds, so scales above<br />
    256.0 wouldn&#8217;t work anyway.
  </li>
</ul>
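<p>
The lerped look-up mentioned in the list above could look roughly like this. It is a sketch, not the code that was actually tested: the name <code>m7_div_lut_lerp</code> is hypothetical, and it assumes a non-negative <i>y</i><sub>b</sub> in .8 fixed point and a table with one spare entry past the highest index used.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Linearly interpolated reciprocal look-up.
 * yb is a non-negative .8 fixed-point number; lut[n] ~ 1/n in .16.
 * Reads lut[i] and lut[i+1], so the table needs one entry of slack. */
static int32_t m7_div_lut_lerp(const uint16_t *lut, int32_t yb)
{
    assert(yb >= 0);
    int32_t i = yb >> 8;        /* integer part: table index     */
    int32_t f = yb & 0xFF;      /* fractional part, 0..255       */
    int32_t a = lut[i];
    int32_t b = lut[i + 1];     /* b <= a, since 1/n never rises */
    return a - (((a - b) * f) >> 8);
}
```

<p>
Since 1/<i>x</i> is convex, the interpolated value sits a little above the true reciprocal, while the truncating look-up sits further above it. Either way, the difference shrinks quickly as <i>y</i><sub>b</sub> grows.
</p>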
<p>
And finally:
</p>
<ul>
<li>
  <b>What do I win?</b><br />
  With <code>Div()</code> <code>m7_prep_affines()</code> takes<br />
  about 51k cycles. With the direct look-up this reduces to about 13k:<br />
  a speed increase by a factor of 4.
  </li>
</ul>
<p><div>&nbsp;</div></p>
<p>
So yeah, this is what I <i>should</i> have figured out years ago, but<br />
somehow kept overlooking it. I&#8217;m not sure if I&#8217;ll add this whole thing to<br />
Tonc&#8217;s text and code, but I&#8217;ll at least put up a link to here. Thanks<br />
Ruben, for showing me how to do this properly.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2009/04/mode-7-addendum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>tonc 1.4 official release</title>
		<link>http://www.coranac.com/2008/08/tonc-14-official-release/</link>
		<comments>http://www.coranac.com/2008/08/tonc-14-official-release/#comments</comments>
		<pubDate>Tue, 19 Aug 2008 13:05:24 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[gba]]></category>
		<category><![CDATA[tonc 1.4]]></category>
		<category><![CDATA[update]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=54</guid>
		<description><![CDATA[
The files have been downloadable for a while now as a preview, but I finally put the text up on the main site as well so I guess that makes it official. Tonc is now at version 1.4. As mentioned before, the main new thing is TTE, a system for text for all occasions. I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>
The files have been downloadable for a while now as a preview, but I finally put the text up on the <a href="http://www.coranac.com/tonc">main site</a> as well so I guess that makes it official. Tonc is now at version 1.4. As mentioned before, the main new thing is <a href="http://www.coranac.com/tonc/text/tte.htm">TTE</a>, a system for text for all occasions. I&#8217;ve also used grit in some of the advanced demos, so if you want to see how you can do advanced work with it, check out the mode 7 demos and the tte demo.
</p>
<p>
This will be the last version of Tonc. It&#8217;s really gone on long enough now.
</p>
<p></p>
<p>Files and linkies :</p>
<ul>
<li>Main site : <a href="http://www.coranac.com/tonc">www.coranac.com/tonc</a>
  </li>
<li>
    Tonclib manual : <a href="http://www.coranac.com/man/tonclib/">www.coranac.com/man/tonclib/</a>
  </li>
</ul>
<ul>
<li>Example binaries : <a href="/files/tonc-bin.zip">tonc-bin.zip</a> (521k)
  </li>
<li>Example code + tonclib : <a href="/files/tonc-code.zip">tonc-code.zip</a> (1.1M)
  </li>
<li>Text + images : <a href="/files/tonc-text.zip">tonc-text.zip</a> (1.4M)
  </li>
<li>Text in CHM : <a href="/files/tonc.chm">tonc.chm</a> (1.7M)
  </li>
<li>Text in PDF : <a href="/files/tonc.pdf">tonc.pdf</a> (6.9M. No, I don&#8217;t know where the extra 3 MB came from either.)
  </li>
</ul>
<p></p>
<p>
Right! Now what &hellip;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/08/tonc-14-official-release/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>tonc 1.4 preview</title>
		<link>http://www.coranac.com/2008/05/tonc-14-preview/</link>
		<comments>http://www.coranac.com/2008/05/tonc-14-preview/#comments</comments>
		<pubDate>Mon, 26 May 2008 20:54:59 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>
		<category><![CDATA[tte]]></category>

		<guid isPermaLink="false">http://www.coranac.com/?p=49</guid>
		<description><![CDATA[
I&#8217;m close to releasing the latest (and probably last; this really has gone on long enough) version of Tonc. As a preview, I&#8217;m releasing the PDF a little early in the hope that someone may take a look and offer some feedback before the official release (aw, c&#8217;mon, it&#8217;s only 400 pages).


The changes mostly relate [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;m close to releasing the latest (and probably last; this really has gone on long enough) version of Tonc. As a preview, I&#8217;m releasing the PDF a little early in the hope that someone may take a look and offer some feedback before the official release (aw, c&#8217;mon, it&#8217;s only 400 pages).
</p>
<p>
The changes mostly relate to the new<br />
<a href="http://www.coranac.com/man/tonclib/">Tonc Text Engine</a>, a text system for all occasions. There&#8217;s a new chapter describing how TTE works, how to write general character printers for (almost) arbitrarily sized fonts and every type of graphics, and a few other things. It&#8217;s fairly long and could use sanity checking from someone else.
</p>
<p>
Also, many of the older demos now use TTE for their text as well. As a result they look cleaner and prettier, but it&#8217;s possible there are some left-overs from older versions. So have at it.
</p>
<ul>
<li><a href="http://www.coranac.com/files/tonc/tonc-1.4.pdf">Tonc 1.4 preview (PDF, 4.3 MB)</a>.</li>
<li><a href="http://www.coranac.com/files/misc/tte.pdf">TTE chapter only (PDF, 584 kB)</a>.</li>
<li>All relevant 1.4 files: <a href="http://www.coranac.com/files/tonc/">here</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/05/tonc-14-preview/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Surface drawing routines.</title>
		<link>http://www.coranac.com/2008/05/surface-drawing-routines/</link>
		<comments>http://www.coranac.com/2008/05/surface-drawing-routines/#comments</comments>
		<pubDate>Wed, 14 May 2008 16:19:28 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[tonc]]></category>
		<category><![CDATA[gba]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[surface]]></category>

		<guid isPermaLink="false">http://www.coranac.com/2008/05/14/48/</guid>
		<description><![CDATA[
I&#8217;ve been building a basic interface for dealing with graphic surfaces lately. I
already had most of the routines for 16bpp and 8bpp bitmaps in older Toncs, but
their use was still somewhat awkward because you had to provide some
details of the destination manually; most notably a base pointer and the pitch.
This got more than a [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ve been building a basic interface for dealing with graphic surfaces lately. I<br />
already had most of the routines for 16bpp and 8bpp bitmaps in older Toncs, but<br />
their use was still somewhat awkward because you had to provide some<br />
details of the destination manually; most notably a base pointer and the pitch.<br />
This got more than a little annoying, especially when trying to make blitters as<br />
well. So I made some changes.
</p>
<p></p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">typedef</span> <span class="kw1">struct</span> TSurface<br />
{<br />
&nbsp; &nbsp; u8&nbsp; *data;&nbsp; &nbsp; &nbsp; <span class="co1">//!&lt; Surface data pointer.</span><br />
&nbsp; &nbsp; u32 pitch;&nbsp; &nbsp; &nbsp; <span class="co1">//!&lt; Scanline pitch in bytes (PONDER: alignment?).</span><br />
&nbsp; &nbsp; u16 width;&nbsp; &nbsp; &nbsp; <span class="co1">//!&lt; Image width in pixels. </span><br />
&nbsp; &nbsp; u16 height; &nbsp; &nbsp; <span class="co1">//!&lt; Image height in pixels.</span><br />
&nbsp; &nbsp; u8&nbsp; bpp;&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">//!&lt; Bits per pixel.</span><br />
&nbsp; &nbsp; u8&nbsp; type; &nbsp; &nbsp; &nbsp; <span class="co1">//!&lt; Surface type (not used that much).</span><br />
&nbsp; &nbsp; u16 palSize;&nbsp; &nbsp; <span class="co1">//!&lt; Number of colors.</span><br />
&nbsp; &nbsp; u16 *palData; &nbsp; <span class="co1">//!&lt; Pointer to palette.</span><br />
} TSurface;</div>
</div>
<p>
I&#8217;ve rebuilt the routines around a surface description struct called<br />
<code>TSurface</code> (see above). This way, I can just initialize the<br />
surface somewhere and just pass the pointer to that surface around.<br />
There are a number of different kinds of surfaces. The most important<br />
ones are these three:
</p>
<ul>
<li><b>bmp16</b>. 16bpp bitmap surfaces.</li>
<li><b>bmp8</b>. 8bpp bitmap surfaces.</li>
<li><b>chr4c</b>. 4bpp tiled surfaces, in column-major order (i.e., tile 1 is<br />
<i>under</i> tile 0 instead of to the right). Column-major order may seem<br />
strange, but it actually simplifies the code considerably. There is also a<br />
<code>chr4<b>r</b></code> mode for normal, row-major tiling, but that&#8217;s unfinished<br />
and will probably remain so.</li>
</ul>
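<p>
To illustrate why column-major tiling keeps things simple, here is a sketch of a pixel plotter for such a surface. It uses a made-up stand-in struct (<code>Chr4cSketch</code>) rather than the real <code>TSurface</code>, and assumes the usual 4bpp tile layout: one 32-bit word per 8-pixel tile row, leftmost pixel in the lowest nybble, and a height that is a multiple of 8.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a chr4c surface; not tonclib's TSurface. */
typedef struct {
    uint32_t *data;    /* tile memory, viewed as 32-bit words */
    uint16_t  width;   /* in pixels, multiple of 8            */
    uint16_t  height;  /* in pixels, multiple of 8            */
} Chr4cSketch;

/* Plot one 4bpp pixel. One word holds one 8-pixel tile row, and the
 * tiles of a column are stacked vertically in memory, so row y of
 * tile-column x/8 is simply word y of that column. No splitting of y
 * into "which tile" and "which row in the tile" is needed. */
static void chr4c_sketch_plot(Chr4cSketch *srf, int x, int y, uint32_t clr)
{
    assert(x >= 0 && x < srf->width && y >= 0 && y < srf->height);
    uint32_t *word  = &srf->data[(x / 8) * srf->height + y];
    unsigned  shift = (unsigned)(x % 8) * 4;   /* nybble within the word */
    *word = (*word & ~(0xFu << shift)) | ((clr & 0xFu) << shift);
}
```

<p>
With row-major tiles, the same routine would have to compute the tile row and the row within the tile separately, and vertical runs would jump around in memory instead of being contiguous.
</p>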
<div class="cptfr" style="width:240px;">
  <img src="http://www.coranac.com/img/post/surface.gif"
    alt="surface.gba movie" /><br />
  Demonstrating surface routines for 4bpp tiles.
</div>
<p>
For each of these three, I have the most important rendering functions:<br />
plotting pixels, lines, rectangles and <b>blits</b>. Yes, blits too. Even for<br />
<code>chr4c</code>-mode. There are routines for frames (empty<br />
rectangles) and floodfill as well. The functions have a uniform interface<br />
with respect to surface-type, so switching between them should be<br />
easy were it necessary. There are also tables with function pointers<br />
to these routines, so by using those you need not really care about<br />
the details of the surface after its creation. I&#8217;ll probably add a<br />
pointer to such a table in <code>TSurface</code> in the future.
</p>
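<p>
The function-pointer tables work along these lines. This is a stripped-down sketch with hypothetical types (<code>SurfaceSketch</code>, <code>ProcTabSketch</code>) and only a single <code>plot</code> member; the real <code>TSurface</code> and <code>TSurfaceProcTab</code> have more to them, and the plain byte store in the 8bpp plotter is fine for ordinary RAM only.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of a surface and its procedure table. */
typedef struct {
    uint8_t *data;    /* surface data pointer    */
    uint32_t pitch;   /* scanline pitch in bytes */
} SurfaceSketch;

typedef struct {
    const char *name;
    void (*plot)(const SurfaceSketch *dst, int x, int y, uint32_t clr);
} ProcTabSketch;

/* 8bpp: one byte per pixel (plain byte store, so RAM only). */
static void bmp8_plot(const SurfaceSketch *dst, int x, int y, uint32_t clr)
{
    dst->data[(uint32_t)y * dst->pitch + (uint32_t)x] = (uint8_t)clr;
}

/* 16bpp: one halfword per pixel; data must be halfword-aligned. */
static void bmp16_plot(const SurfaceSketch *dst, int x, int y, uint32_t clr)
{
    uint16_t *row = (uint16_t *)(dst->data + (uint32_t)y * dst->pitch);
    row[x] = (uint16_t)clr;
}

static const ProcTabSketch bmp8_tab_sketch  = { "bmp8",  bmp8_plot  };
static const ProcTabSketch bmp16_tab_sketch = { "bmp16", bmp16_plot };

/* Surface-agnostic caller: draws a horizontal run through the table
 * without knowing the pixel format. */
static void draw_run(const ProcTabSketch *procs, const SurfaceSketch *dst,
                     int x, int y, int len, uint32_t clr)
{
    assert(procs && dst && len >= 0);
    for (int i = 0; i < len; i++)
        procs->plot(dst, x + i, y, clr);
}
```

<p>
The caller only ever sees the table, so switching surface types comes down to passing a different table pointer.
</p>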
<p></p>
<p>Linkies</p>
<ul>
<li>Demo project: <a href="http://www.coranac.com/files/misc/surface.zip">surface.zip</a>.</li>
<li>Tonclib: <a href="http://www.coranac.com/files/misc/tonclib-20080514.zip">tonclib</a>. </li>
<li><a href="http://www.coranac.com/man/tonclib/group__grpSurface.htm">Tonclib manual, TSurface module</a>.</li>
</ul>
<p>The image on the right is the result of the following routine.<br />
Turret pic semi-knowingly provided by<br />
<a href="http://helmetedrodent.kickassgamers.com/Pika/blog/">Kawa</a>.
</p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="kw1">void</span> test_surface_procs(<span class="kw1">const</span> TSurface *src, TSurface *dst, <br />
&nbsp; &nbsp; <span class="kw1">const</span> TSurfaceProcTab *procs, u16 colors[])<br />
{<br />
&nbsp; &nbsp; <span class="co1">// Init object text</span><br />
&nbsp; &nbsp; tte_init_obj(&amp;oam_mem[<span class="nu0">127</span>], ATTR0_TALL, ATTR1_SIZE_8, <span class="nu0">512</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; CLR_YELLOW, <span class="nu0">0</span>, &amp;vwf_default, <span class="kw2">NULL</span>);<br />
&nbsp; &nbsp; tte_init_con();<br />
&nbsp; &nbsp; tte_set_margins(<span class="nu0">8</span>, <span class="nu0">140</span>, <span class="nu0">160</span>, <span class="nu0">152</span>);</p>
<p>&nbsp; &nbsp; <span class="co1">// And go!</span><br />
&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{es;P}%s surface primitives#{w:60}&quot;</span>, procs-&gt;name);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{es;P}Rect#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;rect(dst, <span class="nu0">20</span>, <span class="nu0">20</span>, <span class="nu0">100</span>, <span class="nu0">100</span>, colors[<span class="nu0">0</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Frame#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;frame(dst, <span class="nu0">21</span>, <span class="nu0">21</span>, <span class="nu0">99</span>, <span class="nu0">99</span>, colors[<span class="nu0">1</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Hlines#{w:20}&quot;</span>);</p>
<p>&nbsp; &nbsp; procs-&gt;hline(dst, <span class="nu0">23</span>, <span class="nu0">23</span>, <span class="nu0">96</span>, colors[<span class="nu0">2</span>]);<br />
&nbsp; &nbsp; procs-&gt;hline(dst, <span class="nu0">23</span>, <span class="nu0">96</span>, <span class="nu0">96</span>, colors[<span class="nu0">2</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Vlines#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;vline(dst, <span class="nu0">23</span>, <span class="nu0">25</span>, <span class="nu0">94</span>, colors[<span class="nu0">3</span>]);<br />
&nbsp; &nbsp; procs-&gt;vline(dst, <span class="nu0">96</span>, <span class="nu0">25</span>, <span class="nu0">94</span>, colors[<span class="nu0">3</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Lines#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;line(dst, <span class="nu0">25</span>, <span class="nu0">25</span>, <span class="nu0">94</span>, <span class="nu0">40</span>, colors[<span class="nu0">4</span>]);<br />
&nbsp; &nbsp; procs-&gt;line(dst, <span class="nu0">94</span>, <span class="nu0">25</span>, <span class="nu0">79</span>, <span class="nu0">94</span>, colors[<span class="nu0">4</span>]);<br />
&nbsp; &nbsp; procs-&gt;line(dst, <span class="nu0">94</span>, <span class="nu0">94</span>, <span class="nu0">25</span>, <span class="nu0">79</span>, colors[<span class="nu0">4</span>]);<br />
&nbsp; &nbsp; procs-&gt;line(dst, <span class="nu0">25</span>, <span class="nu0">94</span>, <span class="nu0">40</span>, <span class="nu0">25</span>, colors[<span class="nu0">4</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Full blit#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;blit(dst, <span class="nu0">120</span>, <span class="nu0">16</span>, src-&gt;width, src-&gt;height, src, <span class="nu0">0</span>, <span class="nu0">0</span>);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Partial blit#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;blit(dst, <span class="nu0">40</span>, <span class="nu0">40</span>, <span class="nu0">40</span>, <span class="nu0">40</span>, src, <span class="nu0">12</span>, <span class="nu0">8</span>);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Floodfill#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;flood(dst, <span class="nu0">40</span>, <span class="nu0">32</span>, colors[<span class="nu0">5</span>]);<br />
&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P}Again !#{w:20}&quot;</span>);<br />
&nbsp; &nbsp; procs-&gt;flood(dst, <span class="nu0">40</span>, <span class="nu0">32</span>, colors[<span class="nu0">6</span>]);</p>
<p>&nbsp; &nbsp; tte_printf(<span class="st0">&quot;#{w:30;es;P;w:30}Ta-dah!!!#{w:20}&quot;</span>);</p>
<p>&nbsp; &nbsp; key_wait_till_hit(KEY_ANY);<br />
}</p>
<p><span class="co1">// Test 4bpp tiled, column-major surfaces</span><br />
<span class="kw1">void</span> test_chr4c_procs()<br />
{<br />
&nbsp; &nbsp; TSurface turret, dst;</p>
<p>&nbsp; &nbsp; <span class="co1">// Init turret for blitting.</span><br />
&nbsp; &nbsp; srf_init(&amp;turret, SRF_CHR4C, turretChr4cTiles, <span class="nu0">128</span>, <span class="nu0">128</span>, <span class="nu0">4</span>, <span class="kw2">NULL</span>);</p>
<p>&nbsp; &nbsp; <span class="co1">// Init destination surface</span><br />
&nbsp; &nbsp; srf_init(&amp;dst, SRF_CHR4C, tile_mem[<span class="nu0">0</span>], <span class="nu0">240</span>, <span class="nu0">160</span>, <span class="nu0">4</span>, pal_bg_mem);<br />
&nbsp; &nbsp; schr4c_prep_map(&amp;dst, se_mem[<span class="nu0">31</span>], <span class="nu0">0</span>);<br />
&nbsp; &nbsp; GRIT_CPY(pal_bg_mem, turretChr4cPal);</p>
<p>&nbsp; &nbsp; <span class="co1">// Set video stuff</span><br />
&nbsp; &nbsp; REG_DISPCNT= DCNT_MODE0 | DCNT_BG2 | DCNT_OBJ | DCNT_OBJ_1D;<br />
&nbsp; &nbsp; REG_BG2CNT= BG_CBB(<span class="nu0">0</span>)|BG_SBB(<span class="nu0">31</span>);</p>
<p>&nbsp; &nbsp; u16 colors[<span class="nu0">8</span>]= { <span class="nu0">6</span>, <span class="nu0">13</span>, <span class="nu0">1</span>, <span class="nu0">14</span>, <span class="nu0">15</span>, <span class="nu0">0</span>, <span class="nu0">14</span>, <span class="nu0">0</span> };</p>
<p>&nbsp; &nbsp; <span class="co1">// Run internal tester</span><br />
&nbsp; &nbsp; test_surface_procs(&amp;turret, &amp;dst, &amp;chr4c_tab, colors);<br />
}</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/05/surface-drawing-routines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Artsy fartsy</title>
		<link>http://www.coranac.com/2008/04/artsy-fartsy/</link>
		<comments>http://www.coranac.com/2008/04/artsy-fartsy/#comments</comments>
		<pubDate>Thu, 10 Apr 2008 21:20:22 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>

		<guid isPermaLink="false">http://www.coranac.com/2008/04/10/artsy-fartsy/</guid>
		<description><![CDATA[
I&#8217;ve been working on a few functions for rendering onto tiles recently. Yesterday was the turn of a rectangle filler. The traditional routine of double-looping over a pixel-plotter would be slow in every case, but for tiled surfaces it&#8217;s positively evil, so I made something that divides the rectangle in 5 areas and fills them [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ve been working on a few functions for rendering onto tiles recently. Yesterday was the turn of a rectangle filler. The traditional routine of double-looping over a pixel-plotter would be slow in every case, but for tiled surfaces it&#8217;s positively evil, so I made something that divides the rectangle in 5 areas and fills them by words or better.  Yes, this is a little tricky but I figured the speed increase of up to 300 would be worth it.
</p>
<p>
For testing purposes, I filled each region with a different color so that <s>if</s>when something went wrong, I could easily identify the problem. When playing around with the test app, I more or less accidentally came up with this:
</p>
<div class=cpt style="width:240px;">
  <img src="/img/post/mondriaan.png" alt="accidental mondriaan" />
</div>
<p>
Hmmm &#8230; <a href="http://images.google.nl/images?q=mondriaan">Mondriaany</a>.
</p>
<p>
Anyway, it seems that this thing went alright. So now <a href="/files/misc/tonclib20080409.zip">tonclib</a> also has plot, hline, vline, line, rect and frame functions for 4bpp tiled modes. No, there&#8217;s no blitting yet. If anyone wants that, I&#8217;m going to insist on some mental hazard pay.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/04/artsy-fartsy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tonc:setup update</title>
		<link>http://www.coranac.com/2008/02/toncsetup-update/</link>
		<comments>http://www.coranac.com/2008/02/toncsetup-update/#comments</comments>
		<pubDate>Sun, 17 Feb 2008 10:04:44 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>

		<guid isPermaLink="false">http://www.coranac.com/2008/02/17/toncsetup-update/</guid>
		<description><![CDATA[Finally got round to updating Tonc&#8217;s dev setup page.
It finally mentions devkitPro&#8217;s template makefiles and the basics of how to
use them. I&#8217;ve also added a
list of potential problems
you may encounter when installing/upgrading devkitARM or just building projects. I have not updated the downloadables yet because there&#8217;s still a few unfinished edits there. I just wanted [...]]]></description>
			<content:encoded><![CDATA[<p>Finally got round to updating Tonc&#8217;s <a href="http://www.coranac.com/tonc/text/setup.htm">dev setup page</a>.<br />
It finally mentions devkitPro&#8217;s template makefiles and the basics of how to<br />
use them. I&#8217;ve also added a<br />
<a href="http://www.coranac.com/tonc/text/setup.htm#ssec-dkp-error">list of potential problems</a><br />
you may encounter when installing/upgrading devkitARM or just building projects. I have not updated the downloadables yet because there&#8217;s still a few unfinished edits there. I just wanted to get this one out of the way because it&#8217;s so very, very overdue.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/02/toncsetup-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>memcpy and memset replacements for GBA/NDS</title>
		<link>http://www.coranac.com/2008/01/tonccpy/</link>
		<comments>http://www.coranac.com/2008/01/tonccpy/#comments</comments>
		<pubDate>Fri, 25 Jan 2008 14:54:57 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[tonc]]></category>

		<guid isPermaLink="false">http://www.coranac.com/2008/01/25/tonccpy/</guid>
		<description><![CDATA[
The standard C functions for copying and filling are memcpy()
and memset(). They&#8217;re part of the standard library, are easy
to use and are often implemented with some optimizations so that they&#8217;re
usually faster than manual looping. The DKA version, for example, will fill as
words if the alignments and sizes allow for it. This can be much
faster than [...]]]></description>
			<content:encoded><![CDATA[<p>
The standard C functions for copying and filling are <code>memcpy()</code><br />
and <code>memset()</code>. They&#8217;re part of the standard library, are easy<br />
to use and are often implemented with some optimizations so that they&#8217;re<br />
usually faster than manual looping. The DKA version, for example, will fill as<br />
words if the alignments and sizes allow for it. This can be <a href="http://www.coranac.com/tonc/text/text.htm#tbl-txt-se2"><i>much</i><br />
faster than doing the loops yourself</a>.
</p>
<p>
There is, however, one small annoying fact about these two: they&#8217;re not<br />
VRAM-safe. If the alignment and size aren&#8217;t right for the word transfers,<br />
they will transfer bytes. Not only will this be slow, of course, but because<br />
you can&#8217;t write to VRAM in bytes, the data will be corrupted.
</p>
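<p>
To make the problem concrete, here is a minimal sketch of what a VRAM-safe single-byte store has to do: read the containing halfword, swap in one byte, and write the whole halfword back. The name <code>vram_poke8</code> is made up for this illustration and is not part of tonclib; the sketch assumes a little-endian layout and a halfword-aligned base pointer.
</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Store one byte at byte offset `ofs` from a halfword-aligned base,
   using only a 16-bit read and a 16-bit write (little-endian layout).
   vram_poke8 is a hypothetical helper name, not a tonclib routine. */
static void vram_poke8(volatile uint16_t *base, size_t ofs, uint8_t value)
{
    volatile uint16_t *cell = &base[ofs / 2];
    uint16_t old = *cell;

    if (ofs & 1)    /* odd address: replace the high byte, keep the low one */
        *cell = (uint16_t)((old & 0x00FF) | (value << 8));
    else            /* even address: replace the low byte, keep the high one */
        *cell = (uint16_t)((old & 0xFF00) | value);
}
```

<p>
Doing this for every byte is slow, which is why the routines below only fall back to it for the odd bytes at the edges of a transfer.
</p>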
<p>
The solutions for this have mostly come down to &ldquo;so don&#8217;t do that<br />
then&rdquo;. Often, this can be sufficient: tiles in VRAM are word-aligned by definition, and source graphics data can and should be word-aligned anyway. However, now that<br />
I&#8217;m finally working on a bitmap blitter for 8bpp and 16bpp, I find that it&#8217;s<br />
simply not enough. So I wrote the following set of functions to serve as<br />
replacements.
</p>
<h4>The code</h4>
<p>
My main goal here was to create smallish and portable replacements, not to<br />
have the greatest and fastestest code around because that&#8217;s rather platform<br />
dependent. Yes, even the difference between GBA and NDS should matter,<br />
because of the differences in <code>ldr/str</code> times and caching.
</p>
<p>
There are 5 functions here. The main ones are<br />
<code>tonccpy</code> and <code>__toncset</code> for copying and<br />
filling words, respectively. The other 3 are interfaces for <code>__toncset</code><br />
for filling 8-bit, 16-bit and 32-bit data; you need these for, say, filling with a color<br />
instead of 8-bit data. For the rest of the discussion, I will use the name<br />
&ldquo;toncset&rdquo; for the internal routine for convenience.
</p>
<p><span id="more-38"></span></p>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">//# Stuff you may not have yet.</span><br />
<span class="kw1">typedef</span> <span class="kw1">unsigned</span> <span class="kw1">int</span> uint;<br />
<span class="kw1">#define</span> BIT_MASK(len) &nbsp; &nbsp; &nbsp; ( (<span class="nu0">1</span>&lt;&lt;(len))-<span class="nu0">1</span> )<br />
<span class="kw1">static</span> <span class="kw1">inline</span> u32 quad8(u8 x) &nbsp; { &nbsp; <span class="kw1">return</span> x*<span class="nu0">0x01010101u</span>;&nbsp; &nbsp; }</p>
<p>
<span class="co1">//# Declarations and inlines.</span></p>
<p><span class="kw1">void</span> tonccpy(<span class="kw1">void</span> *dst, <span class="kw1">const</span> <span class="kw1">void</span> *src, uint size);</p>
<p><span class="kw1">void</span> __toncset(<span class="kw1">void</span> *dst, u32 fill, uint size);<br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset(<span class="kw1">void</span> *dst, u8 src, uint size);<br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset16(<span class="kw1">void</span> *dst, u16 src, uint size);<br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset32(<span class="kw1">void</span> *dst, u32 src, uint size);</p>
<p>
<span class="co1">//! VRAM-safe memset, byte version. Size in bytes.</span><br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset(<span class="kw1">void</span> *dst, u8 src, uint size)<br />
{ &nbsp; __toncset(dst, quad8(src), size); &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</p>
<p><span class="co1">//! VRAM-safe memset, halfword version. Size in hwords.</span><br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset16(<span class="kw1">void</span> *dst, u16 src, uint size)<br />
{ &nbsp; __toncset(dst, src|src&lt;&lt;<span class="nu0">16</span>, size*<span class="nu0">2</span>);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</p>
<p><span class="co1">//! VRAM-safe memset, word version. Size in words.</span><br />
<span class="kw1">static</span> <span class="kw1">inline</span> <span class="kw1">void</span> toncset32(<span class="kw1">void</span> *dst, u32 src, uint size)<br />
{ &nbsp; __toncset(dst, src, size*<span class="nu0">4</span>);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }</div>
</div>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">//# tonccpy.c</span></p>
<p><span class="co1">//! VRAM-safe cpy.</span><br />
<span class="coMULTI">/*! This version mimics memcpy in functionality, with <br />
&nbsp; &nbsp; the benefit of working for VRAM as well. It is also <br />
&nbsp; &nbsp; slightly faster than the original memcpy, but faster <br />
&nbsp; &nbsp; implementations can be made.<br />
&nbsp; &nbsp; \param dst&nbsp; Destination pointer.<br />
&nbsp; &nbsp; \param src&nbsp; Source pointer.<br />
&nbsp; &nbsp; \param size Copy length in bytes.<br />
&nbsp; &nbsp; \note &nbsp; The pointers and size need not be word-aligned.<br />
*/</span><br />
<span class="kw1">void</span> tonccpy(<span class="kw1">void</span> *dst, <span class="kw1">const</span> <span class="kw1">void</span> *src, uint size)<br />
{<br />
&nbsp; &nbsp; <span class="kw1">if</span>(size==<span class="nu0">0</span> || dst==<span class="kw2">NULL</span> || src==<span class="kw2">NULL</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;</p>
<p>&nbsp; &nbsp; uint count;<br />
&nbsp; &nbsp; u16 *dst16; &nbsp; &nbsp; <span class="co1">// hword destination</span><br />
&nbsp; &nbsp; u8 &nbsp;*src8;&nbsp; &nbsp; &nbsp; <span class="co1">// byte source</span></p>
<p>&nbsp; &nbsp; <span class="co1">// Ideal case: copy by 4x words. Leaves tail for later.</span><br />
&nbsp; &nbsp; <span class="kw1">if</span>( ((u32)src|(u32)dst)%<span class="nu0">4</span>==<span class="nu0">0</span> &amp;&amp; size&gt;=<span class="nu0">4</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; u32 *src32= (u32*)src, *dst32= (u32*)dst;</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; count= size/<span class="nu0">4</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; uint tmp= count&amp;<span class="nu0">3</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; count /= <span class="nu0">4</span>;</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Duff, bitch!</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">switch</span>(tmp) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">do</span> {&nbsp; &nbsp; *dst32++ = *src32++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">3</span>: &nbsp; &nbsp; *dst32++ = *src32++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">2</span>: &nbsp; &nbsp; *dst32++ = *src32++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">1</span>: &nbsp; &nbsp; *dst32++ = *src32++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">0</span>: &nbsp; &nbsp; ; } <span class="kw1">while</span>(count--);<br />
&nbsp; &nbsp; &nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Check for tail</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; size &amp;= <span class="nu0">3</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(size == <span class="nu0">0</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; src8= (u8*)src32;<br />
&nbsp; &nbsp; &nbsp; &nbsp; dst16= (u16*)dst32;<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; <span class="kw1">else</span>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Unaligned.</span><br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; uint dstOfs= (u32)dst&amp;<span class="nu0">1</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; src8= (u8*)src;<br />
&nbsp; &nbsp; &nbsp; &nbsp; dst16= (u16*)(dst-dstOfs);</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Head: 1 byte.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(dstOfs != <span class="nu0">0</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *dst16= (*dst16 &amp; <span class="nu0">0xFF</span>) | *src8++&lt;&lt;<span class="nu0">8</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; dst16++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(--size==<span class="nu0">0</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; <span class="co1">// Unaligned main: copy by 2x byte.</span><br />
&nbsp; &nbsp; count= size/<span class="nu0">2</span>;<br />
&nbsp; &nbsp; <span class="kw1">while</span>(count--)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; *dst16++ = src8[<span class="nu0">0</span>] | src8[<span class="nu0">1</span>]&lt;&lt;<span class="nu0">8</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; src8 += <span class="nu0">2</span>;<br />
&nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; <span class="co1">// Tail: 1 byte.</span><br />
&nbsp; &nbsp; <span class="kw1">if</span>(size&amp;<span class="nu0">1</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; *dst16= (*dst16 &amp;~ <span class="nu0">0xFF</span>) | *src8;<br />
}</div>
</div>
<div class="cpp">
<div class="cpp proglist" style=" "><span class="co1">//# toncset.c</span></p>
<p><span class="co1">//! VRAM-safe memset, internal routine.</span><br />
<span class="coMULTI">/*! This version mimics memset in functionality, with <br />
&nbsp; &nbsp; the benefit of working for VRAM as well. It is also <br />
&nbsp; &nbsp; slightly faster than the original memset.<br />
&nbsp; &nbsp; \param dst&nbsp; Destination pointer.<br />
&nbsp; &nbsp; \param fill Word to fill with.<br />
&nbsp; &nbsp; \param size Fill-length in bytes.<br />
&nbsp; &nbsp; \note &nbsp; The \a dst pointer and \a size need not be <br />
&nbsp; &nbsp; &nbsp; &nbsp; word-aligned. In the case of unaligned fills, \a fill <br />
&nbsp; &nbsp; &nbsp; &nbsp; will be masked off to match the situation.<br />
*/</span><br />
<span class="kw1">void</span> __toncset(<span class="kw1">void</span> *dst, u32 fill, uint size)<br />
{<br />
&nbsp; &nbsp; <span class="kw1">if</span>(size==<span class="nu0">0</span> || dst==<span class="kw2">NULL</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;</p>
<p>&nbsp; &nbsp; uint left= (u32)dst&amp;<span class="nu0">3</span>;<br />
&nbsp; &nbsp; u32 *dst32= (u32*)(dst-left);<br />
&nbsp; &nbsp; u32 count, mask;</p>
<p>&nbsp; &nbsp; <span class="co1">// Unaligned head.</span><br />
&nbsp; &nbsp; <span class="kw1">if</span>(left != <span class="nu0">0</span>)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">// Adjust for very small stint.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span>(left+size&lt;<span class="nu0">4</span>)<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mask= BIT_MASK(size*<span class="nu0">8</span>)&lt;&lt;(left*<span class="nu0">8</span>); &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *dst32= (*dst32 &amp;~ mask) | (fill &amp; mask);<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; mask= BIT_MASK(left*<span class="nu0">8</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; *dst32= (*dst32 &amp; mask) | (fill&amp;~mask);<br />
&nbsp; &nbsp; &nbsp; &nbsp; dst32++;<br />
&nbsp; &nbsp; &nbsp; &nbsp; size -= <span class="nu0">4</span>-left;<br />
&nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; <span class="co1">// Main stint.</span><br />
&nbsp; &nbsp; count= size/<span class="nu0">4</span>;<br />
&nbsp; &nbsp; uint tmp= count&amp;<span class="nu0">3</span>;<br />
&nbsp; &nbsp; count /= <span class="nu0">4</span>;</p>
<p>&nbsp; &nbsp; <span class="kw1">switch</span>(tmp) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">do</span> {&nbsp; &nbsp; *dst32++ = fill;<br />
&nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">3</span>: &nbsp; &nbsp; *dst32++ = fill;<br />
&nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">2</span>: &nbsp; &nbsp; *dst32++ = fill;<br />
&nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">1</span>: &nbsp; &nbsp; *dst32++ = fill;<br />
&nbsp; &nbsp; <span class="kw1">case</span> <span class="nu0">0</span>: &nbsp; &nbsp; ; } <span class="kw1">while</span>(count--);<br />
&nbsp; &nbsp; }</p>
<p>&nbsp; &nbsp; <span class="co1">// Tail</span><br />
&nbsp; &nbsp; size &amp;= <span class="nu0">3</span>;<br />
&nbsp; &nbsp; <span class="kw1">if</span>(size)<br />
&nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; mask= BIT_MASK(size*<span class="nu0">8</span>);<br />
&nbsp; &nbsp; &nbsp; &nbsp; *dst32= (*dst32 &amp;~ mask) | (fill &amp; mask);<br />
&nbsp; &nbsp; }<br />
}</div>
</div>
<h4>Discussion</h4>
<p>
Both <code>tonccpy</code> and <code>toncset</code> have the<br />
following structure: the destination memory is divided into an<br />
unaligned <dfn>head</dfn>, followed by an aligned <dfn>main stint</dfn><br />
and then a <dfn>tail</dfn> for any trailing bytes. In the optimal case there<br />
is no head (i.e., the destination (and source) are word-aligned) and perhaps<br />
no tail either (<code>dst+size</code> is aligned). If that happens you&#8217;re in<br />
luck: a 4x unrolled word copier will blaze through memory. And yes, that is a<br />
<a href="http://en.wikipedia.org/wiki/Duff's_device">Duff&#8217;s Device</a> I&#8217;m<br />
using there; I know it&#8217;s evil, but I&#8217;ve always wanted to try one. Interestingly,<br />
it is also the only way I&#8217;ve been able to get DKA to create the optimal output<br />
for an unrolled loop.
</p>
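<p>
The head / main stint / tail division can be sketched as a small helper that only does the bookkeeping. This is an illustration under my own naming (<code>span_parts</code>, <code>split_span</code>), not code from the routines above, and it assumes 4-byte words.
</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef struct span_parts {
    size_t head;   /* bytes before the first word boundary     */
    size_t body;   /* bytes covered by whole aligned words     */
    size_t tail;   /* bytes left over after the last full word */
} span_parts;

/* Split `size` bytes starting at address `dst` into head, body and tail. */
static span_parts split_span(uintptr_t dst, size_t size)
{
    span_parts p = { 0, 0, 0 };
    size_t mis = (size_t)(dst & 3);     /* misalignment within the word */

    if (mis != 0) {
        p.head = 4 - mis;
        if (p.head > size)              /* span ends inside the first word */
            p.head = size;
    }
    size -= p.head;
    p.body = size & ~(size_t)3;         /* whole words */
    p.tail = size & 3;                  /* trailing bytes */
    return p;
}
```

<p>
For example, 10 bytes starting at an address with <code>dst%4 == 1</code> split into a 3-byte head, one whole word and a 3-byte tail. Only the middle part can use the unrolled word loop.
</p>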
<p>
When you aren&#8217;t lucky, you&#8217;ll have to go through the motions of masking off<br />
halfwords and/or words to insert the bytes. Technically it&#8217;s faster to use<br />
words rather than halfwords, but this can be extremely unpleasant because<br />
it&#8217;s also possible for the transfer to be in the middle of a word. For example,<br />
<code>dst%4== 1</code> and <code>size = 2</code> would require a<br />
dst-mask of <code>0xFF0000FF</code>. I still have this in<br />
<code>toncset</code>, but for the copier you have to deal with such<br />
annoying cases that I preferred to go with the simplicity of halfwords for now.
</p>
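<p>
As a rough sketch of the masking involved, the helper below computes which bits of a word must be preserved when writing <code>size</code> bytes at byte offset <code>ofs</code> inside it. The names <code>keep_mask</code> and <code>merge_word</code> are invented for this example; a little-endian layout is assumed, as is <code>ofs + size &lt;= 4</code>.
</p>

```c
#include <stdint.h>

/* Bits of one little-endian word that must survive when `size` bytes are
   written starting at byte offset `ofs` inside that word.
   Requires ofs < 4, size >= 1 and ofs + size <= 4. */
static uint32_t keep_mask(unsigned ofs, unsigned size)
{
    uint32_t written = (size >= 4) ? 0xFFFFFFFFu
                                   : ((UINT32_C(1) << (size * 8)) - 1);
    written <<= ofs * 8;
    return ~written;
}

/* Combine the preserved bits of `old` with the new bits taken from `fill`. */
static uint32_t merge_word(uint32_t old, uint32_t fill,
                           unsigned ofs, unsigned size)
{
    uint32_t keep = keep_mask(ofs, size);
    return (old & keep) | (fill & ~keep);
}
```

<p>
With <code>ofs = 1</code> and <code>size = 2</code> this gives the <code>0xFF0000FF</code> mask mentioned above. For a fill, that is all there is to it. A copier also has to shift the source bytes into position, which is where the annoying cases come from.
</p>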
<h4>Tests</h4>
<p>
I&#8217;ve tested the routines with all alignment variations and many different sizes<br />
and found them to work accurately. I&#8217;ve also tested them for speed. The<br />
following graphics show the results of the tests; while reading, remember<br />
that all items have been compiled with <code>-O2 -mthumb</code> and are<br />
in ROM. The source data for the copiers was in EWRAM and the destination<br />
was VRAM.
</p>
<p>
Fig&nbsp;1a and Fig&nbsp;1b show<br />
<code>tonccpy()</code>, <code>memcpy()</code> and something called<br />
<code>vramcpy</code>. <code>vramcpy()</code> is a more optimized<br />
version of <code>tonccpy()</code>, but it&#8217;s not ready yet. It uses<br />
<code>memcpy32()</code> for the aligned main stint, so that should<br />
represent optimal copying speed. The &ldquo;a&rdquo; and &ldquo;u&rdquo;<br />
affixes mean aligned and unaligned pointers, respectively.
</p>
<p>
As you can see, even though <code>tonccpy()</code> is more complicated<br />
than <code>memcpy()</code>, it&#8217;s a little faster in almost all cases. Only for short stints is <code>memcpy()</code> faster because<br />
<code>tonccpy()</code> has a larger overhead. The <code>vramcpy()</code><br />
lines show that there is much room for optimization for longer stints. Using<br />
a dedicated word copier (<code>memcpy32()</code>, DMA) would help,<br />
but I wanted <code>tonccpy()</code> to be portable and self-sufficient.
</p>
<p>
You can also see the incredible differences between when the source<br />
and destination have equal alignments (<i>a</i>) and when they don&#8217;t<br />
(<i>u</i>). I know it is possible to speed up the unaligned parts by<br />
50% to 100% or so as well, but I haven&#8217;t quite zeroed in on the right solution yet.
</p>
<div class=lblock>
<div class=cpt style="width:600px;">
  <img src="/img/post/tonccpy/tonccpy_long.png" id="img-cpy"
    alt="" /><br />
  <b>Fig 1a</b>: copier comparisons, long stints.<br />
  <i>a</i>: src=dst alignment; <i>u</i>: unaligned.
</div>
<div class=cpt style="width:600px;">
  <img src="/img/post/tonccpy/tonccpy_short.png"
    alt="" /><br />
  <b>Fig 1b</b>: copier comparisons, short stints.<br />
  <i>a</i>: src=dst alignment; <i>u</i>: unaligned.
</div>
</div>
<p></p>
<p>
The results of the fillers are in fig&nbsp;2a and<br />
fig&nbsp;2b. These pictures show<br />
<code>toncset()</code>, <code>memset()</code> and<br />
<code>toncset2()</code>. <code>toncset2()</code> is essentially<br />
<code>toncset()</code> using <code>memset32()</code> for the<br />
main stint. Because there&#8217;s no source alignment to worry about,<br />
there is no chance for <i>src</i>-<i>dst</i> misalignments. This is why there<br />
is little difference between the aligned and unaligned cases for the<br />
<code>toncset()</code> variants; only <code>memset()</code><br />
is very slow in the unaligned case.
</p>
<div class=lblock>
<div class=cpt style="width:600px;">
  <img src="/img/post/tonccpy/toncset_long.png" id="img-set"
    alt="" /><br />
  <b>Fig 2a</b>: fill comparisons, long stints.<br />
  <i>a</i>: dst = aligned; <i>u</i>: unaligned.
</div>
<div class=cpt style="width:600px;">
  <img src="/img/post/tonccpy/toncset_short.png"
    alt="" /><br />
  <b>Fig 2b</b>: fill comparisons, short stints.<br />
  <i>a</i>: dst = aligned; <i>u</i>: unaligned.
</div>
</div>
<p></p>
<p>
Lastly, the numbers for the transfer-rate and the overhead for calling<br />
them. Please remember that the actual numbers depend very much on<br />
what the waitstates are for the source and destination and where the<br />
code resides. That said, the figures should be useful for relative<br />
comparisons. Also, these are GBA timings; the NDS figures will be<br />
different, but I&#8217;ll test for them when I get round to it.
</p>
<div class=lblock>
<table>
<tr>
<td>
<table id="tbl-tonccpy"
  border=1 cellpadding=4 cellspacing=0>
<caption align=bottom>
  <b>Table 1</b>: copier results in cycles.<br />
</caption>
<tbody align=right>
<tr>
<td>&nbsp;</td>
<th>rate [c/byte]</th>
<th>overhead [c]</th>
</tr>
<tr>
<th>vramcpy, a</th>
<td>2.219</td>
<td>270</td>
</tr>
<tr>
<th>vramcpy, u</th>
<td>19.520</td>
<td>165</td>
</tr>
<tr>
<th>memcpy, a</th>
<td>6.207</td>
<td>121</td>
</tr>
<tr>
<th>memcpy, u</th>
<td>35.00</td>
<td>84</td>
</tr>
<tr>
<th>tonccpy, a</th>
<td>5.83</td>
<td>167</td>
</tr>
<tr>
<th>tonccpy, u</th>
<td>25.020</td>
<td>151</td>
</tr>
</tbody>
</table>
</td>
<td width=32>&nbsp;</td>
<td>
<table id="tbl-toncset"
  border=1 cellpadding=4 cellspacing=0>
<caption align=bottom>
  <b>Table 2</b>: filler results in cycles.<br />
</caption>
<tbody align=right>
<tr>
<td>&nbsp;</td>
<th>rate [c/byte]</th>
<th>overhead [c]</th>
</tr>
<tr>
<th>toncset2, a</th>
<td>0.656</td>
<td>222</td>
</tr>
<tr>
<th>toncset2, u</th>
<td>0.656</td>
<td>334</td>
</tr>
<tr>
<th>memset, a</th>
<td>3.000</td>
<td>158</td>
</tr>
<tr>
<th>memset, u</th>
<td>23.000</td>
<td>93</td>
</tr>
<tr>
<th>toncset, a</th>
<td>2.81</td>
<td>166</td>
</tr>
<tr>
<th>toncset, u</th>
<td>2.81</td>
<td>266</td>
</tr>
</tbody>
</table>
</td>
</tr>
</table>
</div>
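<p>
One way to read the two columns is as a simple linear cost model: total cycles are roughly overhead plus rate times size. That model is my assumption here, not something the measurements guarantee, and the names <code>copy_cost</code>, <code>estimate_cycles</code> and <code>break_even_bytes</code> are made up for the sketch.
</p>

```c
#include <stddef.h>

/* Hypothetical names; the linear model itself is an assumption. */
typedef struct copy_cost {
    double rate;      /* cycles per byte        */
    double overhead;  /* fixed cycles per call  */
} copy_cost;

/* Estimated cycles for one call that transfers `bytes` bytes. */
static double estimate_cycles(copy_cost c, size_t bytes)
{
    return c.overhead + c.rate * (double)bytes;
}

/* Size in bytes above which routine `b` is estimated to beat routine `a`.
   Returns -1.0 if `b` has no rate advantage and so never catches up.
   A result <= 0 (other than -1.0) means `b` is cheaper at every size. */
static double break_even_bytes(copy_cost a, copy_cost b)
{
    if (b.rate >= a.rate)
        return -1.0;
    return (b.overhead - a.overhead) / (a.rate - b.rate);
}
```

<p>
Plugging in the aligned <code>memcpy</code> (6.207 c/byte, 121 c) and <code>tonccpy</code> (5.83 c/byte, 167 c) rows from Table 1 puts the estimated break-even at a little over 120 bytes. Treat that as a ballpark figure under this model, not a measured value.
</p>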
<h4>Conclusions</h4>
<p>
<code>tonccpy()</code> and <code>toncset()</code> are essentially<br />
variations of <code>memcpy()</code> and <code>memset()</code> that<br />
also work for GBA/NDS VRAM even in the worst circumstances. They&#8217;re<br />
actually a little faster as well, so I can recommend using them instead of<br />
<code>memcpy/set</code> in all cases. This is not to say that they are<br />
the optimal solutions: faster general solutions certainly exist, but they<br />
will be longer, hairier and probably less portable. Faster non-general<br />
solutions exist as well, of course. If you know your pointers and sizes<br />
will be word-aligned, consider DMA32, <code>CpuFastSet</code> or the <code>memcpy32/set32</code><br />
routines from tonclib.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2008/01/tonccpy/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Tonc cleanup and fixes</title>
		<link>http://www.coranac.com/2007/12/tonc-cleanup-and-fixes/</link>
		<comments>http://www.coranac.com/2007/12/tonc-cleanup-and-fixes/#comments</comments>
		<pubDate>Wed, 05 Dec 2007 23:21:51 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>
		<category><![CDATA[tonc fix]]></category>

		<guid isPermaLink="false">http://www.coranac.com/2007/12/06/tonc-cleanup-and-fixes/</guid>
		<description><![CDATA[
I&#8217;ve uploaded some newer tonc files today, mostly for devkitArm r21 compatibility. Because the linkscript for multiboot got b0rken somehow, I had to change all the demos to default to cart-builds. I intended to change to cart-boot later anyway, but not being able to build the demos properly with the latest devkit kinda forces me [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ve uploaded some newer tonc files today, mostly for devkitARM r21 compatibility. Because the <a href="http://forum.gbadev.org/viewtopic.php?t=14493">linkscript for multiboot got b0rken</a> somehow, I had to change all the demos to default to cart-builds. I intended to change to cart-boot later anyway, but not being able to build the demos properly with the latest devkit kinda forces me to do it now. I&#8217;ve also had to change the tier-3 makefiles because they used `<tt>-fno-expections</tt>&#8217; in the CXXFLAGS. This should of course have been `<tt>-fno-exceptions</tt>&#8217; (thanks muff).
</p>
<p>
There have also been a few changes in the text parts: build-specs are set to cart-boot there too now, and I fixed some broken links. I&#8217;ve also fixed a slew of spelling and grammar issues that <a href="http://patatersoft.info/">Patater</a> sent in. These parts of the text shouldn&#8217;t be gibberish anymore &ndash; just unintelligible <kbd>:P</kbd>.
</p>
<p></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2007/12/tonc-cleanup-and-fixes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Usenti 1.7.8 and TTE demo</title>
		<link>http://www.coranac.com/2007/10/26/</link>
		<comments>http://www.coranac.com/2007/10/26/#comments</comments>
		<pubDate>Mon, 29 Oct 2007 22:15:31 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>
		<category><![CDATA[usenti]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[tte]]></category>

		<guid isPermaLink="false">http://www.coranac.com/wordpress/2007/10/30/26/</guid>
		<description><![CDATA[
One major and some smaller changes to usenti. The major one is that
there is now a font exporter that can convert bitmaps to TTE-usable
fonts. I&#8217;m not sure if it&#8217;s final yet, but any later changes should
be small. The text-tool has been altered to facilitate creation of fonts
by adding an opaque mode, an align-to-grid option and [...]]]></description>
			<content:encoded><![CDATA[<p class=ni>
One major and some smaller changes to usenti. The major one is that<br />
there is now a font exporter that can convert bitmaps to TTE-usable<br />
fonts. I&#8217;m not sure if it&#8217;s final yet, but any later changes should<br />
be small. The text-tool has been altered to facilitate creation of fonts<br />
by adding an opaque mode, an align-to-grid option and proper clipboard<br />
support (so you should be able to just copy an ASCII table into it, at<br />
which point the font is practically made already).
</p>
<p>
Also, there are separate pasting modes: one that matches the colors to<br />
the current palette (potentially mixing up the colors) and a<br />
direct-pixel paste, regardless of colors. Thanks, gauauu, for finally<br />
making me do this.  Secondly, Kawa&#8217;s been badgering me (politely)<br />
about editing colors in raw hex rather than via RGB triplets. This can<br />
be found, for various bad reasons, under the Palette menu under<br />
&lsquo;<tt>Advanced color edit</tt>&rsquo;.
</p>
<p>
Both of these items have &hellip; interesting side effects. Through<br />
the former, you can replace colors of a given palette-entry by<br />
copy-all, swap and paste. The color-edit accepts multiple colors<br />
separated by white-space and, later on, by commas as well, meaning that<br />
accidentally it&#8217;s now  possible to take previously exported palettes<br />
and add them again. Yes, it&#8217;s<br />
<a href="http://en.wikipedia.org/wiki/Creeping_featurism">feep</a>, but<br />
it&#8217;s interesting, somewhat hidden and cheap feep, so that&#8217;s alright. I&#8217;m<br />
thinking about adding something similar for the image itself as well<br />
(plus raw image imports), but only when I&#8217;m bored enough.
</p>
<p></p>
<p>
To show off the font exporter and the TTE system itself, there is a<br />
little demo of what it can do <a href="/files/misc/tte_demo.rar">here</a>.<br />
I&#8217;m still pondering over what it should and should not be able to do, but<br />
most of the things shown in the demo would be in the final version<br />
as well.
</p>
<p></p>
<p>
Oh, RE: that wordpress bug. A v2.3.1 has been released now with the fix. Interesting factoid: the error (#5088) was classified as &ldquo;highestomgbbq&rdquo;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2007/10/26/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>new tonclib</title>
		<link>http://www.coranac.com/2007/10/new-tonclib/</link>
		<comments>http://www.coranac.com/2007/10/new-tonclib/#comments</comments>
		<pubDate>Fri, 05 Oct 2007 20:55:13 +0000</pubDate>
		<dc:creator>cearn</dc:creator>
				<category><![CDATA[tonc]]></category>

		<guid isPermaLink="false">http://localhost/wordpress/index.php/20071005/new-tonclib/</guid>
		<description><![CDATA[
I&#8217;ve been making a lot of changes to tonclib &#8211; mostly
adding, but also some removals. The most important changes are:



  A more unified interface for the base drawing routines. Whereas I
  used to have something like bm8_foo(...), I now have
  bmp8_foo(..., void *dstBase, u32 dstPitch) for
  everything. Although the extra parameters [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ve been making a lot of changes to <tt>tonclib</tt> &ndash; mostly<br />
adding, but also some removals. The most important changes are:
</p>
<ul>
<li>
  A more unified interface for the base drawing routines. Whereas I<br />
  used to have something like <code>bm8_foo(...)</code>, I now have<br />
  <code>bmp8_foo(..., void *dstBase, u32 dstPitch)</code> for<br />
  everything. Although the extra parameters make the routines a little<br />
  slower, it makes it easier to switch video-modes.
</li>
<li>
  A few color routines like blending/fading, convert to rgbscale<br />
  (like grayscale, but for any color vector) and a few color adjustments.
</li>
<li>
  I&#8217;m trying to include (well, annex, really) some of libgba&#8217;s<br />
  functionality. In terms of shared functionality, the libgba names can<br />
  be used by including <tt>tonc_libgba.h</tt>. This is definitely not a<br />
  finished item yet.
</li>
<li>
  <em>Tonc&#8217;s Text Engine</em>. I already had some basic routines for<br />
  text on different video types, but this is a good deal better. Instead<br />
  of having separate <code>foo_puts()</code> routines, TTE uses function<br />
  pointers for placing glyphs on screen. This means there can be a single<br />
  interface for all modes, and customizable writers. Already provided are<br />
  glyph renderers for 8/16bit bitmaps and 4bit tiles, using a 1bpp bitpacked<br />
  font. In principle, the renderers can handle any sized fixed and<br />
  variable-width fonts (within reason: 128&#215;128 fonts would be impractical,<br />
  for example). There are also hooks for the <code>stdio</code> functions<br />
  (printf, yay!) and some simple commands for positioning, color and font<br />
  changes. Example of use:</p>
<pre class="proglist">
<span class="cmt">// Set-up 4bpp tile rendering bg 0 using cbb=0 and sbb=31.
// The default options set implicitly here are: verdana 9 font, yellow for
// text color</span>
tte_init_chr4_dflt(<span class=num>0</span>, BG_CBB(<span class=num>0</span>)|BG_SBB(<span class=num>31</span>));
<span class="cmt">// Init stdio hooks</span>
tte_init_con();
<span class="cmt">// Print something at position (10,10)</span>
iprintf(<span class=str>"\\{P:10,10}'Ello world!, %d"</span>, <span class=num>1337</span>);
</pre>
<p>  Aside from the initializer, using TTE is basically independent of what<br />
  you&#8217;re writing with or on. Of course, all this stuff does have a fair<br />
  amount of per-character overhead (about 150 cycles, I believe). It shouldn&#8217;t be too hard to port TTE to NDS; I am planning to do this at some point.
</li>
</ul>
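<p>
To give an idea of the function-pointer approach that TTE takes, here is a stripped-down sketch. None of these names (<code>text_ctx</code>, <code>glyph_fn</code>, <code>draw_bmp16</code>, <code>put_glyph</code>) are tonclib identifiers, and the glyph format (8x8, 1bpp, one byte per row, least significant bit leftmost) is an assumption made for the example.
</p>

```c
#include <stdint.h>
#include <stddef.h>

/* All names here are hypothetical; they are not tonclib identifiers. */
typedef struct text_ctx text_ctx;
typedef void (*glyph_fn)(text_ctx *tc, const uint8_t *glyph);

struct text_ctx {
    void     *dst_base;   /* top-left of the target surface        */
    size_t    dst_pitch;  /* bytes per scanline                    */
    int       x, y;       /* cursor position in pixels             */
    uint16_t  ink;        /* text color                            */
    glyph_fn  draw;       /* surface-specific glyph renderer       */
};

/* Renderer for a 16bpp bitmap surface. Glyphs are assumed to be 8x8,
   1bpp, one byte per row, least significant bit = leftmost pixel. */
static void draw_bmp16(text_ctx *tc, const uint8_t *glyph)
{
    for (int row = 0; row < 8; row++) {
        uint16_t *line = (uint16_t *)((uint8_t *)tc->dst_base
                         + (size_t)(tc->y + row) * tc->dst_pitch) + tc->x;
        for (int col = 0; col < 8; col++)
            if (glyph[row] & (1u << col))
                line[col] = tc->ink;    /* set pixel; background untouched */
    }
}

/* Surface-independent front end: only the function pointer knows
   what kind of surface it is drawing on. */
static void put_glyph(text_ctx *tc, const uint8_t *glyph)
{
    tc->draw(tc, glyph);
    tc->x += 8;                         /* advance cursor by glyph width */
}
```

<p>
Switching to tiles or an 8bpp bitmap would then only mean pointing <code>draw</code> at a different renderer; the calling code stays the same. The indirect call is also where part of the per-character overhead comes from.
</p>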
<p class=ni>
There are more smaller changes here and there, but those are of lesser consequence.
</p>
<p>
<a href="/files/misc/tonclib-1.3b.rar" title="tonclib 1.3">tonclib 1.3 linky</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coranac.com/2007/10/new-tonclib/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
