Coranac

I maek game :D

2009-12-23 – 21:12 | .

Okay, so it's only a card game; but a game nonetheless.

 

The game in question is an NDS implementation of SET. Set is a card-matching game with 81 cards (see below). The figures on the cards have four properties and 3 possibilities for each property. The key is to find three cards for which the values of each property are either all equal or all different. Looking at the color property, for example, a "Red Red Red" combination could (yes "could"; there are still three other properties to consider) form a set. "Red Green Blue" would also work, but "Red Green Green" would not.

Further details can be found in the readme and the game itself.


All 81 Set cards.
 

The game is mostly finished. There may be some tweaking to do here and there, but right now I don't want to get bogged down in a massive fine-tuning-fest – especially since I'm not sure what parts need fine-tuning … and because there's other stuff I really should get back to.

That said, all important aspects work … with one exception: hiscore saving. Yes, that. I've seen the multitude of threads on the subject but sofar I'm unsure of what would work on both hardware and emulator, so I'm leaving it as is for now. If anyone has a tidy hw+emu solution, please do tell.

Links

 

Oh, and merry Christmas everybody.

Some new notes on NDS code size

2009-11-24 – 17:31 | .

When I discussed the memory footprints of several C/C++ elements, I apparently missed a very important item: operator new and related functions. I assumed new shouldn't increase the binary that much, but boy was I wrong.

The short story is that officially new should throw an exception when it can't allocate new memory. Exceptions come with about 60 kb worth of baggage. Yes, this is more or less the same stuff that goes into vector and string.

The long story, including a detailed look at a minimal binary, a binary that uses new and a solution to the exception overhead (in this particular case anyway) can be read below the fold.

(more...)

One of the nice features of WordPress is that it already has a lot of functionality built-in. The whole thing is set up so that normal people can just install and start writing posts immediately, with WordPress taking care of all the details like converting HTML entities and adding newline where appropriate.

Of course, for those that aren't normal and that would like to write in raw HTML, these things are somewhat annoying. Fortunately, though, WordPress allows you to disable these kinds of filters. The catch is that you need to find out which filters to disable, namely, wptexturize (which converts HTML entities) and wpautop (which does newline control). WordPress also makes it easy add additional filters, like the CodeSnippet plugin that I use for code highlighting.

However, with the amount of filters available, sometimes things will clash. A good example of this is comments that have source code in them. Part of what CodeSnippet does is convert certain characters (specifically: ‘<’, ‘>’, ‘&’) to printable characters (&lt;, &gt;, &amp;) and aren't considered special HTML characters anymore. However, there are several other filters that have a similar task, so that when you write this:

Oh hai! This is a useful bitfield function.
 
[code lang="cpp"]
template<class T>
inline void bfInsert(T &y, u32 x, int start, int len)
{
   u32 mask= ((1<<len)-1) << start;
   y &= ~mask;
   y |= (x<<start) & mask;
}
[/code]

what it becomes is:

Oh hai! This is a useful bit function.
template
inline void bfInsert(T &amp;y, u32 x, int start, int len)
{
    u32 mask= (1&lt;&lt;len) &lt;&lt; start;
    y &amp;= ~mask;
    y |= (x&lt;&lt;start) &amp; mask;
}

Not exactly pretty. Note that the template class is simply removed because it's seen as an illicit HTML tag, and all the special characters are doubly converted. This is still a mild example; I think if you place the angle brackets wrong, whole swaths of code can simply be eaten by the sanitizer.

Unfortunately, finding out where the problem lies is tricky. Not only are there dozens of potential functions doing the conversion, they can be called from anywhere and PHP isn't exactly rich in the debugger department. You also have no idea where to start, because the filters can be called from everywhere. Worse still, in this particular case the place where the bad happens is actually before the comment is even saved to the database (but only for unregistered people; for me the code comments would work fine), and because comments are handled on a page that you don't actually ever see, random echo/print statements are useless as well.

But I think I finally got it: it was wp_kses() using (in a roundabout way) wp_specialchars() in the wp-includes/kses.php roomfile. The contractor is actually wp_filter_comment() from wp-includes/comment.php, using the pre_comment_content filter as a middleman.

The trick now is to keep it from happening. What I've done is define not one but two pre_comment_content filters: one that pre-mangles the brackets and ampersand before wp_kses, and one that de-mangles them afterwards. Of course, this will only be of importance between [code] tags. Exactly how to do this will depend on the plugin you're using, but in the case of CodeSnippet it goes like this:

//# Put this along with the other add_filter() calls.

// Ensure in-\&#91;code] entities ('<>&') work out right in the end.
add_filter('pre_comment_content', array(&$CodeSnippet, 'filterDeEntity'), 1);
add_filter('pre_comment_content', array(&$CodeSnippet, 'filterReEntity'), 50);

...

//# Add these methods to the CodeSnippet class.
    /**
     * Pre-encode HTML entities. Should come \e before wp_kses.
     */

    function filterDeEntity($content)
    {
        $content=  preg_replace(
            '#(\[code.*?\])(.*?)(\[/code\])#msie',
            '"\\1" . str_replace(
                array("<", ">", "&"),
                array("[|LT|]", "[|GT|]", "[|AMP|]"),
                \'\\2\') . "\\3";'
,
            $content);
        $content= str_replace('\"', '"', $content);
       
        return $content;
    }
    /**
     * Decode HTML entities. Should come \e after wp_kses.
     */

    function filterReEntity($content)
    {
        if(strstr($content, "[|"))
        {
            $content= preg_replace(
                '#(\[code.*?\])(.*?)(\[/code\])#msie',
                '"\\1" . str_replace(
                    array("[|LT|]", "[|GT|]", "[|AMP|]"),
                    array("<", ">", "&"),
                    \'\\2\') . "\\3";'
,
                $content);
            $content= str_replace('\"', '"', $content);
        }
       
        return $content;
    }

Notice that both methods are under the same filter group. The trick is that they have different priorities, which makes one act before wp_kses(), and one after. Also note how the regexps work in the replacement part of preg_replace(). This particular feature of preg_replace() allows for shorter code, but is very fragile; it may be better to use preg_replace_callback() instead. In any case, written like this it seems to work:

Oh hai! This is a useful bit function.
template<class T>
inline void bfInsert(T &y, u32 x, int start, int len)
{
   u32 mask= ((1<<len)-1)<<start;
   y &= ~mask;
   y |= (x<<start) & mask;
}

Comment preview

The code-comment mangling is just part of the issues one can encounter in blog comments. It's usually impossible to see beforehand what will be accepted and what not. Is HTML allowed? Are all tags allows, or just some or none at all? What about whitespace? Or BB-like tags? Basically, you'll never know what a comment will look like until you submitted it, and by then it's too late to change it.

You know what'd be really helpful? A comment preview!

You'd think this'd be a fairly obvious feature for a blogging system to have, but apparently not. I was thinking of making by own preview functionality, but when attempting to do so several items within WP thwarted my efforts. Fortunately, it seems plugins of this sort exist already. The plugin I'm now using is ajax-comment-preview, which works pretty darn well.

 

So anyway, comments should be able to handle code properly now and there's a comment-preview to show you what the comment will look like in the end. And there was much rejoicing.

Signs from Hell

2009-08-03 – 21:42 | .

The integer datatypes in C can be either signed or unsigned. Sometimes, it's obvious which should be used; for negative values you clearly should use signed types, for example. In many cases there is no obvious choice – in that case it usually doesn't matter which you use. Usually, but not always. Sometimes, picking the wrong kind can introduce subtle bugs in your program that, unless you know what to look out for, can catch you off-guard and have you searching for the problem for hours.

I've mentioned a few of these occasions in Tonc here and there, but I think it's worth going over them again in a little more detail. First, I'll explain how signed integers work and what the difference between signed and unsigned and where potential problems can come from. Then I'll discuss some common pitfalls so you know what to expect.

1 Basics

The signedness of a variable refers to whether it can be used to represent negative values or not. Unsigned variables can only have positive values; signed values can be both positive or negative.

In the computer world, signedness is mostly a matter of interpretation. Say you have a variable that is N bits long. This is enough room for 2N distinct numbers, but it says nothing about which range of numbers you should be using them for. Interpreted as unsigned integers, its range would be [0, 2N−1]. Under a signed interpretation, you'd use some bit-patterns for negative numbers. There are actually several ways of doing this, but the most commonly used is known as two's complement which leads to a [−2N−1, 2N−1−1] range: half positive and half negative.

1.1 Two's complement theory

Two's complement is sometimes seen as an awkward system, but it actually follows quite naturally when you only have a fixed number of digits to write down numbers with. Consider the whole line of positive and negative integers. As you move away from zero, the numbers will grow larger and larger. Now suppose you have an counting device composed of a limited number of digits, each of which can only display numbers 0 through 10−1. With N digits, you only have room for 10N different numbers, and once those are used up (at 10N−1), the counter returns to 0 and counting effectively resets. In essence, the number on the counter works in modulo 10N.

The key is that this works in both positive and negative directions. As far as the counter is concerned, 0 and 10N are the same thing. This being the case, you can argue that −1 (that is, the number before zero) is equivalent to 10N−1; and −2 ≡ 10N−2, and so on. Note that this works regardless of what 10 actually is; it can be ten (decimal), two (binary) or sixteen (hexadecimal).

The 10N possible numbers form a window over the number line, but where the window starts is up to the user. For signed numbers, you can move the window so that the upper half of the 10N range is interpreted as negative numbers.

 

Fig 1 shows how this works for 8-bit numbers (written in hex for convenience). The black numbers represent the entire number line, where numbers can have as many digits as you need. With only two nybbles, the counter repeats every 100h = 256 values. FFh, 1FFh, but also −1 all reduce to the same symbol, namely FFh. In Fig 2 you can see how the available symbols are mapped to either signed or unsigned values. In the unsigned case, numbers simply count from 0 to FFh; for signed, the top half of the symbol range is put on the left side of zero and are used for negative numbers.


Fig 1. Representing numbers with limited bits in hexadecimal. With only 8 bits, only the last 2 nybbles are shown. The cycle 00h-FFh repeats every 256 values.

Fig 2. The 00h-FFh range interpreted as unsigned or signed numbers. Note that, say, symbol “FF“ is used for both values FFh (=255) and −1.
 

The mathematical reason behind all this like this. Assume for convenience that N = 1, so that 0 is equivalent to 10 and in fact every multiple of 10. By definition, subtracting a value from itself gives 0. Because subtraction is merely addition by its negative value, you get the following:

(1) \begin{eqnarray} x &-& x &=& 0& \\ x &+& (-x) &=& 0 & \\ x &+& (-x) &\equiv& 10 & \\ & & (-x) &=& 10 &- x \end{eqnarray}

The term −x in the last step should be seen as a unit, call it C. Numerically, C is the number that, when added to x, gives 10. In decimal, if x = 1, then C = 9. C is called the 10's complement of x, because it's what's needed to complete the 10. It's called the two's complement in binary, because then 10 equals two.

In binary, there is an alternative to calculate the twos complement of a number. Subtracting a number from 2N is equivalent to inverting all its bits, so you get:

(2) \begin{eqnarray} (-x) &=& 2^N - x \\ &=& 2^N -1 - x + 1 \\ (-x) &=& \sim x + 1 \\ \end{eqnarray}

Using two's complement(1) for negative numbers has some interesting properties. First, subtraction and addition are basically the same thing. This is nice for arithmetic implementers for two reasons: the same hardware can be used for both operations, and it can be used for both positive and negative numbers.

Second, because the top half is now used for negative numbers, the most significant bit can be seen as a sign bit. Note: a sign bit, not the sign bit. There is a subtle linguistic difference here. When talking about the sign bit, one may thing of it as a single bit that indicates the sign. For example, 8-bit +1 and −1 could be `0000 0001' and `1000 0001', respectively. In two's complement, however, +1 and −1 are actually `0000 0001' and `11111111' (the sum of which is `1,00000000' ≡ 0, as it should be).

1.2 Declaring signed or unsigned

In the end, whether a particular group of bits is signed or unsigned is a matter of interpretation. For example, the 8-bit group `1111 1111' can be either 255 or −1, depending on how you want to look at it. You can't determine the signedness from just the bits themselves.

Also, when you've decided you're going to use a signed interpretation, whether the group forms negative number or not depends on the size of the group. for example, consider the two bytes `01 FF'. As separate bytes, these would form +1 and −1, respectively. However, if you view them as a single 16-bit integer (‘short”), it forms 0x01FF, which is a positive number.

 

In C, you specify signedness when you declare a variable. The general rule is that an integer is signed unless the keyword `unsigned' is used. The exception to the rule is `char', whose default signedness is platform and compiler-dependent! Be careful with this particular datatype.

int ia;                 // Signed integer.
unsigned int ib;        // Unsigned integer.

short sa;               // Signed 16-bit integer.
unsigned short sb;      // Signed 16-bit integer.

char ca;                // ??-signed 8-bit integer.
signed char cb;         // signed 8-bit integer.
unsigned char cc;       // unsigned 8-bit integer.

Because they're shorter and more descriptive, the following typedefs are often used for variable declarations. Basically, it's ‘s’ or ‘u’ for signed or unsigned, respectively, followed by the size of the type in bits. Unsigned variants are also sometimes indicated by ‘u”+typename.

Table 1: common short (un)signed typedefs.
Base type Signed Unsigned
char s8 u8 uchar
short s16 u16 ushort
int/long s32 u32 uint
long long s64 u64  
 

In assembly, you can't declare the signedness of variables, because there's no such thing as variables. There's only labels and how you use those labels determines what the related data are. Technically, there is only one datatype: the 32-bit word, corresponding to C's int or long. The other datatypes are essentially emulated, or defined by how which memory instructions you use: LDRB/LDRSB/STRB for bytes and LDRH/LDRSH/STRH for halfwords. For most data operations, signedness is irrelevant and as such mostly ignored. Only in a few cases does the sign actually matter and as these are essentially the topic of the rest of the article, we'll get to those eventually.

2 Potential problems

The following sections are cases where signedness may become problematic. I say “may”, because often it just works out. But that's just the thing: it can work most of the time and then things can go horribly wrong all of a sudden. The root of the problem comes down to one thing: negative numbers; usually, negative numbers becoming large positive numbers when interpreted as unsigned values.

For example, 32-bit signed −1 = 0xFFFFFFFF = unsigned 4294967295 (= 232−1). If nothing else, remember that part.

2.1 Sign extension, casting and shifting

When you go from a small datatype to a larger one, you're essentially adding a new set of bits at the top, and these bits have to be initialized in a meaningful way. The addition of these bits should have no effect on the value itself. For example, +1 should remain +1 and −1 should remain −1. What this boils down to for two's complement is that the new bits need to be filled with the sign-bit of the old value. This is called sign extension, because the top-bit (the sign-bit) is extended into all the higher bits. There is also zero-extension, which is when the higher bits are zeroed out. These two forms effectively correspond to signed and unsigned casting. (2).


Fig 3. Sign- and zero-extension for bytes when casting. A raw “F0” becomes 240 unsigned or −1 signed.

Conversions of this kind actually happen all the time, without any kind of direct intervention from the programmer. Data operations are always done in CPU words and any time you use a smaller datatype, there is the need to sign- or zero-extend. This also brings forth the question of which type of extension will be used: sign- or zero-extension. As the following bit of code shows, it depends on the signedness of the variable you're converting from. 8-bit variables sc and uc are both initialized by 0xFF, which is either −1 or 255 (you can use either of those too, by the way). After that, these are used to initialize signed or unsigned words.

As you can see from the output, the value in the words correspond to the signedness of the bytes, not the words. Also note that printing sc (the signed byte) gives 0xFFFFFFFF and not the 0xFF you initialized it with, and which are in fact its actual contents since 0xFFFFFFFF is too large to fit into a byte. However, when using it with anything, it's automatically extended to word-size. This becomes great fun when you later compare it to 0xFF again.

// Testing implicit conversions.
void test_conversion()
{
    s8 sc= 0xFF;        // 8-bit -1 (and 255)                  
    u8 uc= 0xFF;        // 8-bit 255 (and -1)

    s32 sisc= sc, siuc= uc;
    u32 uisc= sc, uiuc= uc;

    printf("  sc: %4d=%08X ;   uc:%4d=%08X\n", sc, sc, uc, uc);
    printf("sisc: %4d=%08X ; siuc:%4d=%08X\n", sisc, sisc, siuc, siuc);
    printf("uisc: %4d=%08X ; uiuc:%4d=%08X\n", uisc, uisc, uiuc, uiuc);
    printf("sc==0xFF : %s\n", (sc==0xFF ? "true" : "false") );

    /* Output:
          sc:   -1=FFFFFFFF ;   uc: 255=000000FF
        sisc:   -1=FFFFFFFF ; siuc: 255=000000FF
        uisc:   -1=FFFFFFFF ; uiuc: 255=000000FF
        sc==0xFF : false

        Warnings issued (for sc=0xFF):
        - warning C4305: 'initializing' : truncation from 'const int' to 'signed char'
        - warning C4309: 'initializing' : truncation of constant value
    */

}
 

Sign- and zero-extension also play a role in right-shifts. When using shifts for arithmetic (shift-right is short-hand for a division by power of two), you want the sign preserved. For example, when dividing −16 = 0xFFFF:FFF0 by 16 (shift-right by 4), you want the result to be −1 (=0xFFFF:FFFF), and not 268435455 (=0x0FFF:FFFF). The right-shift that preserves the sign is the arithmetic right-shift, and is used for signed numbers. For unsigned numbers, or if the variable is considered a set of bits instead of a single number, a logical right-shift is appropriate, since that uses zero-extension.

In assembly, arithmetic and logical right-shift are called ASR and LSR, respectively. In Java and other languages where the keyword unsigned does not exist the difference is indicated by >> (sign-extend) and >>> (zero-extend). In C, however, both types use the same symbol: >>. As such, you cannot tell which type of extension is used from just the expression; you'd have to look at the signedness of the operands (including temporaries) to see if it's a logical or arithmetic right-shift.

Table 2: Right-shifts for different languages.
Language Signed Unsigned
ARM asm asr lsr
C >> >>
Java(script) >> >>>

Fig 4. Sign- and zero-extension when right-shifting by four. Unsigned F0h=240 or −16. Unsigned 240>>4 = 15; Signed −16>>4 = −1.

This ambivalence of shift symbols in C can be a major source of pain in fixed-point calculations. Since unsigned has precedence over signed, if you have an unsigned variable at any point of the calculation, all subsequent calculations are unsigned too and you can kiss negative numbers goodbye. If everything starts going wrong as soon as you move in another direction or if rotations aren't calculated properly, this will be the cause.

 

The code below illustrates the problem in a very common situation. You have a position p, and a directional vector for movement, u. Since you want sub-pixel control of these, you use fixed-point notation for both (I'm assuming non-FPU system here). The u vector is a unit vector (say, cos(α), sin(α)); to get to the full velocity vector, we have to multiply u by some speed. The procedure comes down to something like this:

pnew = pold + speed·u

In the example, I'm only considering the x-component for convenience. Now, because position and direction can have negative components, those would be signed. The speed, however, is a length and therefore always positive, so it makes sense to make it unsigned, right? Well, yes and no. As you can see from the result, mostly no.

With speed = +1 and ux = −1, the end result should be +1*−1 = −1, which would be 0xFFFFFF00 in Q8 fixed-point notation. However, it isn't, thanks to the unsignedness of speed, which makes subsequent arithmetic unsigned so the right-shift does not sign-extend. So instead of the small step you intended, you get a giant leap into no man's land.

void test_right_shift()
{
    // Assume movement for 2 directions, with Q8 for everything.
    // a = look direction.  
    // p = (px, py) = position.
    // u = (ux, uy) = ( cos(a), sin(a) )

    int  px= 0;                 // Starting position.
    int  ux= -1<<8;             // Moving backwards.
    uint speed= 1<<8;           // Unsigned as speed's always >= 0, right?

    px = px + (speed*ux>>8);    // Fixed point motion. Result should be -1<<8.

    printf("px : %d=%08X\n", px, px);

    /* Result:
        px: px : 16776960=00FFFF00

        In other words: NOT the -1<<8 = 0xFFFFFF00 you were after.
    */

}

This mistake is depressingly easy to make, even for those who generally think about which datatype to use. Especially those people, as they're prone to optimize prematurely and automatically pick unsigned for a variable that will never be negative. The danger is that unsigned arithmetic has precedence, which can screw up at later right-shifts.

Bottom line: variables used in fixed-point calculations should be signed. Always.

2.2 Division

This isn't really a signed-vs-unsigned item per se, but integer division behaves in a peculiar way for negative numbers. It becomes one, however when you throw right-shift in the fray, which doesn't quite work like a division equivalent anymore for negative numbers. To discriminate between integer and normal division, I will use ‘\ ’ for integer division in this section. Note also the modulo operation is intimately tied to division, so this section applies to that as well.

What integer division comes down to is taking a normal division and throwing away the remaining fraction. For example, 7 / 4 = 1¾. The integer division is just 1. This is also true for negative numbers: −7 / 4 = −1¾, so 7 \ 4 = −1. In short, integer division rounds towards zero. With bit-shifting, however, you get something slightly different. Theoretically, x>>n is equivalent to x \ 2n. For positive numbers, this is true: 7>>2 in binary is 00000111>>2 = 00000001. But with −7>>2 you get 11111001>>2 = 11111110 = −2. Division-by-right-shift always rounds to negative infinity.

The upshot of this difference is that for negative numbers, the results of x \ 2n and x>>n will be out of sync, as Table 3 illustrates. They still give identical results for positive numbers though.

Table 3: integer and by-shift division by four.
x (dec) x \ 4 x>>2 (dec)   x (bin) x>>2 (bin)
-9 -2 -3 11110111 11111101
-8 -2 -2 11111000 11111110
-7 -1 -2 11111001 11111110
-6 -1 -2 11111010 11111110
-5 -1 -2 11111011 11111110
-4 -1 -1 11111100 11111111
-3 0 -1 11111101 11111111
-2 0 -1 11111110 11111111
-1 0 -1 11111111 11111111
0 0 0 00000000 00000000
1 0 0 00000001 00000000
2 0 0 00000010 00000000
3 0 0 00000011 00000000
4 1 1 00000100 00000001
5 1 1 00000101 00000001
6 1 1 00000110 00000001
7 1 1 00000111 00000001
8 2 2 00001000 00000010
9 2 2 00001001 00000010

There are some other consequences besides the obvious difference in results. First, there's how compilers deal with it. Compilers are very well aware that a bit-shift is faster than division and one of the optimizations they perform is replacing divisions by shifts where appropriate(3). For unsigned numerals the division will be replaced by a single shift. However, for signed variables some extra instructions have to added to correct the difference in rounding.

Second, note that the standard integer division does not give an equal distribution of results: there are more results in the zero-bin. Shift-division spreads the results around evenly. In some cases, you will want to use the shift version for that reason. One clear example of this would be tiling: using the ‘proper’ integer division would give you odd-looking results.

Negative number division / right-shift equivalents

Table 3 shows that for negative numbers, integer division and right-shift don't give the same results. If you do want the same results, the following equations can be used. Given x < 0 and N = 2n, then

\begin{eqnarray} x \backslash N &=& (x + (N-1)) >> n \\ \\ x>>n &=& (x - (N-1)) \backslash N \end{eqnarray}

GCC will use the x\N equivalence to produce signed integer division if possible.

2.3 Comparisons

The last area where signedness can be a factor is comparisons. The next bit of code is from my implementation of a filled circle renderer with boundary clipping. The circle is centered on (x0y0). Variables x and y are local variables that keep track of where we are on the circle, because these can be negative, they must be signed. Variables dstW and dstH are the destination image's width and height. Since width and height are unsigned by definition, it'd make sense to make these unsigned, right? Right?

//# Part of a clipped filled circle renderer that didn't quite work.

    int dstP= srf->pitch/2;                     // used in arithmetic, so signed.
    uint dstW= srf->width, dstH= srf->height;   // Unsigned by definition.
    u16 *dstD= ((u16*)srf->data)+(y0*dstP);
   
    int x=0, y= rad, d= 1-rad, left, right;

    ...
   
        // Side octants
        left= x0-y;
        right= x0+y;
        \<b\>if(right>=0 && left<=dstW)\</b\>       // Fully out of bounds
        {
            if(left<0)      left= 0;            // Clip left
            if(right>=dstW) right= dstW-1;      // Clip right

            // Render at scanlines y0-x and y0+x
            if(inRange(y0-x, 0, dstH))
                armset16(color, &dstD[-x*dstP+left], 2*(right-left+1));
            if(inRange(y0+x, 0, dstH))
                armset16(color, &dstD[+x*dstP+left], 2*(right-left+1));
        }
    ...

Well, apparently not. When I tested this, right and bottom edge clipping went fine, but when the circle went over the top or left edge, it disappeared completely.

The problem lies with the line in bold, which does the trivial rejection test. Variables left and right are the left and right-most edges of the scanline of the circle. If this is completely to the left of the screen (right < 0) or to the right of the screen (left ≥ dstW) then there's nothing to do. Technically, the tests on that line are correct, so the code should work. The reason it doesn't actually occurs a few lines earlier: the definition of dstW as an unsigned variable. Because of this, the second condition is an unsigned comparison. Now think of what happens when left moves over the left of the screen. left becomes becomes a (small) negative number, which is converted to postive number for the comparison. A large positive number for that matter – one that's quite a bit larger than the width of the image and as a result the routine thinks the circle is out of bounds.

So again, a routine went all wonky because I assumed that, since a width is always positive, using an unsigned variable would be a good idea.

 

The worst part of this particular bit, however, is that I should have known this. The compiler actually issues a warning for this type of thing:

warning: comparison between signed and unsigned integer expressions

Or at least it would have if I hadn't disabled the warning because the message was cropping up everywhere in my normal and sign-safe for-loops. Let this be a lesson: disable warnings at your own risk and for Offler's sake do not ignore them.

2.4 Well, duh

The problems covered above are the subtle ones, where you have to be aware of some of the details that go into the C language itself. There are also a few issues where the programmer really should have known they were going to be a problem from the start.

 

The first example is, again, one that can occur when optimizing prematurely. You may have heard that loops work better when you count down instead of count up, because in machine code a subtraction is an automatic comparison to zero. So, a clever programmer may turn this:

uint i;         // Unsigned, since it's always positive.
for(i=0; i<size; i++)
{
    // Do whatever
}

into this:

uint i;         // Unsigned, since it's always positive. Right?
for(i=size-1; i>=0; i--)
{
    // Do whatever
}

There are two problems with this code. First, the change probably will not matter with modern compilers because they are aware of the equivalence and can do this conversion themselves(4), so there's nothing to gain from this.

The real problem, however, is the terminating condition: `i>=0'. Since i is unsigned, it can never be negative, and therefore the condition is always true.

 

The second example involves bitfields. As it happens, bitfields can be signed or unsigned as well. For the most part, handling this is like handling normal signedness, but there is one situation where you have to be careful.

void test_bitfield()
{
    struct Foo {
        int     s7 : 7;     // 7-bit signed
        uint    u7 : 7;     // 7-bit unsigned
        int     s1 : 1;     // 1-bit signed
        uint    u1 : 1;     // 1-bit unsigned
    };

    Foo f= { -1, -1, 1, 1 };

    printf("s7: %3d\nu7: %3d\ns1: %3d\nu1: %3d\n\n", f.s7, f.u7, f.s1, f.u1);
   
    /*  Results:
        s7:  -1     // Inited to -1
        u7: 127     // Inited to -1
        \<b\>s1:  -1     // Inited to  1\</b\>
        u1:   1     // Inited to  1
    */

}

In the code above I've created a bif-fielded struct with both signed and unsigned members. There are two 7-bit fields and two 1-bit fields, and these are initialized to −1 and +1, respectively. The values are then printed.

The 7-bit fields work as you might expect. f.s7 is −1, as it's signed, and f.u7 is 127, which is the 7-bit equivalent of −1. The interesting case is for f.s1. This is initialized to 1, but comes out as −1, because for a single signed bit the possibilities are 0 and −1, and not 0 and +1! Without this knowledge, a later test like `f.s1==1' might give unexpected results.

3 Summary

So, summarizing:

  • Unsigned variables only represents positive numbers; signed ones can have positive or negative values. Negative numbers are usually represented via two's complement, which is based on the cyclical nature of counters when you have a limited number of digits.
  • In C, integers are signed unless specified otherwise, except for char, whose signedness is compiler dependent.
  • Careless use of signed and unsigned types can result in subtle runtime bugs with not-so-subtle results. Usually, what happens is that a negative number is reinterpreted as a very large positive number and everything goes banana-shaped.
  • Unsigned has a higher operator precedence than signed. If one of the operands is unsigned, the operation will use unsigned arithmetic. This can cause problems for divisions, modulos, right-shifts and comparisons.
  • For negative numbers, division/modulo by 2n is not quite the same as right-shifts/ANDs. Analyse which is best for your situation, then act accordingly.
  • Ignore compiler warnings at your own peril.
  • The place where a bug manifests is not always the place where it originates. The declaration of variables matters! Do not forget this when debugging or when asking for assistance.
 

There isn't really a hard rule on when to use which signedness, but here are a few guidelines nonetheless.

  • If a variable can, in principle, have negative values, make it signed. If it represents a physical quantity (position, velocity, mass, etc), make it signed.
  • A variable that represents logical values (bools, pixels, colors, raw data) should probably be unsigned.
  • And now the big one: just because a variable will always be positive doesn't mean it should be unsigned. Yes, you may waste half the range, but using signed variables is usually safer. If you must have the larger range (for the smaller datatypes, for example), consider defining the storage variables unsigned, but convert them to local signed ints when you're really going to use them.
  • If mathematical symbols were gods, the minus sign would be Loki. Be extra careful when you encounter them. If there are minus signs anywhere in the algorithm, or even the potential for negative numbers, everything should be done with signed numbers.

Notes:
  1. Or any 10's complement, really.
  2. One could say that zero-extension is just a form of sign-extension; it's just that the sign for an unsigned number is always positive.
  3. And please let the compiler do its job in this regard: the low operator-precedence of shifts makes their use awkward and error-prone. If you mean division, then use division.
  4. Although they may well do it incorrectly: turning the decrementing loop into an incrementing one. Point is, the compiler may not follow exactly what you're doing anyway.

Gaddammit!

 

So here I am, looking forward to a nice quiet weekend; hang back, watch some telly and maybe read a bit – but NNnnneeeEEEEEUUUuuuuuuuu!! Someone had to write an interesting article about sine approximation. With a challenge at the end. And using an inefficient kind of approximation. And so now, instead of just relaxing, I have to spend my entire weekend and most of the week figuring out a better way of doing it. I hate it when this happens >_<.

 

Okay, maybe not.

 

Sarcasm aside, it is an interesting read. While the standard way of calculating a sine – via a look-up table – works and works well, there's just something unsatisfying about it. The LUT-based approach is just … dull. Uninspired. Cowardly. Inelegant. In contrast, finding a suitable algorithm for it requires effort and a modicum of creativity, so something like that always piques my interest.

In this case it's sine approximation. I'd been wondering about that when I did my arctan article, but figured it would require too many terms to really be worth the effort. But looking at Mr Schraut's post (whose site you should be visiting from time to time too; there's good stuff there) it seems you can get a decent version quite rapidly. The article centers around the work found at devmaster thread 5784, which derived the following two equations:

(1) \begin{eqnarray} S_2(x) &=& \frac4\pi x - \frac4{\pi^2} x^2 \\ \\ S_{4d}(x) &=& (1-P)S_2(x) + P S_2^2(x) \end{eqnarray}

These approximations work quite well, but I feel that it actually uses the wrong starting point. There are alternative approximations that give more accurate results at nearly no extra cost in complexity. In this post, I'll derive higher-order alternatives for both. In passing, I'll also talk about a few of the tools that can help analyse functions and, of course, provide some source code and do some comparisons.

(more...)

Bughunting

2009-06-28 – 18:56 | .

Yes! YES!! OH GOD, YES!!!

 
I mean … uhm …

While browsing throught the E3 reports, I was moderately pleased to see the Aliens versus Predator series (of games, not movies) is getting another sequel.

 

I've always had a soft spot for xenomorphs. This extends to the Aliens versus Predator games that have been released on the PC. The first AVP came out in 1999 and I think this was one of the first PC games I ever bought. While not really as popular or as rich in storyline as, say, HalfLife, I still think it has many redeeming qualities even today. For example …



An Alien on the ceiling watching dinner walk by: I seeee you.

You can play as Human, Alien, or Predator, each with very different styles of play. This was pretty unique back then for FPSs. Actually, I think it still is. The Alien in particular was unusual: very quick, able to walk on walls and ceilings and a strange fish-eye lens point of view. It also had no ranged attack, which meant you had to get up close and personal to attack. Moreover, the alien did not have much in the way of hitpoints, which effectively meant that you had to not only get close, but get close undetected. You had to hide in dark corners and on ceilings waiting for people to walk by and then bite their heads off.

Most games will put you as the Hero Marine against bug-like critters to be slaughtered en-masse. This game gives you the opportunity to see what it looks like from the other side, which is definitely an educational experience. One thing that comes very clear, for example, is why fire and flamethrowers are not your friend. Bullets could often be avoided (except from turrets), but flamethrowers put up an entire wall of fire and one hit would keep burning for quite some time which, for creatures with few hitpoints, would fit nicely into the bad things category.


Playing against the aliens was also a different-than-usual experience. Able to hide anywhere (how many FPSs require you to check the ceilings?), nearly invisible against the background, fast and very, very deadly.

And … oh yeah! They bleed acid. Yeah.

They also had a very peculiar reaction to being shot: exploding and scattering themselves over a wide area. All while bleeding acid. If you remember your physics classes, you should be aware of this thing called inertia: things in motion will continue in the same direction. You know which direction the Aliens will usually be moving towards when you shoot them? You. You know in which direction all the parts and blood will be moving? You!

In other words: even when you kill them, there's a good chance they'll kill you right back.


Acid rain. (click for details)

Spliiiish. AKA AARRGH!GETITOFF!GETITOFF!GETITOFF!!!.

The game's also quite hard. I'd almost say Nintendo Hard. AVP 1 had no in-level saving. I don't think there's ever been a PC FPS that didn't allow you to save at will. Combined with the fact that the characters were realistically weak (a rocket jump, would only get parts of you to far-off distances or heights; on average you remain in the same spot), the lack of saving increased the tension considerably.

Of course a sizable group of gamers, cowardly pussies that they are, complained and eventually a save feature was added later. Shame, really; the levels are short enough for it to work, and it's actually way more fun to play when you're running for dear life.


And then there's the motion tracker. If you've seen the movie Aliens, you'll know what I'm talking about. Basically, it's a device that measures how deep the shit you're in is. If it emits a low bup, you're safe; if it starts giving off a high-pitched beep (or worse, multiple beeps), you're in trouble.

This truly is the stuff of nightmares. There's actually something worse than darkness: darkness and having a reminder that you're probably going to die in the next few steps if you're not careful. This feeling was enhanced by the night vision goggles which turn off the tracker. So now you only know where something was, roughly, and you have to find it again. The motion tracker is without a doubt the single most evil and mind-screwing feature ever put in a game. For those who want to argue in favor of, say, Resident Evil or Silent Hill or other horror games: No! You're wrong! It really is that simple.


On second thought, there is something worse. It's called a facehugger. The spiderlike critters from the Alien series that jump you from out of nowhere and basically rape your face. The game has those too. To show just how bad these are, here's a little anecdote about my first encounter with one.

Why facehuggers are evil

It's the third Marine mission: Invasion. You have to get to the top of the tower for a rescue. Up to this point I had done what I'd always done in an FPS: make slow but steady progress to avoid any nasty surprises. From this level onward that strategy doesn't work anymore. At all.

The reason for that is that now the aliens start to respawn at semi-random locations. So not only am I a puny hooman faced with very quick aliens that can come from out of nowhere trying to take my head off, now there's an endless supply of them too and only one of me. Oh and did I mention there's no save? Not that it'd matter because they'd spawn at a different location anyway, but still.

In any case, at some point you'd clue in to the fact that the only way this is gonna work is to just make a blind run for it and hope pray curse for the best. Amazingly, this worked out pretty well. That is, until I took an elevator down to face this:


An open Alien egg. Initialize panic mode.

An open Alien egg. And I could hear the pitter-patter of tiny feet down the corridor to the right, which also lit up on the tracker. When turning round the corner, the sound became louder. But still, I could not actually see the little bastard yet. “Well. Shit.” is something of an understatement at this point.

But then I hear something above me as well: an Alien was climbing down to the room in the image. “Well. Shit.” has now become completely inappropriate and I headed back to the elevator room to kill it. I basically sprayed the whole room with fire, hoping I'd catch it at some point. And I did. I continued to dance around to avoid it until it burst. And then all was quite again. I didn't even hear the face hugger moving around anymore in the distance. Thinking all was safe, I turned round to hunt down the face hugger again an…

SSSSSSSSSSKKKKKKKKKKKKKKRRRRREEEEEEE
EEEEEEEEEEEEEEEAAAAAAAAAAAAAARRRRRRR
RRRGGGGGGGHHHHHHHHHHHHHHWITHTHEHURTING
ANDTHEPAINANDTHEDYINGOHBLOODYFUCKINGHELL
WHATWASTHAT?!?!?!!!ELEVENTYONE!!?!!

Turns out the reason I didn't hear the facehugger anymore wasn't that I'd killed it or that it had moved too far away, but because it was already in mid-jump. Not only did it get me, it got me completely by surprise and the blood-curdling scream it emitted actually made me fall off my chair. Literally. It scared me so much that I actually leaped out of my chair. It took about a minute before I could even hold the mouse again because my hands were shaking so much. No game has ever had that strong of an effect before or since. This was just awful. And yet awesome at the same time.

Facehuggers are just plain evil. Just hearing one moving in the area is enough to give me hives.


The only really bad thing about the game is that it won't play on current computers – some graphics and sound glitches that made crashed the game or made it unplayable. However, this has actually been remedied recently. People have been tinkering with the source code and fixed the most important issues. See forumplanet.gamespy.com/tech_support/b49029/1049364 for details and links to patches.

 

Compared to AVP 1, its sequel was, well, ultimately something of a let-down. It's still good playing, but I felt that it could have been so much more. Sure, it had better graphics. Well, more detailed models and textures anyway; unfortunately, the textures also looked really coarse and flat, and decals would often seem to be placed over the polygons, rather than on it, which just looked awful. AVP 1 had destructable light sources – something the Alien could make use of very well – but AVP 2 didn't. It also did not have adjustable gamma settings, which really hurt because often I literally could not see anything. And they took out the cheat modes and skirmish *sigh*.

The Aliens had also changed in some very bad ways. In the original, they were fast and furious, but in AVP 2 it often seemed thay they were just hobbling along on their way too skinny legs. Instead of looking like the vicious and fast killing machines, they came off more as clumsy puppies. Okay, yes, puppies with really sharp claws and teeth, but not the terrors they're supposed to be.

They also didn't explode into parts with that delightful crackling sound anymore, or bleed over everything (mostly you). Mostly they just flopped down. Also, there seemed to be only one or two death poses and I think only a single animation timer for all critters. Often you'd find yourself in a field of dead aliens which lay down in exactly the same way. I know it doesn't sound like a big deal, but little things like this can spoil the mood completely. Worst of all though was what they did with the facehuggers: they took away the scream when they kill you. This completely removed the scare factor :(

Having said that, it also did some things very right. There was one interconnected story line, with the threads of the three playmodes intersecting at several instances. Very nicely done. Also, as the Alien you actually played through the facehugger and chestburster stages. This was also fun.


Ugly texturing (click for details).

Synchonized dying.

Chestburster finding its way out.

Intersecting story line - Alien.

Intersecting story line - Marine.

Intersecting story line - Predator.

And now there's gonna be a third installment. Like in AVP 2, there will be an interwoven story, so that's good. From the pictures I've seen, the graphics are going to be awesome. Textures are crips, motion looks fluid – it just looks right again. It also looks like it's not going to be for the faint of heart, with lovely gratuitous displays of blood and guts and trophy-taking and everything (see here for video).

I was somewhat surprised to see Sega as the publisher for the game. It seemed a little odd at first, but if you take into account that they're a bunch of fucking sadists, I think it is actually quite fitting.

 

So yeah, I'm looking forward to this one.

new and improved geshi

2009-06-05 – 21:13 | .

With Tonc I pretty much did all the syntax highlighting of code manually. As you might expect, this experience was – well, the proper description is something not suitable for anyone under the age of several thousand, so let's keep it at “somewhat less than pleasant”. So the first thing I looked when starting this whole blogging gig for was something that could do that automatically. In my case, that was codesnippet, which was build on the very awesome Geshi. There were some small problems with number formatting and whitespace handling, but overall it's served me well.

The Geshi that came with it was … 1.0.7.20, I think. In any case, Geshi's is now at 1.0.8.3, so I figured it was time for an upgrade. Most notable was that the way numbers were parsed has been greatly modified, with different types of representations now being parsed separately – and correctly to boot. Right now, it's almost fully correct, as you can see from the list below:

// Regular int
123
123l
123L
123ll       // fail
123LL       // fail
123u        // fail
123U        // fail
+123
-123

// Octal
0123

// Hex
0x12
0x123
0x123.4

// Float
123.4
123.4f
123.4F
+123.4
-123.4
1.2e3
1.2E3
1.2e+3
1.2e-3

// Inner
(1.23)
abc123de

Only some of the more special integer literals aren't parsed correctly, specifically the unsigned (-U) and long long (-LL) suffixes aren't accepted. I don't suppose hex floats will work either, but that's a GCC extension anyway.

To fix this, you need to modify geshi a little; specifically the GESHI_NUMBER_INT_CSTYLE regular expression:

  GESHI_NUMBER_INT_CSTYLE =>
    '(?<![0-9a-z_\.%])(?<![\d\.]e[+\-])([1-9]\d*?|0)l(?![0-9a-z\.])',

… yeah. I'm not sure why it's formulated like that either. I'd have thought '\b' would have worked just as well, but alright. Anyway, notice the single 'l' character in there? That needs to be extended to something that matches a potential single 'u', possibly followed by one or two 'l's. In other words: 'u?l{0,2}'.

  GESHI_NUMBER_INT_CSTYLE =>
    '(?<![0-9a-z_\.%])(?<![\d\.]e[+\-])([1-9]\d*?|0)\<b\>u?l{0,2}\</b\>(?![0-9a-z\.])',

HTML in code

An astute readed may have noted the bold in the previous snippet. Normally, you can't do that in Geshi.. One of the things that Geshi does is translate HTML entities like '<' into things like "&lt;" so that it'll turn up right on the resulting page. This, of course, is one of the things Geshi is expected to do. However, in this case it also makes it impossible to add HTML parts in the code snippet, which at times can be very useful.

So what do we do now? Well, we can use escaped HTML tags. Much like "\n" doesn't actually mean backslash + 'n' but a newline character, "\<" can be used for the actual '<'. And to unescape that, a double backslash can be used, much like it is in C.

\\<b\\>BOLD\\</b\\>    becomes     \<b\>BOLD\</b\>

There are several ways to implement this. One would be to modify it in the geshi code. I haven't tried that route yet because I expect it could get messy. That's arguably how it should be done, but it's easier to do it after the fact: when all the conversions have been done. Basically, you need something like this:

// Initialize geshi with the text to convert and language file to use.
$geshi = new GeSHi($text, $lang, $this->geshi_path);

// This does the actual work.
$text= $geshi->parse_code();

// Replace (un)escaped html entities.
$text= str_replace(
    array(
        // Normal entities
        '\\\&lt;', '\\\&gt;', '\\\&amp;',
        // In-string escapes get crap added, gaddammittohell >_<.
        '<span class="es0"><</span>',
            '<span class="es0">></span>',
            '<span class="es0">&</span>',
        // Unescaped entities
        '\\\&', '\\\<', '\\\>'),
    array(
        '<'     , '>'     , '&',        // Normal entities
        '<'     , '>'     , '&',        // In-string entities.
        '\\\&amp;', '\\\&lt;', '\\\&gt;'    // Unescaped entities
        ),
    $text);

There are three sets of items to search & replace here. The first two are the basic escaped tag delimiters, so that they'll actually result in HTML tags, and unescaped delimiters, so that you can print the combination itself. The third category are for HTML in string literals. Since the backslash has a specific meaning there as well, Geshi puts some highlighting stuff around it that would make the standard search fail. So that whole thing would need to be searched for and destroyedreplaced.

It's ugly, I know, but it seems to work. It'd be nicer if this could be done in the parser itself, but I have a feeling that'd take changes in multiple places. Since I don't know the code that well yet, I'm not touching that one with a ten-foot pole.

Lastly, let's test the ARM asm highlighter:

// Regular int
123
123l
123L
123ll
123LL  
123u
123U
+123
-123

// Binary
0b01100110
0B10101010

// Octal
0123

// Hex
0x12
0x123
0x123.4

// Float
123.4
123.4f
123.4F
+123.4
-123.4
1.2e3
1.2E3
1.2e+3
1.2e-3

// Inner
(1.23)
abc123de

Still works too. Bitchin'.

DMA vs ARM9 - fight!

2009-05-28 – 23:07 | .

DMA, or Direct Memory Access, is a hardware method for transferring data. As it's hardware-driven, it's pretty damn fast(1). As such, it's pretty much the standard method for copying on the NDS. Unfortunately, as many people have noticed, it doesn't always work.

There are two principle reasons for this: cache and TCM. These are two memory regions of the ARM9 that DMA is unaware of, which can lead to incorrect transfers. In this post, I'll discuss the cache, TCM and their interactions (or lack thereof) with DMA.

The majority of the post is actually about cache. Cache basically determines the speed of your app, so it's worth looking into in more detail. Why it and DMA don't like each other much will become clear along the way. I'll also present a number of test cases that show the conflicting areas, and some functions to deal with these problems.

(more...)
Notes:
  1. Well, quite fast anyway. In some circumstances CPU-based transfers are faster, but that's a story for another day.

mode 7 addendum

2009-04-19 – 18:32 | .

Okay. Apparently, I am an idiot who can't do math.

 

One of the longer chapters in Tonc is Mode 7 part 2, which covers pretty much all the hairy details of producing mode 7 effects on the GBA. The money shot for in terms of code is the following functions, which calculates the affine parameters of the background for each scanline in section 21.7.3.

IWRAM_CODE void m7_prep_affines(M7_LEVEL *level)
{
    if(level->horizon >= SCREEN_HEIGHT)
        return;

    int ii, ii0= (level->horizon>=0 ? level->horizon : 0);

    M7_CAM *cam= level->camera;
    FIXED xc= cam->pos.x, yc= cam->pos.y, zc=cam->pos.z;

    BG_AFFINE *bga= &level->bgaff[ii0];

    FIXED yb, zb;           // b' = Rx(theta) *  (L, ys, -D)
    FIXED cf, sf, ct, st;   // sines and cosines
    FIXED lam, lcf, lsf;    // scale and scaled (co)sine(phi)
    cf= cam->u.x;      sf= cam->u.z;
    ct= cam->v.y;      st= cam->w.y;
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;
        lam= DivSafe( yc<<12,  yb);     // .12f    <- OI!!!

        lcf= lam*cf>>8;                 // .12f
        lsf= lam*sf>>8;                 // .12f

        bga->pa= lcf>>4;                // .8f
        bga->pc= lsf>>4;                // .8f

        // lambda·Rx·b
        zb= (ii-M7_TOP)*st - M7_D*ct;   // .8f
        bga->dx= xc + (lcf>>4)*M7_LEFT - (lsf*zb>>12);  // .8f
        bga->dy= zc + (lsf>>4)*M7_LEFT + (lcf*zb>>12);  // .8f

        // hack that I need for fog. pb and pd are unused anyway
        bga->pb= lam;
        bga++;
    }
    level->bgaff[SCREEN_HEIGHT]= level->bgaff[0];
}

For details on what all the terms mean, go the page in question. For now, just note that call to DivSafe() to calculate the scaling factor λ and recall that division on the GBA is pretty slow. In Mode 7 part 1, I used a LUT, but here I figured that since the yb term can be anything thanks to the pitch you can't do that. After helping Ruben with his mode 7 demo, it turns out that you can.

 

Fig 1. Sideview of the camera and floor. The camera is tilted slightly down by angle θ.

Fig 1 shows the situation. There is a camera (the black triangle) that is tilted down by pitch angle θ. I've put the origin at the back of the camera because it makes things easier to read. The front of the camera is the projection plane, which is essentially the screen. A ray is cast from the back of the camera on to the floor and this ray intersects the projection plane. The coordinates of this point are xp = (yp, D) in projection plane space, which corresponds to point (yb, zb) in world space. This is simply rotating point xp by θ. The scaling factor is the ratio between the y or z coordinates of the points on the floor and on the projection plane, so that's:

\lambda = y_c / y_b,

and for yb the rotation gives us:

y_b = y_p cos \theta + D sin \theta,

where yc is the camera height, yp is a scanline offset (measured from the center of the screen) and D is the focus length.

Now, the point is that while yb is variable and non-integral when θ ≠ 0, it is still bounded! What's more, you can easily calculate its maximum value, since it's simply the maximum length of xp. Calling this factor R, we get:

R = \sqrt{max(y_p)^2 + D^2}

This factor R, rounded up, is the size of the required LUT. In my particular case, I've used yp= scanline−80 and D = 256, which gives R = sqrt((160−80)² + 256²) = 268.2. In other words, I need a division LUT with 269 entries. Using .16 fixed point numbers for this LUT, the replacement code is essentially:

// The new division LUT. For 1/0 and 1/1, 0xFFFF is used.
u16 m7_div_lut[270]=
{
    0xFFFF, 0xFFFF, 0x8000, 0x5556, ...
};


// Inside the function
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;           // .8
        lam= (yc*m7_div_lut[yb>>8])>>12;        // .8*.16/.12 = .12
       
        ... // business as usual
    }

At this point, several questions may arise.

  • What about negative yb? The beauty here is that while yb may be negative in principle, such values would correspond to lines above the horizon and we don't calculate those anyway.
  • Won't non-integral yb cause inaccurate look-ups? True, yb will have a fractional part that is simply cut off during a simple look-up and some sort of interpolation would be better. However, in testing there were no noticeable differences between direct look-up, lerped look-up or using Div(), so the simplest method suffices.
  • Are .16 fixed point numbers enough?. Yes, apparently so.
  • ZOMG OVERFLOW! Are .16 fixed point numbers too high? Technically, yes, there is a risk of overflow when the camera height gets too high. However, at high altitudes the map is going to look like crap anyway due to the low resolution of the screen. Furthermore, the hardware only uses 8.8 fixeds, so scales above 256.0 wouldn't work anyway.

And finally:

  • What do I win? With Div() m7_prep_affines() takes about 51k cycles. With the direct look-up this reduces to about 13k: a speed increase by a factor of 4.
 

So yeah, this is what I should have figured out years ago, but somehow kept overlooking it. I'm not sure if I'll add this whole thing to Tonc's text and code, but I'll at least put up a link to here. Thanks Ruben, for showing me how to do this properly.

Even though the total size of code is usually small compared to assets, there are still some concerns about a number of systems. Most notably among these are stdio, iostream and several STL components like vectors and strings. I've seen people voice concerns about these items, but I don't think I've ever seen any measurements of them. So here are some.

Table 1 : Memory footprint of some C/C++ components in bytes. Items may not be strictly cumulative.
Barebones: just VBlank code14516
base+printf71148
base+iprintf54992
base+iostream266120
base+fopen56160
base+fstream260288
base+<string>59384
base+<vector>59624
base+<string>+<vector>59648

The sizes in Table 1 are for a bare source file with just the VBlank initialization and swiWaitForVBlank() plus whatever's necessary to use a particular component. For the IO parts this means a call to consoleDemoInit(); for vectors and strings, it means defining a variable.

 

Even an empty project is already 15k in size. Almost all of this is FIFO code, which is required for the ARM9 and ARM7 to communicate. Adding consoleDemoInit() and a printf() call adds roughly 71k. Printf has a lot of bagage: you have to have basic IO hooks, character type functions, allocations, decimal and floating point routines and more.

Because printf() uses the usually unnecessary floating point routines for float conversions, it is often suggested to use the integer-only variant iprintf(). In that case, it comes down to 55k. The difference is mostly due to two functions: _vfprintf_r() and _dtoa_r(), for 5.8k and 3.6k, respectively. The rest is made up of dozens of smaller functions. While the difference is relatively large, considering the footprint of the other components, the extra 16k is probably not that big of a deal. For the record, there is no difference in speed between the two. Well, almost: if the format string doesn't contain formatting parts, printf() is actually considerably faster. Another point of note is that the 55k for iprintf() is actually already added just by using consoleDemoInit().

And now the big one. People have said that C++ iostream was heavy, and indeed it is. 266k! That's a quite a lot, especially since the benefits of using iostream over stdio is rather slim if not actually negative(1). Don't use iostream in NDS projects. Don't even #include <iostream>, as that seems enough to link the whole thing in.

Related to iosteam is fstream. This also is about a quarter MB. I haven't checked too carefully, but I think the brunt of these parts are shared, so it won't combine to half a Meg if you use both. Something similar is true for the stdio file routines.

Why are the C++ streams so large? Well, lots of reasons, apparently. One of which is actually its potential for extensibility. Because it doesn't work via formatting flags, none of those can be excluded like in iprintf()'s case. Then there's exceptions, which adds a good deal of code as well. There also seems to be tons of stuff for character traits, numerical traits, money traits (wtf?!?) and iosbase stuff. These items seem small, say 4 to 40 bytes, but when there are over a thousand it adds up. Then there's all the stuff from STL strings and allocators, type info, more exception stuff, error messages for the exceptions, date/time routines, locale settings and more. I tell you, looking at the mapfile for this is enough to give me a headache. And worst of all, you'll probably use next to none of it, but it's all linked in anyway.

Finally, some STL. This is also said to be somewhat big-boned, and yes it isn't light. Doing anything non-trivial with either a vector or string seems to add about 60k. Fortunately, though, this is mostly the same 60k, so there are not bad effects from using both. Unfortunately, I can't really tell where it's all going. About 10k is spent on several d_*() routines like d_print(), which is I assume debug code. Another 10k is exceptions, type info and error messages and 10 more for. But after that it gets a little murky. In any case, even though adding STL strings and vectors includes more that necessary, 60k is a fair price for what these components give you.

 

Conclusions

The smallest size for an NDS binary is about 14k. While printf() is larger than iprintf(), it's probably not enough to worry about, so just use printf() until it becomes a pressing matter (and even then you could probably shrink down another part more easily anyway). There is no speed difference between the two. The C++ iostream and fstream components are not worth it. Their added value over printf and FILE routines are small when it comes to basic IO functionality. STL containers do cost a bit, but are probably worth the effort. If you need more than simple text handling or dynamic arrays and lists, I'd say go for it. But that's just my opinion.

Please note, the tests I did for this were fairly roughly. Your mileage may vary.

 

Lastly. The nm tool (or arm-eabi-nm for DKA) is my new best friend for executable analysis. Unlike the linker's mapfile, nm can sort addresses and show symbol sizes, and doesn't include tons of crap used for glue.


Notes:
  1. Yes, I said negative. Even though it has the benefit of being typesafe and extensible, the value of these advantages are somewhat exaggerated, especially since it has several drawback as well (see FQA:ch 15 for details). For one thing, try finding the iostream equivalent of "%08X", or formatted anything for that matter. For early grit code I was actually using iostream until I got to actually writing the export code. I couldn't move back to stdio fast enough.
« Previous PageNext Page »

Powered by WordPress