Grit 0.8.6 : synchronization update

This is just an update to synchronize what I have with devkitPro's distribution of grit. This includes updates to the makefile, and changing the size-constant back to a #define. Apparently, consts aren't constanty enough for C compilers for use in array declarations. Shame, I would have liked to get rid of macros as much as possible :(.
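For anyone wondering why: in C (unlike C++), a const-qualified variable is not an integer constant expression, so it can't portably size an array at file scope. A minimal sketch of the difference follows. `GFX_SIZE`, `PAL_SIZE` and the array names are made up for illustration and are not what grit actually emits. The enum at the end is one macro-free alternative, though it only works for int-sized values.

```c
/* Not an integer constant expression in C, so this is not portable C
 * (compilers reject it, or accept it only as an extension):
 *
 *   static const unsigned int gfx_size = 64;
 *   unsigned int gfx_data[gfx_size];
 */

/* A macro is replaced by the literal 64 before compilation: fine. */
#define GFX_SIZE 64             /* hypothetical size-constant */
unsigned int gfx_data[GFX_SIZE];

/* An enum constant IS an integer constant expression in C,
 * so it works without a macro (for int-sized values only). */
enum { PAL_SIZE = 16 };         /* hypothetical */
unsigned short pal_data[PAL_SIZE];
```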


In any case, the two versions should be identical again (with one small exception, namely that my version emits a .size directive for assembly, but that's a minor something that should not affect anyone.)

grit 0.8.4 (out with the old bugs, in with the new)

Okay, so it's been a while, but there's finally a new version for grit.


First of all, the vector::insert problem should finally be fixed. And there was much rejoicing. I've also added an option for forcing the map palette-index (-mp <num>), which should help with NDS backgrounds that use extended palettes.


Also, and this one is pretty big, I've completely replaced the tile-mapping routines with something more general. The new method should be able to handle variable-sized tiles (-tw <n> and -th <n>) and is mostly independent of bitdepth. Specifically, bitdepths over 8 bpp can be handled as well, at least in principle. It also means that the external tileset can now be a metatile-set, which is good if you're using metatiles.
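To make the bitdepth-independence a bit more concrete: if tiles are only ever compared and never interpreted, the bitdepth merely determines how many bytes a tile row occupies. Below is a minimal C sketch of tile reduction built on that idea. It is not grit's actual code; the `Image` struct and all function names are hypothetical. It assumes `tw*bpp` is a multiple of 8 and that the image is a whole number of tiles wide and high.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical image description for this sketch (not grit's types). */
typedef struct {
    const unsigned char *data;  /* packed pixels, row after row   */
    size_t pitch;               /* bytes per image scanline       */
    size_t tw, th;              /* tile size in pixels            */
    size_t bpp;                 /* bits per pixel                 */
} Image;

/* Copy tile (tx,ty) into dst as th rows of tw*bpp/8 bytes each.
 * Pixels are never decoded, so the bitdepth only sets the row width. */
static void tile_extract(const Image *img, size_t tx, size_t ty,
                         unsigned char *dst)
{
    size_t row_bytes = img->tw * img->bpp / 8;
    const unsigned char *src = img->data + ty * img->th * img->pitch
                                         + tx * row_bytes;
    for (size_t y = 0; y < img->th; y++)
        memcpy(dst + y * row_bytes, src + y * img->pitch, row_bytes);
}

/* Linear search for an identical tile among the first count tiles. */
static long tileset_find(const unsigned char *set, size_t count,
                         const unsigned char *tile, size_t tile_bytes)
{
    for (size_t i = 0; i < count; i++)
        if (memcmp(set + i * tile_bytes, tile, tile_bytes) == 0)
            return (long)i;
    return -1;
}

/* Reduce an image of tiles_x * tiles_y tiles to a set of unique tiles
 * plus a map of indices. 'set' must have room for every tile (the
 * no-duplicates worst case). Returns the number of unique tiles. */
static size_t tile_reduce(const Image *img, size_t tiles_x, size_t tiles_y,
                          unsigned char *set, unsigned int *map)
{
    size_t tile_bytes = img->tw * img->th * img->bpp / 8;
    size_t count = 0;

    for (size_t ty = 0; ty < tiles_y; ty++)
        for (size_t tx = 0; tx < tiles_x; tx++) {
            /* Extract straight into the next free slot ... */
            unsigned char *slot = set + count * tile_bytes;
            tile_extract(img, tx, ty, slot);
            /* ... and only keep it there if it's new. */
            long idx = tileset_find(set, count, slot, tile_bytes);
            if (idx < 0)
                idx = (long)count++;
            map[ty * tiles_x + tx] = (unsigned int)idx;
        }
    return count;
}
```

Flipped-tile matching is left out to keep it short. Since a "tile" here is just a tw x th block of bytes, nothing in it cares whether that block is an 8x8 tile or a larger metatile.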

With this new method also comes a way to create a custom bitformat for maps (-mB flag). I'm not entirely sure how this can be used yet, but using more than 10 bits for the tile-index, or making a 1bpp collision map, should be possible now.
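As an illustration of what a custom map bitformat boils down to, here's a hypothetical sketch (not grit's -mB implementation or syntax) that composes an entry from fields of arbitrary width and packs entries of any width into a bitstream. The standard GBA regular-background entry is the 10/1/1/4 case: tile index, horizontal flip, vertical flip and palette bank. The sketch assumes the fields of one entry add up to at most 32 bits; values that are too wide for their field are masked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compose one map entry, first field in the lowest bits.
 * Assumes the widths add up to at most 32. */
static uint32_t entry_compose(const uint32_t *fields, const unsigned *widths,
                              size_t n)
{
    uint32_t entry = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = (widths[i] >= 32) ? 0xFFFFFFFFu
                                          : ((UINT32_C(1) << widths[i]) - 1);
        entry |= (fields[i] & mask) << shift;
        shift += widths[i];
    }
    return entry;
}

/* Pack n entries of 'bits' bits each (1..32) into a little-endian
 * bitstream, so that e.g. a 1bpp collision map uses one bit per tile. */
static void entries_pack(const uint32_t *entries, size_t n, unsigned bits,
                         uint8_t *out)
{
    size_t bitpos = 0;
    memset(out, 0, (n * bits + 7) / 8);
    for (size_t i = 0; i < n; i++)
        for (unsigned b = 0; b < bits; b++, bitpos++)
            if ((entries[i] >> b) & 1)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
}
```

With widths {10,1,1,4} this produces the usual GBA screen entry; with {12,4} you get a 12-bit tile index, and packing with `bits=1` gives the collision-map case.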

Since this is a fairly major change, I kinda expect there are still some bugs in the system. I have tested it with a number of options, but you know how it is with multi-platform stuff. In particular, if any of you big-endian-system users have trouble now, this will probably be the cause.

And now I will leave you with a …

Define: overthinking

<ramble>

I think too much. Or so people keep telling me. And they may have a point.

Anyway, so I'm working on this assembly version of toncset for, well, just because I guess. A fill routine has 3 parts: a head, a main run and a tail. The main part fills 32-bit chunks (words) as long as there are at least 4 bytes left and the destination is word-aligned. This part is easy, because you can just dump words with stmia or str. For an example of this, see my older memset16 post.
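In C, the main part could look something like the sketch below, assuming dst is already word-aligned. The four stores per iteration are roughly what a four-register stmia does in asm. It also shows one way to get the pre-extended 32-bit fill value that the head and tail code expect; all names here are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Replicate an 8-bit or 16-bit fill value across a whole word. */
static uint32_t fill_extend8(uint8_t b)   { return b * UINT32_C(0x01010101); }
static uint32_t fill_extend16(uint16_t h) { return h * UINT32_C(0x00010001); }

/* Main part: dst must be word-aligned. Writes nwords words, four per
 * iteration, then the remaining 0-3 words one at a time.
 * Returns a pointer to the first word after the filled run. */
static uint32_t *fill_words(uint32_t *dst, uint32_t fill, size_t nwords)
{
    for (; nwords >= 4; nwords -= 4) {
        dst[0] = fill; dst[1] = fill;
        dst[2] = fill; dst[3] = fill;
        dst += 4;
    }
    while (nwords--)
        *dst++ = fill;
    return dst;
}
```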

The tail is for the remaining bytes after the main part. Under normal circumstances you could do this byte for byte, but some sections of GBA/NDS memory do not take kindly to byte-writes, so you'd have to read the word, mask the appropriate bytes out/in and then write it back. This is mostly just annoying, but still very doable.
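The read-mask-write for the tail could look like this in C. It's a sketch assuming a little-endian target (which GBA and NDS are), so the first bytes of a word are its lowest bits; `size` is the number of leftover bytes after the main part.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail: fill the first 'size' bytes (1..3) of the aligned word at dst
 * without any byte writes. size=0 gives an empty mask and changes
 * nothing; size=4 would shift by 32, which C leaves undefined. */
static void fill_tail(uint32_t *dst, uint32_t fill, size_t size)
{
    uint32_t mask = ~(UINT32_C(0xFFFFFFFF) << (8 * size));  /* right mask 000F */
    *dst = (*dst & ~mask) | (fill & mask);                  /* read, mask, write */
}
```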

The head part, however, is both annoying and tricky. It consists of filling the unaligned bytes in the first word of a run so that the main part can do its thing. This is similar to the tail part, in that it requires bitmasks. However, it's also possible that both the beginning and the end of a run occur in the same word, effectively making the head the tail as well, so you'd have to apply a double mask … somehow. In C, it looks something like this:

/* Input:
  void *dstv;   // potentially non-aligned
  u32 fill;     // Already extended to full 32-bit if appropriate.
  uint size;    // > 0
*/


u32 addr= (u32)dstv;
int left= addr&3;
if(left != 0)
{
    u32 *dst= (u32*)(addr&~3);
    int right= left+size;
    u32 mask= 0xFFFFFFFF;

    if(right < 4)   // Everything in a single word: head and tail.
    {
        mask= ~(mask<<8*size);          // Create right-mask 000F
        mask <<= 8*left;                // Create mid mask 00F0
        *dst = (*dst &~ mask) | (fill & mask);
        return;
    }
    else
    {
        mask <<= 8*left;                // Create left mask FFF0
        *dst = (*dst &~ mask) | (fill & mask);
        size= right-4;
        dst++;                          // Move on to the next (aligned) word
    }
}

This bit of C translates roughly into the following ARM code:

    @ Reglist
    @ r0 : dst
    @ r1 : src
    @ r2 : size
    @ r3 : left (dst&3) / right (left+size) / data
    @ r4 : lshift (left*8)
    @ r5 : rshift (right*8) / mask
    @ ip : maskBase (0xFFFFFFFF)
.Lfgset_head:
    bic     r0, r0,#3               @ Align dst
    mvn     ip, #0                  @ Set-up mask (0xFFFFFFFF)
    mov     r4, r3, lsl #3          @ (left*=8) != 0 : right-side only
    add     r3, r3, r2              @ right= left+size
    cmp     r3, #4

    @ right <= 4 : single-word. If right < 4, shrink mask; do usual and quit early
    movlo   r5, r2, lsl #3          @ \.
    mvnlo   ip, ip, lsl r5          @ - r5= ((1<<8*size)-1)<<(8*left);
    mov     r5, ip, lsl r4          @ /

    subhi   r2, r3, #4              @ Adjust size for follow-ups

    @ Mask in r1 and write back to dst
    ldr     r3, [r0]
    bic     r3, r3, r5
    and     r5, r1, r5
    orr     r3, r5
    str     r3, [r0], #4

    bhi     .Lfgset_main            @ Longer stretch : go back.
    bx      lr                      @ Single-word fill : finished.

The main thing I thought when I'd written this down was “meh”. You will note that registers r4 and r5 are used here, which means stack-work (omitted here for brevity). The positioning of pushing and popping makes everything a little awkward, so I was off looking for something else.

The essence of the problem here is that you can only use five registers without touching the stack: r0-r3 and ip (r12). Now, r0-r2 are taken by the destination, fill and size, so I can't do anything with those. I also need one to store the left-edge (r3 in this case), leaving us with one for the right edge, the left and right shift intermediaries, and the left and right masks. Right! Crap -_-.


So; strategies. Well, one of the reasons I need to use so many registers is because the lifetimes overlap. For example, I still need left for a while because shifting the mask up comes last here. I can't use r2 for multiple purposes either because I'll need it for the size. Now, I could free up r3 by making the left-mask first, but then I might get in trouble when creating the right-mask. Also, right-4 is actually what size wants to be when it grows up in the long-run case, so I can use that there as well. I'd just have to undo that for the short case, or perhaps even create the right-mask from negative numbers.

At this point I figured it would be helpful to take a look at the various ways of creating the masks I needed. The standard form for a 0x000000FF-style bitmask is (1<<x)-1, but there are others as well. The following list holds a few examples.

// Bitmask examples. Assume x=8 (in r1) and unsigned 32-bit arithmetic.

 ( 1<<x)-1;     // 000000FF.    mov r0, #1; rsb r0, r0, r0, lsl r1;
~(-1<<x);       // 000000FF.    mvn r0, #0; mvn r0, r0, lsl r1;
  -1<<x;        // FFFFFF00.    mvn r0, #0; mov r0, r0, lsl r1;
-( 1<<x);       // FFFFFF00.    mov r0, #1; sub r0, r0, r0, lsl r1; sub r0, r0, #1;
  -1>>x;        // 00FFFFFF.    mvn r0, #0; mov r0, r0, lsr r1;
~(-1>>x);       // FF000000.    mvn r0, #0; mvn r0, r0, lsr r1;

All of these use +1 or −1 as their base, and all but one are two-instruction affairs. The left-mask looks like 0xFFFFFF00, so the most obvious one to pick here is -1<<x. Technically, the right-mask is 0x000000FF, using x = 8*right = 8*(left+size). However, you can also see it as a 0x00FFFFFF-style mask if you use 4-right. This solves two problems at once. First, it is the negative of the new size, so the value is readily available. Second, this mask is a right-shifted 0xFFFFFFFF, but as its low bytes either get shifted out or are cleared by the final AND with the left-mask anyway, it doesn't actually have to be a proper 0xFFFFFFFF; it can be a 0xFFFFFF00 as well, which we have in the form of the left-mask. In other words, we don't require as many registers for temps because we already have everything we need. The resulting code is this:

    add     r2, r3, r2              @ (1a) right := left+size.
    movs    r3, r3, lsl #3
    mvn     ip, #0                  @ ip= -1
    mov     r3, ip, lsl r3          @ lmask := (-1)<<(8*left)
    subs    r2, r2, #4              @ (1b) aligned size= right-4

    @ new size < 0 : single-word fill
        rsblo   r2, r2, #0          @ - (2) r2= -8*size
        movlo   r2, r2, lsl #3      @ /
        andlo   r3, r3, r3, lsr r2  @ (3) mask = lmask & rmask
   
    @# inserts and jumps

See? No r4 and r5 anywhere. The key is toying around with r2 and r3. While r2 is reserved for the size, it needs to be modified anyway to account for the work done here. In the end, size should be right−4, which is what points (1a) and (1b) do. Since subtracting 4 from right also tests whether right<4, we can use its result as the condition for the special case; the result then being the negative distance from the word-edge. As explained above, the right-mask can be constructed from the left-mask by lmask>>(-8*size), which is done at points (2) and (3).
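A quick C check of that right-mask construction against the direct one, for every single-word case (left = 1..3 and left+size < 4). The function names are made up for this sketch, and bytes are numbered little-endian inside the word, as on GBA and NDS.

```c
#include <stdint.h>

/* Direct mask for a run starting 'left' bytes into a word and
 * 'size' bytes long, ending inside that word (left+size < 4). */
static uint32_t mask_direct(unsigned left, unsigned size)
{
    return ~(UINT32_C(0xFFFFFFFF) << (8 * size)) << (8 * left);
}

/* The shortcut: lmask = -1<<8*left, newsize = right-4 (negative here),
 * mask = lmask & (lmask >> -8*newsize). The zero bytes at the bottom of
 * lmask only ever land where the AND with lmask clears them anyway. */
static uint32_t mask_shortcut(unsigned left, unsigned size)
{
    int newsize = (int)(left + size) - 4;
    uint32_t lmask = UINT32_C(0xFFFFFFFF) << (8 * left);
    return lmask & (lmask >> (-8 * newsize));
}
```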

It's a little hairy, but it works. And yet, it still evoked a feeling of “meh” like before. It's the two instructions at point (2) that annoyed me. The reason it's two instructions and not one is because you can't multiply by −8 in one go. By +8, yes: that's a shifted mov; by −1, yes: that's rsb r2, r2, #0; but the combination is difficult because the shift only applies to the second operand. A sub r2, #0, r2, lsl #3 would do it, but the first operand needs to be a register and I don't have a spare one with zero in it. I could make one, but that just means an extra instruction somewhere else. I do, however, have either a +1 or −1 in ip, so maybe I can use that somehow. And then it hits me: the carry flag!

There are adc, sbc and rsc instructions that add C, C−1 and C−1 to the result, respectively. Setting or clearing the carry flag is easy, so that's not a problem. All I need now is to use +1 as the base instead of −1 to cancel out the −1 in sbc or rsc. As it turns out, I can do that if I use this format for the left-mask: -(1<<x). In the mask overview above I listed this as a 3-instruction gig, but with the carry-trick it takes one instruction less. The final version (for just the head part) looks like this:

    ands    r3, r0, #3
    beq     .Lfgset_main            @ Jump to main stint if aligned

.Lfgset_head:
    bic     r0, r0, #3
    add     r2, r3, r2
    movs    r3, r3, lsl #3          @ left*8 ; clear carry
    mov     ip, #1
    sbc     r3, ip, ip, lsl r3      @ -(1<<8*left) +1-1
    subs    r2, r2, #4              @ size= right-4

    @ If negative (==carry clear), this is a single-word fill
    @ This requires a truncated mask (like 0x0000FF00)
        sbclo   r2, ip, r2, lsl #3      @ x= -8*size +1-1
        andlo   r3, r3, r3, lsr r2      @ mask= mask & (mask>>-8*size);

    @ Insert and jump to main stint if available.
.Lfgset_insert:
    ldr     ip, [r0]
    bic     ip, ip, r3
    and     r3, r1, r3
    orr     ip, ip, r3
    str     ip, [r0], #4

    bhi     .Lfgset_main        @ Longer stretch
    bx      lr                  @ Single-word fill : finished.
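Since the flag juggling is the whole trick here, a C model of the routine can be handy for checking it. The sketch below mimics each instruction on a single word value instead of memory, with sbc spelled out as Rd = Rn − Op2 − NOT(C) (other flag effects aren't modeled). It also shows why the carry is clear where it matters: `movs r3, r3, lsl #3` sets C to bit 29 of the old r3, which is 0 for left = 1..3, and `subs r2, r2, #4` clears it exactly when right < 4. It's a verification aid with made-up names, not something meant for actual use.

```c
#include <stdint.h>

/* Result of ARM SBC: Rd = Rn - Op2 - NOT(C). Flags are not modeled. */
static uint32_t arm_sbc(uint32_t rn, uint32_t op2, int carry)
{
    return rn - op2 - (carry ? 0u : 1u);
}

/* Model of .Lfgset_head/.Lfgset_insert on one word value.
 * left = dst&3 (1..3), *size = bytes to fill. Returns 1 for the
 * 'bhi .Lfgset_main' case, with the remaining size in *size;
 * returns 0 when the fill is finished. */
static int head_model(uint32_t *word, uint32_t fill, unsigned left,
                      uint32_t *size)
{
    uint32_t r1 = fill, r2 = *size, r3 = left, ip;
    int c, z;

    r2 = r3 + r2;                       /* add   r2, r3, r2          */
    c  = (r3 >> 29) & 1;                /* movs  r3, r3, lsl #3      */
    r3 <<= 3;                           /*   C = old bit 29 = 0      */
    ip = 1;                             /* mov   ip, #1              */
    r3 = arm_sbc(ip, ip << r3, c);      /* sbc   r3, ip, ip, lsl r3  */
    c  = (r2 >= 4);                     /* subs  r2, r2, #4          */
    z  = (r2 == 4);                     /*   C = no borrow, Z = zero */
    r2 -= 4;

    if (!c) {                           /* lo: single-word fill      */
        r2 = arm_sbc(ip, r2 << 3, c);   /* sbclo r2, ip, r2, lsl #3  */
        r3 &= r3 >> r2;                 /* andlo r3, r3, r3, lsr r2  */
    }

    ip = *word;                         /* ldr   ip, [r0]            */
    ip &= ~r3;                          /* bic   ip, ip, r3          */
    r3 &= r1;                           /* and   r3, r1, r3          */
    *word = ip | r3;                    /* orr + str                 */

    *size = r2;
    return c && !z;                     /* bhi   .Lfgset_main        */
}
```

Looping it over left = 1..3 and a range of sizes, and comparing against a byte-by-byte fill, is a quick way to catch off-by-one mistakes in the masks.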

Sweeeet :). I was happy with this, until I realized what I'd been working on: an exception to an exception. This would definitely not be part of the 20% of the code that uses 80% of the runtime, so it's really not something one should worry about. Interesting, yes, and I learned a few new tricks, but perhaps the time would have been better spent on getting 5% extra out of the main loop. The only problem there is that that's just boring old loop unrolling, whereas the head presented a more ‘interesting’ problem, so I went for that instead.

So yeah; I think too much >_>

</ramble>

And this year^Hdecade's award for irony goes to ...

Expelled : No intelligence allowed! Give them a hearty round of scorn and ridicule, folks.


OK, perhaps a bit of backstory is in order here.

There's this strange thing going on in the USA known as the Creation-evolution controversy. In a nutshell, on one side you have the Theory of Evolution, accepted by all but a fraction of the scientists and supported by evidence from multiple fields; and on the other you have groups (usually motivated by their religion or ignorance and frequently both) screaming “nuh-uh!”, backed up by arguments ranging from utterly insane, to fabrications, misunderstandings, red herrings, and “I dunno, Magic Man dun it”. No, I'm not embellishing here: there are long lists of creationist claims that often make no sense at all, but are still used even after being debunked decades ago. As an example of how silly these can be, consider the Banana argument. And no, this is not a parody; they're absolutely serious.

After several defeats in court, and with scientists so fed up with the constant misrepresentations that they won't even debate anymore, the creationists came up with a new strategy: Intelligent Design (ID). They've been quite clever with this, actually. For one, they're leaving the Bible out of it and claiming that life's complexity can only come about through an intelligent, yet unnamed (wink, wink), designer. They also claim that all they want is a fair hearing, and that the scientists are being mean with their insistence on evidence and refusal to accept bogus reasoning.


Enter Expelled. You'd have to read the Wikipedia page for details, but the idea behind the movie is to highlight this repression by scientists: that the evolutionists are actively ‘expelling’ people critical of evolution. As it happens, the makers interviewed a number of evolutionary biologists (under false pretenses) for their views on the subject. This is now widely regarded as a bad move.

You see, one of them happened to be PZ Myers, a vocal critic of creationism and other irrationality on his blog Pharyngula. You can see how hostile he is in this video (I'd point to the Expelled trailers with the interview, but they pulled them). Ever since the interview, he and other science-bloggers have been keeping an eye on the movie, pointing out flaws in the producers' arguments whenever they went public with anything.

And now it gets interesting. About two weeks ago, there was a screening of the movie in Minneapolis. He reserved a place via their website, went to the theatre … but was barred from entering. To spell it out: here's a movie accusing evolutionists of expelling their critics, expelling their critics. And then lying about it afterwards! Repeatedly! Seriously, you just can't make this stuff up.

Naturally, this is now all over the blogosphere. The original account has well over 1600 comments. Many other science bloggers have commented on it as well. Greg Laden's blog has a list of over 100 links, including to stories from the NY Times and Salon. Another interesting detail is that Myers was accompanied by Richard Dawkins (yes, that Richard Dawkins), who did get in. The two had a nice little discussion afterwards, which can be seen here.



Code highlighting. Neat.

I use quite a bit of code in my documents. Now, you can't exactly copy code from an editor to an HTML page … at least, not if you want formatting to be maintained. Until now, I used my standard text editor to convert it to HTML, and then post-processed the result to remove excess styles and such. This worked, but it's still a little cumbersome.

After some searching, I found GeSHi, a PHP package that converts and highlights code for use on the web, and it is very customizable as well. Very nice. Before I had even looked into how I could use it, I found out that there is a WP plugin that uses it as well: IG-syntax hiliter. So now I can do this:

Keywords:
void for return int

Comments:
/* cmt */
// cmt

Strings/characters:
"hello"
'x'

Numbers:
12345, , +12, -12, 01234, 0x12ab34, 0X12AB34,
1.234, 1.2e3, 1.2e-3

/* Affine tilemap demo */
void test_tte_ase()
{
    // Base inits
    irq_init(NULL);
    irq_add(II_VBLANK, NULL);
    REG_DISPCNT= DCNT_MODE1 | DCNT_BG2;

    // Init affine text for 32x32t bg
    tte_init_ase(2, BG_CBB(0) | BG_SBB(28) | BG_AFF_32x32,
        0, CLR_YELLOW, 0xFE, NULL, NULL);

    // Write something
    tte_write("\\{P:120,80}o");
    tte_write("\\{P:72,104}Round, round, \\{P:80,112}round we go");

    AFF_SRC_EX asx= { 124 << 8, 84 << 8, 120, 80, 0x100, 0x100, 0 };
    bg_rotscale_ex(&REG_BG_AFFINE[2], &asx);

    // Rotate it
    while(1)
    {
        VBlankIntrWait();
        key_poll();

        asx.alpha += 0x111;
        bg_rotscale_ex(&REG_BG_AFFINE[2], &asx);

        if(key_hit(KEY_START))
            break;
    }
}

That should be C code. And by golly it works :). Now I just have to make something for ARM asm.


There are some caveats to the plugin and geshi, though.

  • On the plugin side, it works as an extra filter in the displaying process. However, when writing the code in the post, that'll be standard C, complete with angle brackets and ampersands and such. So it is vitally important to turn off the visual editor and the automatic XHTML validation. If you don't, there will be trouble when you save the post.
  • By default, the plugin uses line numbers (bleh) and a few other extras that clutter the actual code. So I turned those off in the plugin options. However, this wasn't enough for everything. The geshi settings for number parsing and using CSS-classes highlighting are also turned off, and if you want those (and I do), you'll have to make a few changes to the plugin manually. In particular, I needed to add `$geshi->enable_classes(true)' and `$geshi->set_number_highlighting(true)'.
  • The regexp for numbers in GeSHi is incomplete: it doesn't do hex or floats, for example. In parse_non_string_part() use this instead:

        $reg= "#\\b((0[xX][0-9A-Fa-f]+)|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|([0-9]))\\b#";
        $stuff_to_parse= preg_replace($reg, "<|/NUM!/>\\1|>", $stuff_to_parse);

Regexp help courtesy of www.regular-expressions.info.

LOL wordpress

Ever since I started my non-university site, I've been meaning to create a nice DB-driven back-end to manage it all, perhaps with some bells and maybe even whistles. However, I never got round to doing so because, well, I'm not exactly fond of building interfaces and there was other stuff to do as well. It also seemed a bit silly to do all that work myself when there are other packages out there that do everything for you. So in the end I decided to follow the herd down to wordpress.

It's … nice. Changing styles is easy and you can build your own page templates and text filters to customize everything. One of my worries was that it'd limit me in my post formatting, but I can do everything in HTML just like I always did … once I turned off the visual editor and removed the wpautop and wptexturize filters. Of course, one of the downsides of using someone else's system is that you have to learn how everything works. So at the moment everything's still pretty simplistic. I'm not entirely sure if the stuff I have can really be presented the way I want by wordpress, but I seem to get by so far. At least I'll be able to update a little easier now.


So, this is what my site looks like now. Yeah I know, bland site is bland, but at least it works. There are posts over here and a navbar over there. You may notice that there are two links to ‘projects’ there; that's something wordpressy. It makes a division between ‘Posts’, basically journal items with timestamps, and ‘Pages’, which are standalone. Tonc would be an example of Pages. The projects page and documents would be part of that too. Updates on those items, however, would be announced as posts.


Oh, and yes, I have been a little revisionist in the post orders. The past month or so has been a sort of trial run for all this stuff so there are many posts all at once right now. We apologize for the inconvenience.

wordpress 2.3 HTML entity bug

There is/was a rather annoying bug in wordpress 2.3. Normally when editing pages in the advanced editor, the actual text of the post/page has to be preprocessed to convert things like &times; to &amp;times; so that it doesn't show up in the editor as ×, and ditto for angle brackets. The extra ampersands and brackets would then be removed before saving. However, in the 2.3 upgrade a few things in the core structure of post/page retrieval had been changed and somehow the page-retrieval didn't do the pre-processing anymore. The upshot was that the amps and brackets got converted to normal ASCII and Unicode, seriously messing up your pages.

I managed to track the problem to get_post() and get_page() in wp-includes/post.php. get_post() did get the upgrade, but get_page() didn't. The latter needs to be updated to take and use a third parameter, and to call sanitize_post() like get_post() does.

Of course, right after doing this I found out that a fix was already present in the SVN. Just update the file with this one. Ain't that always the way.


To see if you are affected as well, make a new page and input the following:

&times; &gt; entities

Then hit `Save and Continue Editing'. If the text has changed to “× > entities”, you're in trouble: every page you edit in that state will have its entities converted. Get the fixed file as soon as possible.