The Docs

Bit late, yes, but now that I have my second document, I feel I should perhaps say something about the category. Broadly speaking, the site is divided into three sections. First there are ‘short bits’, which are, well, short pieces of text, ranging from simple announcements and comments to maybe a few pages' worth of functions on various subjects, like a fast memset for ARM systems. These basically correspond to a WordPress ‘Post’. Then there are ‘collections’ – groups of pages on a subject which could serve as a standalone site. Tonc would be a good example of this. I suppose you could call these ‘books’. The ‘documents’ are somewhere in between – too long and relevant to be posts, but, being single pages, too short to be books. A book is what a document wants to be when it grows up.


Anyway, right now I have two documents. The first is a collection of bit tricks: little snippets of code on bit-fiddling. No, they are not futile exercises, they're programming-puzzles and they're clever little hacks for extra speed. OK, maybe they're just a little futile >_>.


The second document is about 20 pages' worth of Annotations to “Programming the Nintendo GameBoy Advance”, Mr Harbour's popular E-book on GBA programming. As slick as it looks, the book has some very serious flaws: it contains incorrect information and non-functional code, and it teaches some bad programming habits. This document points them out.

I've actually been sitting on this one for quite a while now. I may have the social graces of a bent teaspoon, but even I know that it isn't a nice thing to point out, in code written by a professional, the kind of errors that beginner C courses warn about. Of course, that's just part of why I'm releasing this: the book certainly looks like a good resource for GBA programming if you're new to the GBA, especially if you're new to programming in general. It'd kinda suck if you had to unlearn large parts later because the information wasn't up to standard. These annotations should help you recognize most of its errors and steer around them.

New memset16 routine

The old tonclib memset16() was a Thumb/ROM function that called memset32() when that was desirable. Below you can find an ARM-only version. This time round, I chose to do all the real work inside the 16-bit version, and let memset32() jump into the middle of memset16(). Combined, these are less than 32 instructions, which should make the cache happy as well. The main difference in performance is in the lower counts: the function overhead is about 100 cycles lower than before.

@ ---------------------------------------------------------------------
@ CODE_IN_IWRAM void memset32(void *dst, u32 fill, size_t wcount)
@ ---------------------------------------------------------------------
    .section .iwram, "ax",%progbits
    .arm
    .align
    .global memset32
memset32:
    mov     r2, r2, lsl #1
    cmp     r2, #16
    bhs     .Lms16_entry32
    b       .Lms16_word_loop

@ ---------------------------------------------------------------------
@ CODE_IN_IWRAM void memset16(void *dst, u16 fill, size_t hwcount)
@ ---------------------------------------------------------------------
    .section .iwram, "ax", %progbits
    .arm
    .align
    .global memset16
memset16:
    cmp     r2, #0              @ if(count != 0)
    movnes  r3, r0, ror #2      @   if(dst && (dst&2))
    strmih  r1, [r0], #2        @   {   *dst++= fill;   count--;    }
    submis  r2, r2, #1          @ if(count == 0 || dst == NULL)
    bxeq    lr                  @   return;

    orr     r1, r1, lsl #16     @ Prep for word fills.
    cmp     r2, #16
    blo     .Lms16_word_loop
.Lms16_entry32:

    @ --- Block run ---
    stmfd   sp!, {r4-r8}    
    mov     r3, r1
    mov     r4, r1
    mov     r5, r1
    mov     r6, r1
    mov     r7, r1
    mov     r8, r1
    mov     r12, r1
.Lms16_block_loop:
        subs    r2, r2, #16
        stmhsia r0!, {r1, r3-r8, r12}
        bhi     .Lms16_block_loop
    ldmfd   sp!, {r4-r8}
    bxeq    lr
    addne   r2, r2, #16         @ Correct for overstep in loop

    @ --- Word run (+ trailing halfword) ---
.Lms16_word_loop:
        subs    r2, r2, #2
        strhs   r1, [r0], #4
        bhi     .Lms16_word_loop
    strneh  r1, [r0], #2        @ r2 != 0 means spare hword left
    bx  lr

@ EOF

As usual, I'm being somewhat dirty with how the assembly works. In the first five instructions of memset16(), I'm doing several things in one go: testing the destination (and count) for 0, doing a single halfword write for non-word-aligned destinations, and returning from the routine if the count is 0 afterwards. I can do all this in five instructions through clever manipulation of conditionals.
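
For those who'd rather not decode the condition flags by hand, here is a rough C rendering of what those five instructions amount to. It's only a sketch: the helper name ms16_prologue() and its pointer-to-pointer interface are made up for illustration and are not part of tonclib. The one thing to note is that the `ror #2` moves bit 1 of the address into the sign flag, so the alignment test is on bit 1 of dst.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not tonclib): C equivalent of the memset16() prologue.
 * Returns 0 when memset16() would return right away, 1 when it would
 * carry on to the block/word fills. *pdst and *pcount are updated. */
int ms16_prologue(uint16_t **pdst, uint16_t fill, size_t *pcount)
{
    uint16_t *dst = *pdst;
    size_t count = *pcount;

    /* cmp r2, #0 / movnes ...: Z ends up set if count == 0 or dst == 0 */
    if (count == 0 || dst == NULL)
        return 0;                       /* bxeq lr */

    /* movnes r3, r0, ror #2: bit 1 of dst lands in the N flag */
    if ((uintptr_t)dst & 2) {
        *dst++ = fill;                  /* strmih r1, [r0], #2 */
        count--;                        /* submis r2, r2, #1   */
    }

    *pdst = dst;
    *pcount = count;
    return count != 0;                  /* bxeq lr if nothing is left */
}
```

The assembly gets away without any branches here because each conditional instruction either keeps the flags of the previous test or replaces them with a new test that is only relevant if the previous one passed.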

The instructions that make up the main loop are a little non-standard as well. Here's how it works and why:

  1. Reduce fill-count, C, by N hwords. Note that C need not be a multiple of N. This is important.
  2. Fill N halfwords if C>=N. It's `>=', not `>', because C==N indicates the last stretch. It's also not `!=', because C need not be a multiple of N.
  3. Loop as long as C>N. In this case it is `>', because C==N indicates the last full stretch.
  4. Now it's time for the residuals. If C%N==0, then we're finished, so it's time to return.
  5. However, if there were residuals, then C<0, thanks to the last subtraction inside the loop. So we have to correct for it by adding N again.

The standard method is to split off possible residuals first, but this version is shorter and allows for earlier escaping. A second benefit is that you can use non-power-of-two values for N as well. It is possible, for example, to use a 12-fold stmia here with only a few changes. The lower number of loops means that this would be ~10% faster … eventually. It really depends on things like memory waitstates whether the 12-fold version is worthwhile.
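
The five steps can also be written out in C, which may make the control flow easier to follow. This is a sketch under a few assumptions: fill_hwords() is a made-up name, the block size n is passed as a parameter purely so that non-power-of-two sizes can be tried, and the residuals are written one halfword at a time instead of with the word loop the assembly uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of the loop structure (not the tonclib routine).
 * Fills `count` halfwords at dst, in blocks of n halfwords where possible. */
void fill_hwords(uint16_t *dst, uint16_t fill, size_t count, size_t n)
{
    ptrdiff_t c = (ptrdiff_t)count;
    ptrdiff_t step = (ptrdiff_t)n;

    if (step <= 0)                      /* guard: a zero block size never ends */
        return;

    if (c >= step) {
        do {
            c -= step;                  /* 1. reduce C by N                  */
            if (c >= 0) {               /* 2. fill a block if C was >= N     */
                for (ptrdiff_t i = 0; i < step; i++)
                    *dst++ = fill;
            }
        } while (c > 0);                /* 3. loop while C was > N           */

        if (c == 0)                     /* 4. no residuals: done             */
            return;
        c += step;                      /* 5. undo the overstep              */
    }

    while (c-- > 0)                     /* residual run */
        *dst++ = fill;
}
```

Calling it with n set to 16 mirrors the 8-word stmia above; n set to 24 would correspond to the 12-fold variant, with no other changes to the loop.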


Oh, the highlighting was done by geshi as well. Making that arm-asm highlighter turned out very easy indeed.

EDIT, 2007-12-07

There was a small bug in the version above. r12 was used but never initialized. I know I had it in there when I tested it, but somehow it got lost.

Code highlighting. Neat.

I use quite a bit of code in my documents. Now, you can't exactly copy code from an editor to an HTML page … at least, not if you want formatting to be maintained. Until now, I used my standard text editor to convert it to HTML, and then post-processed the result to remove excess styles and such. This worked, but it's still a little cumbersome.

After some searching, I found GeSHi, a PHP package that converts and highlights code for use on the web, and it is very customizable as well. Very nice. Before I had even looked into how I could use this, I found out that there is a WP plugin that uses it as well: IG-syntax hiliter. So now I can do this:

Keywords:
void for return int

Comments:
/* cmt */
// cmt

Strings/characters:
"hello"
'x'

Numbers:
12345, , +12, -12, 01234, 0x12ab34, 0X12AB34,
1.234, 1.2e3, 1.2e-3

/* Affine tilemap demo */
void test_tte_ase()
{
    // Base inits
    irq_init(NULL);
    irq_add(II_VBLANK, NULL);
    REG_DISPCNT= DCNT_MODE1 | DCNT_BG2;

    // Init affine text for 32x32t bg
    tte_init_ase(2, BG_CBB(0) | BG_SBB(28) | BG_AFF_32x32,
        0, CLR_YELLOW, 0xFE, NULL, NULL);

    // Write something
    tte_write("\\{P:120,80}o");
    tte_write("\\{P:72,104}Round, round, \\{P:80,112}round we go");

    AFF_SRC_EX asx= { 124 << 8, 84 << 8, 120, 80, 0x100, 0x100, 0 };
    bg_rotscale_ex(&REG_BG_AFFINE[2], &asx);

    // Rotate it
    while(1)
    {
        VBlankIntrWait();
        key_poll();

        asx.alpha += 0x111;
        bg_rotscale_ex(&REG_BG_AFFINE[2], &asx);

        if(key_hit(KEY_START))
            break;
    }
}

That should be C code. And by golly it works :). Now I just have to make something for ARM asm.


There are some caveats to the plugin and geshi, though.

  • On the plugin side, it works as an extra filter in the displaying process. However, when writing the code in the post, that'll be standard C, complete with brackets and ampersands and such. So it is vitally important to turn off the visual editor and auto-validating of XHTML. If you don't, there will be trouble when you save the post.
  • By default, the plugin uses line numbers (bleh) and a few other extras that clutter the actual code. So I turned those off in the plugin options. However, this wasn't enough for everything. The geshi settings for number parsing and using CSS-classes highlighting are also turned off, and if you want those (and I do), you'll have to make a few changes to the plugin manually. In particular, I needed to add `$geshi->enable_classes(true)' and `$geshi->set_number_highlighting(true)'.
  • The regexp for numbers in geshi is incomplete: it doesn't do hex or floats, for example. In parse_non_string_part() use this instead:

        $reg= "#\\b((0[xX][0-9A-Fa-f]+)|([0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|([0-9]))\\b#";
        $stuff_to_parse= preg_replace($reg, "<|/NUM!/>\\1|>", $stuff_to_parse);

Regexp help courtesy of www.regular-expressions.info.
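
To get a feel for which literals that pattern accepts, here is a small hand-written matcher in C that follows the same alternatives: hex with a 0x/0X prefix, and decimal with an optional fraction and exponent. It is only an approximation for illustration: is_number_literal() is a made-up name, it checks a whole string rather than searching inside text, and it ignores the word-boundary subtleties of `\b`. It has nothing to do with how GeSHi itself is implemented.

```c
#include <ctype.h>
#include <stdbool.h>

/* Advance past a run of decimal digits. */
static const char *skip_digits(const char *s)
{
    while (isdigit((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical helper: true if the whole string is a number literal
 * of the kind the regexp above is meant to catch. */
bool is_number_literal(const char *s)
{
    /* 0[xX][0-9A-Fa-f]+ */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        s += 2;
        if (!isxdigit((unsigned char)*s))
            return false;
        while (isxdigit((unsigned char)*s))
            s++;
        return *s == '\0';
    }

    /* [0-9]*\.?[0-9]+ : digits must follow the dot, if there is one */
    const char *p = skip_digits(s);
    bool had_int = (p != s);
    if (*p == '.') {
        const char *q = skip_digits(p + 1);
        if (q == p + 1)
            return false;
        p = q;
    } else if (!had_int) {
        return false;
    }

    /* ([eE][-+]?[0-9]+)? */
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        const char *q = skip_digits(p);
        if (q == p)
            return false;
        p = q;
    }
    return *p == '\0';
}
```

Note that a leading sign is not part of the pattern, so for something like +12 only the digits count as the number.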