The Annotated "Programming the Nintendo GBA: The Unofficial Guide"


“Programming the Nintendo GameBoy Advance: The Unofficial Guide” by J. Harbour is an e-book that covers both the basic and the not-so-basic items involved in GBA programming. It's a clear, easy-to-understand and professional-looking book; one that is used by many people taking their first steps into GBA programming and, if I recall correctly, by several colleges.

That said, it is not without issues. Yes, it does teach the basics of GBA programming, but it also:

  • Focuses on the wrong basics (bitmap modes, which are hardly ever used in commercial games).
  • Uses inconsistent nomenclature in code and other bad programming practices.
  • Contains code that is simply incorrect (does not compile or worse).
  • Gives incorrect advice and information.
  • Uses inefficient code, even when optimizing it a little would not have been difficult, and would even have been advisable given the limits of the GBA platform.

While these items may have little consequence in the short term, in the long term they will hinder understanding of GBA programming. I believe that there's more to programming than just getting the thing to work: style and efficiency matter as well. This is especially true when teaching programming, as habits learned early on have a tendency to stick.

For example, code with a loose or inconsistent nomenclature is harder to read, write and understand, and therefore harder to maintain. It's the principle of least astonishment: the fewer surprise elements, the smoother the ride. The problem with inconsistencies is that you never know where you stand: “why is it done like this here, but like that there?”, and so forth. They make it hard for novices to understand the rationale behind the code, and coding becomes ‘magic’: an incantation to utter to get something done. Code is not magic, and you have to actually think about things before acting. This is part of the difference between a good programmer and a bad programmer.

Another example is inefficiency. Sure, premature optimisation is the root of all evil, but if a function is so slow as to be unusable, or if it costs next to nothing to make it faster, I think it's better to start with the faster version in the first place instead of having to correct it later.

Because of things like this, readers of the book may get stuck at some point and will have to unlearn and relearn how to do things properly before they can really continue. Yes, I know how that sounds, but we've seen many, many instances of that on the forum over the last few years. Tonc has been one of my efforts to weed them out; this is another.

Most of the annotations here aren't actually about GBA programming specifically, but about general programming principles like consistent terminology and efficiency. Nearly every demo in the book suffers from these problems. Because later demos often repeat the same problems and I don't want to repeat myself too often, I'll only cover a specific problem once.


The main goal in writing this was to be thorough – probably a little too thorough, as some entries will be more important than others. Still, I think that all the points are worth thinking about at least once.

Now, writing something like this should have been the author's job. When someone else does it, especially ‘the competition’, it might come off as a little mean-spirited. This is unfortunate, but sometimes unavoidable. If instructions lead people into blind alleys, putting up signs pointing this out is a good thing, not a bad thing.

2010-05-09 revision
I've finally taken the time to straighten out a few things in the text. I've done some rephrasing to remove most of the snark (which indeed was spread a little thick) and toned down some items that on rereading aren't as important as I thought at first.

1 Chapter 1: The Zen of Getting Started

No annotations (there's little to annotate about introductions anyway)

2 Chapter 2: GameBoy Architecture in a Nutshell

Incorrect info: Memory architecture (pp 54-59)

The term ‘Access Width’ that's used here is a little ambiguous. For example, table 2.2 lists the access widths of the memory sections as 8 or 16 bits, except for IWRAM, where it says 32 bits as well. The problem here is that there is a difference between bus width and (software) access width. Table 1 lists the information according to GBATek. The bus may be 16 bits wide in some cases, but this does not mean that accesses are restricted to that size; wider accesses just take a little longer.

For example, it simply isn't true that VRAM accesses (pp 58) are restricted to halfwords: words work as well, they just take a cycle longer. However, when filling regions of memory you have to loop over a given size anyway, and then word accesses will actually be quicker because you need only half as many loop iterations. If you do it right, you can get a performance gain of up to 8 times.

Table 1: bus and access widths, gbatek:memory.
Region          Bus (bits)  Read     Write    Cycles (8/16/32-bit)
BIOS ROM        32          8/16/32  -        1/1/1
Work RAM 32K    32          8/16/32  8/16/32  1/1/1
I/O             32          8/16/32  8/16/32  1/1/1
OAM             32          8/16/32  16/32    1/1/1
Work RAM 256K   16          8/16/32  8/16/32  3/3/6
Palette RAM     16          8/16/32  16/32    1/1/2
VRAM            16          8/16/32  16/32    1/1/2
GamePak ROM     16          8/16/32  -        5/5/8
GamePak Flash   16          8/16/32  16/32    5/5/8
GamePak SRAM    8           8        8        5
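To make the word-fill point concrete, here is a minimal sketch in plain C. dup16() and fill32() are hypothetical names, not from the book or any library. The 15-bit color is duplicated into both halves of a 32-bit word, so every store writes two mode 3 pixels and the loop runs half as often. The sketch assumes a word-aligned destination and an even number of pixels.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

// Duplicate a 15-bit color into both halves of a word: one store = two pixels.
static inline u32 dup16(u16 clr)
{
    return (u32)clr | ((u32)clr << 16);
}

// Fill 'nwords' words starting at the word-aligned address 'dst' with 'val'.
// For a mode 3 buffer, nwords is the pixel count divided by 2.
static void fill32(u32 *dst, u32 val, size_t nwords)
{
    while (nwords--)
        *dst++ = val;
}
```

On the GBA, clearing the mode 3 screen would then be something like `fill32((u32*)0x06000000, dup16(clr), 240*160/2)`. This sketch doesn't unroll the loop, so it only captures the first factor of 2.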

Incorrect info: ARM vs Thumb code (pp 60).

“There is a performance benefit to using 16-bit instructions in a computer with 16-bit memory, although I don't get into Thumb code at all in this book, as it is an advanced topic. True, there may be cases for using Thumb instructions to speed up parts of a game, but there are also cases for using regular ARM instructions as well.”

Yes, Thumb instructions are better suited for 16-bit memories, which is why Thumb code is advised for GBA programming. But it is not an advanced topic: the only thing you have to do to make things compile as Thumb code is use the following GCC flags: -mthumb -mthumb-interwork. The speed-up is estimated at 60% under almost all conditions.

3 Chapter 3: Game Boy Development Tools

HAM stuff. Can't really comment on that.

4 Chapter 4: Starting With the Basics

4.1 REG_DISPCNT (pp 105-106)

Nomenclature / classification.

It is better that #defines for register bits use a prefix of some sort that indicates where they belong. Names such as MODE_3 and BG2_ENABLE don't do that, which can become problematic when other things have modes and backgrounds as well.
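For example, a prefixed set of REG_DISPCNT bits could look like the following. The DCNT_ prefix is the convention I use in Tonc, but any prefix works as long as it's applied consistently. The bit values are taken from GBATek's description of the display control register.

```c
// REG_DISPCNT bits, prefixed with DCNT_ so it's obvious which register they belong to.
#define DCNT_MODE0  0x0000  // tiled mode 0
#define DCNT_MODE3  0x0003  // 240x160 16bpp bitmap
#define DCNT_MODE4  0x0004  // 240x160 8bpp bitmap, two pages
#define DCNT_BG0    0x0100  // enable background 0
#define DCNT_BG1    0x0200  // enable background 1
#define DCNT_BG2    0x0400  // enable background 2
#define DCNT_BG3    0x0800  // enable background 3
#define DCNT_OBJ    0x1000  // enable sprites
```

Setting up mode 3 then reads as `REG_DISPCNT = DCNT_MODE3 | DCNT_BG2;`, and there's no doubt about which register the bits belong to.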

Volatile registers.

All IO registers should be defined as volatile. Because this would make the definitions long (volatile unsigned long, etc), use the common shorthands like u32, vu32, etc.
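Here is a sketch of what that looks like, assuming a C99 compiler with <stdint.h>. The typedef names follow the common u16/vu16 shorthand. REG_KEYINPUT is included only as a second example of a register whose value changes without the program touching it.

```c
#include <stdint.h>

// Common shorthand types; the 'v' versions are volatile.
typedef uint16_t u16;
typedef uint32_t u32;
typedef volatile uint16_t vu16;
typedef volatile uint32_t vu32;

// Memory-mapped IO registers, with the dereference built into the #define,
// so they can be used like ordinary variables: REG_DISPCNT = ...;
#define REG_DISPCNT   (*(vu16*)0x04000000)  // display control
#define REG_VCOUNT    (*(vu16*)0x04000006)  // current scanline (read-only)
#define REG_KEYINPUT  (*(vu16*)0x04000130)  // key states, 0 = pressed
```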

4.2 Fillscreen demo: inefficient code / bad coding (pp 106-108)

//# From the fill screen demo.


//create a pointer to the video buffer
unsigned short* videoBuffer = (unsigned short*)0x6000000;


//macro to pack an RGB color into 16 bits
#define RGB(r,g,b) (unsigned short)(r + (g << 5) + (b << 10))


void DrawPixel3(int x, int y, unsigned short c)
{
    videoBuffer[y * 240 + x] = c;
}


//fill the screen
for (x = 0; x < 239; x++)
    for (y = 0; y < 159; y++)
        DrawPixel3(x, y, RGB(0,(255-y),x));

The quote above contains four excerpts from the FillScreen project that are problematic.

Inefficiency: Variable for videoBuffer

Creating a pointer variable to store a fixed address like videoBuffer is perfectly legal, but not always recommended, for several reasons. Firstly, it takes a little bit of IWRAM space. Not much, I'll admit, but if you do this for every section/register, you're already up to half a kB at least. Secondly, it can be dangerous, as the pointer can be redirected at any time. This can be useful on some occasions, but this probably isn't one of them. Thirdly, the extra layer of indirection requires a little extra work from the CPU, because the contents of global variables need to be read from memory before they can be used. It won't be a lot of work, but in the wrong places (like a video-buffer pointer, which is used in every VRAM interaction) it can add up dramatically.

The general method for hardware access is a #define. In this case:

#define videoBuffer  ((u16*)0x06000000)

Note the double parentheses here: one for the cast and one for the whole thing. The second one is necessary because array accesses have a higher operator priority than casts: (u16*)0x06000000[1] would be parsed as (u16*)(0x06000000[1]), which won't even compile. This is probably why a variable was used in the first place.

Still, in the case of videoBuffer, using a variable might not be such a bad idea because it should be redirected for page flipping. But that's for modes 4 and 5, not mode 3.

RGB Macro

Preprocessor macros are generally considered evil, because there are so many potential pitfalls. One of them is that parameters can be expanded in funny ways, because macros work by direct text substitution. For example, `#define SQUARE(x) x*x' with x = 1+2 will give 1+2*1+2 = 5, and not 9. This is why every book that covers C gives the advice to always, always put parentheses around macro arguments. Because it's not done here, the call to RGB() later on requires the parentheses. That's something users can't be expected to know, and they'd have a right to complain when their code goes wonky.

This kind of code is one of the reasons why most books will urge you to use inline functions instead of macros. Use macros only when you really have to.
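Here is the SQUARE example written out so you can check it yourself. SQUARE_BAD() and square() are illustrative names. Note that even the properly parenthesized macro still evaluates its argument twice, which the inline function doesn't.

```c
// Unparenthesized: SQUARE_BAD(1+2) expands to 1+2*1+2, which is 5.
#define SQUARE_BAD(x)  x*x

// Parenthesized: ((1+2)*(1+2)) is 9, but x is still evaluated twice.
#define SQUARE(x)      ((x)*(x))

// Inline function: the argument is evaluated once, and its type is checked.
static inline int square(int x)
{
    return x * x;
}
```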

The macro is also somewhat unsafe because it doesn't validate the arguments. The color components have ranges between 0 and 31; go out of the range and one color will bleed into another. But people are wise enough not to do that, right? Usually, yes, but in this particular program this is exactly what happens: the color used is RGB(0, 255-y, x), meaning that the green and blue components are generally outside the safe range here. Yes, it doesn't matter much here because a funky pattern is kinda what the author is after, but it'd be nice if unsafe uses like this had been mentioned.
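One way to get both safety and type checking is an inline function that masks every component to 5 bits. This is a sketch, not the book's code. Because of the masking, out-of-range values wrap around within their own field instead of bleeding into their neighbours. As a result, RGB15(0, 255-y, x) would give a different (but still funky) pattern than the book's macro. Clamping to 31 would be the other option.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

// Pack red, green and blue into a 15-bit BGR555 color.
// Each component is masked to 5 bits, so it can't spill into the next field.
static inline u16 RGB15(u32 r, u32 g, u32 b)
{
    return (u16)((r & 31) | ((g & 31) << 5) | ((b & 31) << 10));
}
```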

Function for drawing pixels

There are three kinds of procedures in C: macros, functions and inline functions. Each of these has pros and cons. The important con of regular functions is the overhead cost for calling them: setting up the parameters, 2 jumps between the caller and called function, and potentially a few others as well. In contrast, macros and inline functions are integrated into the caller so there's no overhead. Additionally, because functions are separate entities you (or the compiler) can't optimize their calls much. Like, say, loading videoBuffer once and using it with offsets if you're in a loop. No, you have to reload it each time.

For most functions, the overhead of a call will be small compared to the function's body. However, the body of DrawPixel3() is one line. In most cases, the overhead here will cost you a factor of 3! As a rule, pixel plotter routines should be macros or inlines.

Screen fill loop

In C, the x-coordinates of a matrix (such as a bitmap) are contiguous in memory. Because of that, it is better to put the x-loop inside the y-loop. That way, the code can be optimized for accessing consecutive memory. Of course, it doesn't matter here because DrawPixel3() doesn't allow optimizing, but when using a proper mode 3 plotter, the effects should be noticeable. Also, the loop limits should be the screen dimensions, not one less: with x < 239 and y < 159, the last column and the last row are never drawn.

This gives us the following improvements:

// Create a pointer to the video buffer
#define videoBuffer ((unsigned short*)0x6000000)

// Macro to pack an RGB color into 16 bits
#define RGB(r, g, b) (unsigned short)((r) + ((g) << 5) + ((b) << 10))

static inline void DrawPixel3(int x, int y, unsigned short c)
{
    videoBuffer[y * 240 + x] = c;
}


// Fill the screen
for (y = 0; y < 160; y++)
    for (x = 0; x < 240; x++)
        DrawPixel3(x, y, RGB(0, (255-y), x));   // NOTE: still unsafe, but meh.

None of these items are particularly difficult to do correctly, nor do they increase the complexity of the code. Both versions will work fine, but the latter is more robust. In trivial projects this doesn't matter, but once the projects get large enough they will become problematic, and you might have to start all over again to get rid of the bad practices.

This is why people object to this book and many of the other tutorials: they're riddled with things like these. Newbie programmers will come across them, adopt the poor programming standards, and only later find out that the practices they use are in fact bad. But by then it may already be too late to do anything about them :\.

4.3 ButtonTest project

Bad coding principle: copy/pasta (pp 111)

The project runs over all ten buttons to illustrate button-presses. For each button there is a separate block of code.

ham_DrawText(3, 2, "UP");
ham_DrawText(3, 3, "DOWN");

//# etc ...


// check UP button
    ham_DrawText(0, 2, "X");
    ham_DrawText(0, 2, " ");

// check DOWN button
    ham_DrawText(0, 3, "X");
    ham_DrawText(0, 3, " ");

//# etc ...

As a rule, if you find yourself coding by Copy&Paste, there will be an easier way using loops or lookups. These are usually shorter, and easier to read and maintain. For the initial strings it may be debatable if it's worth it, but for the actual button handling it really should be done.

Unfortunately, in this particular instance it actually does require a little more work because HAM does not have the proper functionality, but looping rather than hardcoding each option is still a good principle.

for(i=0; i<10; i++)
    if( (~R_CTRLINPUT) & (1<<i) )
        ham_DrawText(0, 2+i, "X");
    else
        ham_DrawText(0, 2+i, " ");

And if you want to be really terse, you can even do it like this, but that's probably pushing it.

// LOL One line !!!1!one!
for(i=0; i<10; i++)
    ham_DrawText(0, 2+i, (~R_CTRLINPUT & (1<<i) ? "X" :" ") );

Bottom line: instead of having 60 lines of code, you now have only 8, or even 2. As you add more cases, the shorter version can really help maintainability.

5 Chapter 5: Being One with the Pixel (pp 111)

5.1 Mode3Pixels (pp 133-135)

//# Parts from Mode3Pixels's main.c

//declare the function prototype
void DrawPixel3(int, int, unsigned short);


//changes the video mode
#define SetMode(mode) REG_DISPCNT = (mode)


//packs three values into a 15-bit color
#define RGB(r,g,b) ((r)+(g<<5)+(b<<10))


x = rand() % 240;

Prototype formulation

A function prototype is a way to tell the compiler that a function of a certain name exists, what the number and types of its arguments are, and what type it returns. The prototype given here does just that. However, it should also serve as a summary for the programmer. Because the declaration here doesn't mention the names of the arguments, it's much less useful than it could be.

Consistency: RGB() macro

Note that the r term now has parentheses. However, the others do not, so it's still unsafe to use.

Inefficient ranged randoms

While the use of `rand() % N' for a random number between 0 and N-1 is a time-honoured tradition, it doesn't really work well for the GBA. Both rand() and the modulo use division, which is a pretty slow operation on the GBA.

A better solution would be to use `rand()*N>>15'. It may not be as obvious, but it works just as well (better actually, because the higher bits tend to be more random) and is much, much faster. It makes use of the fact that rand() returns numbers in the range [0, 2^15), which can be interpreted as a Q15 fixed-point number between 0 and 1. This assumes RAND_MAX is 0x7FFF, so check what your library's rand() actually returns. For even faster randoms, replace rand() with your own linear congruential generator that doesn't use divisions. You can go from 1200 cycles to about 90 that way. (tonc:qran)
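Here is a minimal sketch of such a generator, assuming 32-bit unsigned arithmetic. The constants are the well-known Numerical Recipes LCG parameters. The names sqran(), qran() and qran_range() are only illustrative; they're loosely modelled on Tonc's, but not identical. The range function applies the `rand()*N>>15' trick to the generator's 15-bit output.

```c
#include <stdint.h>

// Linear congruential generator: seed = a*seed + c (mod 2^32), no divisions.
static uint32_t qran_seed = 42;

static inline void sqran(uint32_t seed)
{
    qran_seed = seed;
}

// Returns a number in [0, 0x7FFF], taken from the higher (more random) bits.
static inline int qran(void)
{
    qran_seed = 1664525u * qran_seed + 1013904223u;
    return (int)((qran_seed >> 16) & 0x7FFF);
}

// Returns a number in [0, n): treat qran() as a Q15 fraction in [0, 1) and scale.
// Valid for 0 < n <= 65536, so that the product fits in a 32-bit int.
static inline int qran_range(int n)
{
    return (qran() * n) >> 15;
}
```

With this in place, `x = qran_range(240);` replaces `x = rand() % 240;` without any division.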

5.2 Mode3Lines, Mode3Circles, Mode3Boxes (pp 147-149)

As I said before, using DrawPixel3() as a function instead of a macro or inline means losing a factor of 3 in speed. It gets even worse for DrawBox3(). The 3× is the standard function overhead, but when drawing rectangles there's extra potential for optimization because you're accessing adjacent pixels, so the speed loss is more like 3.5 than 3. And then there are simple optimisations that aren't taken here. Filling in words instead of halfwords would double the speed. Using a fast filler (DMA/memset16) would increase it even more. Much more.
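For comparison, here is a sketch of a rectangle filler that works directly on the buffer instead of calling DrawPixel3() per pixel. fill_rect3() and its exclusive right/bottom edges are my own choices, not the book's DrawBox3() interface. It still writes halfwords; the next step would be word fills for the aligned middle part.

```c
#include <stdint.h>

typedef uint16_t u16;

#define M3_WIDTH 240

// Fill [left, right) x [top, bottom) of a mode 3 buffer with 'clr'.
// The row pointer moves one scanline per row, so there is no per-pixel
// function call and no y*240+x multiplication inside the loops.
static void fill_rect3(u16 *vbuf, int left, int top, int right, int bottom, u16 clr)
{
    u16 *row = &vbuf[top * M3_WIDTH + left];
    int width = right - left;

    for (int y = top; y < bottom; y++) {
        for (int x = 0; x < width; x++)
            row[x] = clr;
        row += M3_WIDTH;
    }
}
```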

5.3 Mode3Bitmap (pp 152-154)

Bad coding: #including data

If there's one bad practice the experienced GBA homebrewers would like to see go away, it's this one.

#include "mode3.raw.c"

The reasons why this is bad can be found at tonc:header. The author mentions that this is technically a bad practice and that the proper method is separate compiling and linking, but that it's alright for small projects (at least I think he does; I just can't find the reference anymore). This is true, but the fact is that many homebrewers and students don't know the proper procedure yet. They will adopt this standard, which will then have to be unlearned later.
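As a sketch of the proper route, put only declarations in a header, and let mode3.raw.c be compiled as its own translation unit and linked in. The header name, the array name and the length macro below are assumptions; they have to match whatever the data file actually defines. The data should also be declared const there, so that it stays in ROM.

```c
// mode3_bitmap.h: declarations only. The data itself lives in mode3.raw.c,
// which is compiled separately and linked in (names are hypothetical).
#ifndef MODE3_BITMAP_H
#define MODE3_BITMAP_H

#define MODE3_BITMAP_LEN  (240 * 160)   // one halfword per pixel

extern const unsigned short mode3_Bitmap[MODE3_BITMAP_LEN];

#endif // MODE3_BITMAP_H
```

main.c then does `#include "mode3_bitmap.h"`, and the makefile gets one more object file. Nothing gets compiled twice.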

5.4 Palette discussion

mem variable: paletteMem (pp 156)

paletteMem should be `#define paletteMem ((u16*)0x05000000)'. Also note the inconsistent naming for memory: paletteMem and videoBuffer. We'll see more of this later.

Transparent color (pp 156)

“Note that it's common practice to keep palette entry 0 set to black.”

It is indeed common practice, but not really recommended. The problem with having it black is that it'll hide the pixels that really should appear black. It's better to use a bright, easily distinguishable color, so that it's easy to see what should be transparent and what shouldn't. There's a reason that movies and TV shows use blue-screen or green-screen for CGI, and not black-screen.

Incorrect info: sprite palettes (pp 156)

“Remember, even the sprites in your game uses this palette …”

No, sprites have their own palette.

Incorrect info: DMA (pp 158)

“However, mode 4 has an advantage of being fast when using hardware-accelerated blitting functions and DMA, because twice as many pixels can be copied to video memory.”

Technically this is true, but the phrasing is a little ambiguous. Obviously, the other modes can use DMA just as well: there's nothing special about mode 4 regarding DMA. Also, it's not really that twice as many pixels can be copied; rather, mode 4 simply uses half as many bytes for the same number of pixels. But even with DMA and half as many bytes to copy, mode 4 is simply too slow for general games.

5.5 Mode4Pixels (pp 159-161)

Code inconsistency : RGB() macro

Now all terms are properly parenthesized. If it's corrected here, why's it not done in other projects?

Loop index names

Loop variables are usually named i, j, k and such (or doubled: ii, jj, kk); n and m are used for sizes, not indices. Though I'm willing to admit that I may prefer it that way because my background is physics and not computer science.

Inconsistent use of functions

Why are we using memcpy() for the bitmap, but not the palette?

memcpy usage (pp 164)

“Memcpy may not be the fastest method, because it doesn't take advantage of writing two pixels at a time …”

True, it's not the fastest method, but it does write 2 pixels at a time. Four, even. In fact, that's the only reason it works: if it copied byte by byte, it wouldn't work because of the no-byte-write rule for VRAM. memcpy() is a tricky bugger: it will only work right when the size and alignment are right. See tonc:memcpy for details.

“However, it may be faster to copy mode4_Bitmap to the video buffer using a loop that iterates 160x120 times …”

No it won't, because memcpy() already takes the faster route: struct-copy by 16 bytes in one go. Simple for-loops will almost always be slower than dedicated copying routines.

“… (that's half the number of pixels)”

Be careful how you interpret this remark. 160×120 is half the number of pixels of a mode 3 buffer, which is what videoBuffer refers to. The only reason you need this amount now is that videoBuffer is a halfword array, not a byte array. It's not that you now need to copy half the number of pixels, but rather half the number of halfwords. You're still copying 240×160 pixels; they just take up fewer bytes. It's a subtle difference, I know, but an important one.

“…but I'm not going into optimization at this point because DMA is faster than any software loop.”

Interestingly, it's actually pretty close. A DMA copy is only about 10% faster than a well-written software copier. For filling, it's actually 10% slower!

5.6 Page flipping (pp 164-165)

Incorrect information: double buffering speed

“Basically, when you draw everything to an off-screen buffer, things move along much more quickly. In fact, using a double buffer makes the drawing operations so fast that interferes with the vertical refresh, so you much add code to check for the vertical blank period and only flip in this period.”

No. The speed of drawing is the same; you just don't see it happen. Page flipping is the solution to interference with the VDraw, not the cause. There simply isn't enough time to render a full 240×160 scene within the VBlank period, so some of the drawing would be done while the screen is being updated, leading to graphical artifacts. What page flipping does is allow you to draw everything on a separate buffer and then simply swap it in. It's the swapping that's fast, not the drawing.

Also, yes, you need to wait for the VBlank to do this to avoid tearing, but this is not the only reason for waiting. Usually, all timing is done relative to the VBlank.

Mem variables/inconsistent nomenclature: FrontBuffer/BackBuffer

Again, using #defines for FrontBuffer and BackBuffer would be preferable. Also note that the structure of the names is inconsistent with earlier variables (videoBuffer was camelCase, and paletteMem used Mem instead of Buffer). This may seem a minor point, but it would make everything easier to read and remember if a consistent terminology were used.

The BACKBUFFER #define is problematic as well, as it refers to the page-select bit in REG_DISPCNT instead of the actual back buffer. Again, a prefix hinting at its meaning would be preferable.
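To illustrate, here's a sketch of one consistent naming scheme for the memory sections, together with a page-flip helper. All names are hypothetical (they're close to what Tonc uses, but that's beside the point). page_flip() takes the display control register as a parameter only so it can be tried out on a PC; on the GBA you'd pass &REG_DISPCNT or hard-code the register. It also shows the sprite palette as its own section.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef volatile uint16_t vu16;

// One naming scheme for all memory sections.
#define MEM_PAL        0x05000000
#define MEM_VRAM       0x06000000
#define VRAM_PAGE_SIZE 0x0000A000                          // one mode 4/5 page

#define pal_bg_mem     ((u16*)MEM_PAL)                     // background palette
#define pal_obj_mem    ((u16*)(MEM_PAL + 0x200))           // sprite palette
#define vid_mem        ((u16*)MEM_VRAM)                    // mode 3 buffer
#define vid_mem_front  ((u16*)MEM_VRAM)                    // mode 4/5 page 0
#define vid_mem_back   ((u16*)(MEM_VRAM + VRAM_PAGE_SIZE)) // mode 4/5 page 1

// REG_DISPCNT bit 4 selects the displayed page; the prefix says where it belongs.
#define DCNT_PAGE      0x0010

// Display the page we just drew on, and return the other (now hidden) page
// as the new drawing target.
static inline u16 *page_flip(vu16 *dispcnt, u16 *draw_page)
{
    *dispcnt ^= DCNT_PAGE;
    return (u16*)((uintptr_t)draw_page ^ VRAM_PAGE_SIZE);
}
```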

5.7 Mode4Flip (pp 165-170)

There are several really nasty bugs in this example.

volatile unsigned short* ScanlineCounter =
    (volatile unsigned short*)0x4000006;


paletteMem = RGB(0, 31, 0);


//slow it down
n = 500000;


void WaitVBlank(void)
{
    while(*ScanlineCounter < 160);
}

ScanlineCounter and WaitVBlank

First, as explained before, use #defines for memory mappings: it's generally faster and less memory intensive, and there's no sense in wasting memory if there's absolutely no reason for it. Second, 0400:0006 already has a perfectly valid name: REG_VCOUNT. Third, notice the volatile here. This keyword is a crucial part of the code, without which it would not function. What volatile does is force the compiler not to optimise accesses to the variable. This is necessary here because the value of the register changes without the programmer's interference. Since volatile falls outside the experience of most programmers, this should have been explained a little more in the text.

As for WaitVBlank: technically, it's correct, as it waits until the VBlank. However, what's needed for proper animation is waiting for the next VBlank. If we're already in the VBlank, we pass through this function immediately. That is why there's that wait-loop inside the main loop, to wait until we get out of the VBlank again. The correct procedure is to use REG_VCOUNT to first wait until any current VBlank is over, and then wait until the next one starts (tonc:vsync). Also, note the dereference of ScanlineCounter. This is required because it's a variable; with a #define, you could build the dereference right into it.
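Here is a sketch of that procedure, assuming REG_VCOUNT is defined as a dereferenced volatile pointer, as recommended above. vid_vsync() is an illustrative name. It's still a busy-wait. Using the VBlank interrupt with the BIOS VBlankIntrWait routine is the more power-friendly alternative, but that's beyond this annotation.

```c
#include <stdint.h>

typedef volatile uint16_t vu16;

#define REG_VCOUNT  (*(vu16*)0x04000006)   // current scanline, 0-227

// Wait for the *next* VBlank. If we're already inside one (scanlines 160-227),
// first wait for it to end, then wait for the new one to start.
static inline void vid_vsync(void)
{
    while (REG_VCOUNT >= 160)
        ;
    while (REG_VCOUNT < 160)
        ;
}
```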

Incorrect code: paletteMem redirection

And this is one of the reasons why memory mapping should be done with #defines instead of pointer variables (or at least with const-pointers): pointers can be redirected, like it's done here. What it should have said here is `paletteMem[1]', but as it stands, it makes paletteMem point to address 0000:03E0. If it was a #define or a const, the compiler would have complained.

Bad coding: Slow-down loop

//slow it down
n = 500000;

The only reason this is necessary is that WaitVBlank() is flawed. Moreover, this kind of slow-down method is unreliable with modern compilers, because the optimizer may recognize it as an empty loop that does nothing and throw it away.

Inefficient code: DrawBox4()

I admit that the function is comprehensible, but it is so ridiculously slow that it should not be used in an actual project. Even with a moderate amount of effort, you can get a function that's more than 30× faster. Remember: DrawPixel4() checks for even or odd pixels, but if you're working on a stretch of pixels that's completely unnecessary because both will be painted on.

The key to a faster version is to consider the odd pixels on the left and right separately, and to use a fast filler for the halfwords in between.
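Here is a sketch of that approach for mode 4, assuming exclusive right and bottom edges. fill_rect4() is a hypothetical name, not the book's DrawBox4(). The odd pixel on the left and the even pixel on the right each need a read-modify-write of a single byte within a halfword. Everything in between is written as whole halfwords holding two pixels. No speed figure is claimed for this exact sketch.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

#define M4_WIDTH 240

// Fill [left, right) x [top, bottom) of a mode 4 page with palette index clrid.
// VRAM can't take byte writes, so each halfword holds two pixels:
// the low byte is the even pixel, the high byte the odd one.
static void fill_rect4(u16 *page, int left, int top, int right, int bottom, u8 clrid)
{
    u16 dual = (u16)(clrid | (clrid << 8));    // two pixels at once

    if (left >= right)
        return;

    for (int y = top; y < bottom; y++) {
        u16 *row = &page[y * (M4_WIDTH / 2)];  // 120 halfwords per scanline
        int x = left, end = right;

        if (x & 1) {                // odd left edge: write only the high byte
            row[x >> 1] = (u16)((row[x >> 1] & 0x00FF) | (clrid << 8));
            x++;
        }
        if (end & 1) {              // last pixel (end-1) is even: only the low byte
            end--;
            row[end >> 1] = (u16)((row[end >> 1] & 0xFF00) | clrid);
        }
        for (; x < end; x += 2)     // whole halfwords in between
            row[x >> 1] = dual;
    }
}
```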

5.8 Mode5Pixels (pp 172)

Inconsistent/unsafe code: RGB() macro

And now the unsafe version is back! O_o.

5.9 Text (pp 173-)

Miscellaneous: font system

“Now, I realize that many programmers use a bitmapped font, and that's not a bad idea at all, because the font characters can be treated as sprites. However, I prefer a low-memory footprint and more control over the font display mechanism.”

While this is indeed a good idea, the font actually has quite a large memory footprint, and there are very few control features.

Memory inefficiency: the font

The font used is a partial font: ASCII 32 to 96 (some punctuation, numbers and uppercase letters). This is represented by an array of 64×8×8 halfwords, each of which is either 0 or 1. This not only leaves out lowercase letters and some of the punctuation symbols, but also wastes 15/16 of the space by using only one bit of each halfword. It should at least have been bytes, and at best a 1bpp bit-packed font (tonc:text). Also, the whole 8 KiB font (which generally is constant data) is placed in IWRAM! Losing a quarter of a very precious memory area to something that's 16× bigger than it's supposed to be is just wrong.

It's fairly easy to take the font out of IWRAM and compress it down by a factor of 2. Simply replace this:

//# Original declaration, in IWRAM.
unsigned short font[] =

by this

//# Replacement declaration, in ROM.
const unsigned char font[] =

It's possible to compress this down by 8×, but that takes a lot more effort.
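For the 8× version, the usual layout is one byte per glyph row, with one bit per pixel. Here is a sketch of the pixel lookup. It assumes that bit x holds column x (some font tools use the opposite bit order) and that the glyphs start at ASCII 32, like the book's font. glyph_pixel() is a hypothetical name.

```c
#include <stdint.h>

// 1bpp font layout: 8 bytes per 8x8 glyph, one byte per row; bit x of a row
// byte is the pixel in column x. 64 glyphs take 64*8 = 512 bytes.
static inline int glyph_pixel(const uint8_t *font, char ch, int x, int y)
{
    return (font[(ch - 32) * 8 + y] >> x) & 1;
}
```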

Inefficiency: DrawChar

Aside from the slowness of DrawPixel3() (which has been covered already), the extra math in assigning draw will probably take some cycles as well. Pointer arithmetic would be of great assistance here, both in performance and clarity. Yes, clarity too: pointers have a way of reducing the amount of written code. All the constant or nearly constant offsets are integrated into the pointer, rather than added to the code for every single step.

void DrawChar(int left, int top, char letter, unsigned short color)
{
    int x, y;
    int draw;

    for(y = 0; y < 8; y++)
    {
        for (x = 0; x < 8; x++)
        {
            // grab a pixel from the font char
            draw = font[(letter-32) * 64 + y * 8 + x];
            // if pixel = 1, then draw it
            if (draw)
                DrawPixel3(left + x, top + y, color);
        }
    }
}

My own mode 3 text writer uses about 1.35k cycles/char. This is a lot, but it's manageable. This DrawChar() does 5k cycles/char. But wait! That's with the font in fast RAM. If you put it in ROM where it's supposed to be, the time jumps to 7k/char. That means that you're out of the VBlank in about 11 characters so you could just manage to print, say, lives left and the score. In my opinion, this is one of those instances where optimisation is not premature.
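Here is a sketch of what pointer arithmetic does for DrawChar(), using the byte-per-pixel `const unsigned char font[]' replacement from above. draw_char3() is a hypothetical name. The buffer is passed in as a parameter so the function can be tested on a PC. No cycle counts are claimed for this version.

```c
#include <stdint.h>

typedef uint16_t u16;

#define M3_WIDTH 240

// Draw an 8x8 character from a byte-per-pixel font (64 bytes per glyph,
// starting at ASCII 32) into a mode 3 buffer. 'src' walks the glyph and
// 'dst' walks the screen, so the inner loop is just a test and a store.
static void draw_char3(u16 *vbuf, const unsigned char *font,
                       int left, int top, char letter, u16 color)
{
    const unsigned char *src = &font[(letter - 32) * 64];
    u16 *dst = &vbuf[top * M3_WIDTH + left];

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            if (src[x])             // nonzero font pixel: draw it
                dst[x] = color;
        src += 8;                   // next glyph row
        dst += M3_WIDTH;            // next scanline
    }
}
```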

6 Chapter 6: Tile-based Video Modes

Incorrect info: parallax scrolling (pp 203).

“Mode 0 is great for this because all four backgrounds are hardware rendered. You do not need to write your own parallax scrolling routine.”

Yes and no. Yes, mode 0 is great, but you still need to write your own parallax scrolling routine because the scrolling registers only do scrolling, not parallax (parallax is differential scrolling). Aside from that, for big maps you'd still need to write your own scrolling engine anyway. However, it will be easier and faster with the tiled backgrounds than in the bitmap modes, because you'd only have to update one row or column of the screenblocks, rather than the whole of VRAM.
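As a sketch of what ‘differential scrolling’ means in practice, every layer gets its own scroll offset, derived from one camera position. The layer speeds (each layer half as fast as the one before) and the function names are assumptions for illustration. The register addresses are the BGxHOFS registers from GBATek, which are write-only and hold 9-bit offsets. The camera position is assumed to be non-negative.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef volatile uint16_t vu16;

// Horizontal scroll registers for BG0-BG3 (write-only, 4 bytes apart).
#define REG_BG_HOFS(n)  (*(vu16*)(0x04000010 + 4 * (n)))

// Offset for a layer that moves at 1/2^shift of the camera speed (cam_x >= 0).
// Only the lower 9 bits of the register are used.
static inline u16 layer_hofs(int cam_x, int shift)
{
    return (u16)((cam_x >> shift) & 0x1FF);
}

// Parallax: each farther layer scrolls at half the speed of the previous one.
static inline void set_parallax(int cam_x)
{
    for (int bg = 0; bg < 4; bg++)
        REG_BG_HOFS(bg) = layer_hofs(cam_x, bg);
}
```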

Incorrect info: Tile data and Tile map (pp 204)

“The tile data itself can be stored anywhere in VRAM, as long as it's on a 16 Kb boundary …”

This isn't quite true. The data doesn't have to be aligned to a charblock boundary, it just makes it easier to use. If placed at an offset, all the map entries would have to incorporate that offset as well somehow.

“… video memory is divided into 4 logical Char Base Blocks, which are made up of 32 smaller Screen Base Blocks …”
(pp 204-205).

No. Well, yes, but a misleading yes. Screenblocks have nothing to do with charblocks; the two are completely separate entities. That they happen to use the same memory addresses is an inconvenience to watch out for, not an indication of something hierarchical.

“The tile map (which defines where the tiles are positioned) must begin at screen base 31 at the very end of video memory.”

This is an unfortunate use of “must”. Nothing requires the tilemap to start at screenblock 31; it just so happens that it does in this example.

Incorrect info: DMA Blitting (pp 209)

“Let's not forget that DMA is a hardware process, where a software blitter is compiled and run by the CPU as machine instructions. You can't begin to compare a hardware process with a software process, because anything that is hard-coded into silicon will blow away a series of machine instructions. … you could write a fast memory copy routine in assembler and it would be much faster than a C routine. However, DMA will blow them both away …”

Actually, we can compare them and, as it happens, DMA will not blow a well-constructed asm copier away. Yeah, I was surprised when I saw that as well. DMA is faster than regular copies because there is no loop overhead. However, with ldmia/stmia instructions the loop overhead per byte is significantly reduced anyway, especially for sections with high waitstates. In most cases, DMA will only be 10-20% faster in copying. It will actually be slower when it comes to fills, because it re-reads the fill value every time, which a software version doesn't have to do. It's not safe to assume that a hardware process will always beat a software routine.

6.1 TileMode0 (pp 211-215)

void DMAFastCopy(void*, void*, unsigned int, unsigned int);

//copy the palette into the background palette memory
DMAFastCopy((void*)test_Palette, (void*)BGPaletteMem, 256, DMA_16NOW);


#define BG_COLOR256 0x80
#define CHAR_SHIFT 2
#define SCREEN_SHIFT 8
#define WRAPAROUND 0x1

//background mode identifiers
#define BG0_ENABLE 0x100
#define BG1_ENABLE 0x200
#define BG2_ENABLE 0x400
#define BG3_ENABLE 0x800


//background memory offset macros
#define CharBaseBlock(n) (((n)*0x4000)+0x6000000)
#define ScreenBaseBlock(n) (((n)*0x800)+0x6000000)

    //create a pointer to background 0 tilemap buffer
    unsigned short* bg0map =(unsigned short*)ScreenBaseBlock(31);

#define BGPaletteMem ((unsigned short*)0x5000000)


//vertical refresh register
#define REG_DISPSTAT *(volatile unsigned short*)0x4000004

//wait for vertical refresh
void WaitVBlank(void)
{
    while((REG_DISPSTAT & 1));
}


#define BUTTONS (*(volatile unsigned int*)0x04000130)

    if(!(BUTTONS & BUTTON_LEFT)) x--;

Partially improper types: DMAFastCopy()

Part of choosing the type of a symbol is making it easy to use. For general memory, the most useful types are `void*' or `const void*', because then you don't force the user to cast everything, making the code less readable in the process. In the case of DMAFastCopy(), the first argument should have been `const void*', because source data shouldn't be modified. The destination should be `void*'. Using these types means that the user won't have to cast explicitly. Another problem with having identical datatypes here (combined with not having parameter names) is that you could easily switch the arguments and try to copy from destination to source, which is generally a bad idea. This is more likely than you might think, since in the standard method of copying, memcpy(), the destination address goes first.

Unnecessary casts clutter up the code, making things more difficult to read. Note that the destination is actually already the correct type, but gets a needless cast anyway. The reason for this is probably that it was copy-pasted blindly from another source, which is one of the bad habits I'm trying to warn against.
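Here is a sketch of a friendlier signature, purely as an illustration. fast_copy() is a hypothetical name, the size is in bytes, and memcpy() stands in for the actual DMA transfer so the example runs anywhere. The destination goes first, as with memcpy(). Because the source is `const void*', the palette can be passed without a cast.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for DMAFastCopy(): destination first (like
 * memcpy), source const, size in bytes. On the GBA the body would program
 * a DMA channel; here memcpy stands in so the example runs anywhere. */
void fast_copy(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
}

/* Stand-ins for exported palette data and the palette buffer. */
static const unsigned short test_Palette[4] = { 0x0000, 0x001F, 0x03E0, 0x7C00 };
static unsigned short bg_palette[256];
```

With const data as the source, swapping the arguments now at least gets you a compiler warning about discarding the const qualifier.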

Naming inconsistency: REG_BGxCNT defines

BG_COLOR256, CHAR_SHIFT, SCREEN_SHIFT and WRAPAROUND are all part of REG_BGxCNT. However, apart from the first name, there is nothing that indicates this. This is bad because it can cause conflicts and/or confusion with other register bits, like BGx_ENABLE, which aren't part of the background control at all but of REG_DISPCNT. Again, this can be solved by a prefix.

Incorrect #define: WRAPAROUND

WRAPAROUND is incorrect: it should be 0x2000 (bit 13, which only has an effect on affine backgrounds). 0x1 is a priority setting (gbatek:bg control). And like many other register #defines, it's impossible to tell what WRAPAROUND really refers to; mapping systems and text writers can have wrap-around as well, for example.
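Here's what prefixed REG_BGxCNT bit defines could look like. The names follow the style of tonc's headers but should be considered illustrative. The bit positions are the ones listed in GBATEK's BG control description.

```c
/* Sketch of REG_BGxCNT bit defines with a BG_ prefix, so it is obvious
 * which register they belong to. */
#define BG_PRIO(n)   ((n) & 3)            /* bits 0-1: priority              */
#define BG_CBB(n)    (((n) & 3) << 2)     /* bits 2-3: char base block       */
#define BG_MOSAIC    0x0040               /* bit 6: mosaic                   */
#define BG_8BPP      0x0080               /* bit 7: 256-color tiles          */
#define BG_SBB(n)    (((n) & 31) << 8)    /* bits 8-12: screen base block    */
#define BG_WRAP      0x2000               /* bit 13: wraparound, affine only */
#define BG_SIZE(n)   (((n) & 3) << 14)    /* bits 14-15: map size            */
```

A line like `REG_BG0CNT = BG_CBB(0) | BG_SBB(31) | BG_8BPP;' then reads as exactly what it does.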

Improper types: Char/ScreenBaseBlock

In nearly – and perhaps all – cases, char and screen block addresses are used as pointers, not as raw addresses. It'd make sense to put types inside the macros, rather than forcing the user to add them explicitly.
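Here is a minimal sketch of typed versions. It assumes VRAM at 0x06000000 and 16-bit map entries, and the macro names are hypothetical. Because the macros already yield pointers, `uint16_t *bg0map = SCREEN_BASE_BLOCK(31);' needs no cast at the call site.

```c
#include <stdint.h>

/* Hypothetical typed versions of CharBaseBlock()/ScreenBaseBlock():
 * they evaluate to pointers, so the caller doesn't have to cast. */
#define MEM_VRAM              0x06000000u
#define CHAR_BASE_BLOCK(n)    ((uint16_t*)(uintptr_t)(MEM_VRAM + (n) * 0x4000u))
#define SCREEN_BASE_BLOCK(n)  ((uint16_t*)(uintptr_t)(MEM_VRAM + (n) * 0x0800u))
```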

Consistency: BGPaletteMem

The earlier PaletteMem was a variable, now it's a #define. It should be a #define, but it'd be nice if there was some consistency in these things throughout the book.

Incorrect #define/info: REG_DISPSTAT

Well, the definition is correct, but not the comment above it. 0400:0004 is the display status, there is no such thing as a vertical refresh register. Although there is a bit inside REG_DISPSTAT that checks the VBlank/Draw status, the register has other functions as well.

Inconsistency/Incorrect function: WaitVBlank

This is the second version of WaitVBlank(), and again it does not do what it should, nor what its name says. Bit 0 of REG_DISPSTAT is the VBlank status: if it's 1, we're inside the VBlank (gbatek:dispstat). So unlike what the function name indicates, it actually waits until the VBlank is over, not until it starts. And as a timing mechanism, it's still useless for the reasons described earlier.

Inconsistency/Incorrect #define: BUTTONS

The proper name is REG_KEYINPUT or REG_KEYS or something else that starts with REG_. More importantly, REG_KEYINPUT is a vu16, not vu32.

Improper/inefficient code:

    if(!(BUTTONS & BUTTON_LEFT)) x--;

REG_KEYINPUT uses active-low settings (which should have been mentioned here as well somewhere, as it's not exactly obvious). This means that when a button is down, the bit is clear (0), and not set (1). To go to an active-high setting, one should invert the bits and then mask: `~BUTTONS & foo'. That says “check if button foo is down”. The code, `!(BUTTONS & foo)', checks whether foo is not not down. Technically it's the same, but the formulation is awkward.

Aside from that, a logical NOT and a bitwise NOT are not the same. In most cases the bitwise version will be quicker, though ultimately the difference is negligible.
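Here is a small sketch of the active-high conversion. It assumes the usual key bit layout of REG_KEYINPUT (A in bit 0 up to L in bit 9). The KEY_ names and keys_active_high() are hypothetical. The function just does the `~BUTTONS & foo' step once, so everything after it can be read as "bit set means key down".

```c
#include <stdint.h>

/* REG_KEYINPUT is a 16-bit register; a bit is 0 while its key is down. */
#define REG_KEYINPUT  (*(volatile uint16_t*)0x04000130)

#define KEY_A      0x0001
#define KEY_B      0x0002
#define KEY_LEFT   0x0020
#define KEY_MASK   0x03FF   /* bits 0-9: the ten keys */

/* Invert and mask once: afterwards a set bit means "key is down".
 * On hardware: if (keys_active_high(REG_KEYINPUT) & KEY_LEFT) x--; */
static inline uint16_t keys_active_high(uint16_t raw)
{
    return (uint16_t)(~raw & KEY_MASK);
}
```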

6.2 RotMode2 (pp 218 - 224)

Improper types: REG_BG2PA-PD,-X, -Y

The affine registers are signed, not unsigned. Technically it doesn't really matter here, because these registers are write-only, but people will base their code on these definitions, use the unsigned types, and then see the screen go wonky because a signed 0xFFFF (−1) is not an unsigned 0xFFFF (65535).

And yes, the proper term is ‘affine’, not Rotation or Rot/Scale. The registers form a general 2×2 affine transformation, and while rotations are affine transformations, not every affine transformation is a rotation. The term is incorrect and misleading.

Unexplained terms: RotateBackground

Yes, the function has something to do with rotation and scaling, but nowhere is it actually explained what the terms are and what it actually does. When dealing with math, you must always define your terms clearly. This is especially true for the GBA case, where the affine transformations work a little differently than you may expect.

In any case, the function basically implements tonc:eq 12.4, with α = −ang, map anchor p0 = (scroll_x, scroll_y) and screen anchor q0 = (cx, cy). The formulation and order of the terms in the equations are … confused. When it comes to writing down matrix equations explicitly, it is better to have the first column (the terms multiplied by x_center) before the second column (the y_center terms). Technically it doesn't matter, but the current order makes it hard to visualise what's going on.

void RotateBackground(int ang, int cx, int cy, int zoom)
{
    center_y = (cy * zoom) >> 8;
    center_x = (cx * zoom) >> 8;
    DX = (x_scroll - center_y * SIN[ang] - center_x * COS[ang]);
    DY = (y_scroll - center_y * COS[ang] + center_x * SIN[ang]);
    PA = (COS[ang] * zoom) >> 8;
    PB = (SIN[ang] * zoom) >> 8;
    PC = (-SIN[ang] * zoom) >> 8;
    PD = (COS[ang] * zoom) >> 8;
}

Also, do not use all-caps for variables; all-caps names are reserved for constants and macros. One- or two-letter global variable names are probably not a good idea either.
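For reference, here is a sketch of the RotateBackground() calculation with signed types and lower-case names, with the terms written column by column. The BgAffine struct and bg_affine_set() are hypothetical names. The sine and cosine are passed in as .8 fixed-point numbers (from whatever lookup table you use), the map anchor p0 is in .8 fixed point, and the screen anchor q0 is in pixels. The matrix is the same as the book's, and dx = p0 − P·q0.

```c
#include <stdint.h>

/* Hypothetical affine parameter set for a background (REG_BG2PA-PD, X, Y).
 * All fields are signed: pa-pd are .8 fixed point, dx/dy are .8 as well. */
typedef struct {
    int16_t pa, pb, pc, pd;
    int32_t dx, dy;
} BgAffine;

/* Rotate/scale around a screen anchor.
 *   cosa, sina   : cos/sin of the angle, .8 fixed point
 *   zoom         : scale factor, .8 fixed point (256 = 1.0)
 *   map_x, map_y : map anchor p0, .8 fixed point
 *   scr_x, scr_y : screen anchor q0, pixels
 * dx = p0 - P*q0, written column by column (x terms first). */
void bg_affine_set(BgAffine *bg, int cosa, int sina, int zoom,
                   int32_t map_x, int32_t map_y, int scr_x, int scr_y)
{
    bg->pa = (int16_t)(( cosa * zoom) >> 8);
    bg->pb = (int16_t)(( sina * zoom) >> 8);
    bg->pc = (int16_t)((-sina * zoom) >> 8);
    bg->pd = (int16_t)(( cosa * zoom) >> 8);

    bg->dx = map_x - (bg->pa * scr_x + bg->pb * scr_y);
    bg->dy = map_y - (bg->pc * scr_x + bg->pd * scr_y);
}
```

Note that this matrix maps screen space to map space. A larger zoom value therefore makes the background appear smaller, not larger, which is exactly the kind of GBA quirk that an explanation of the terms should have covered.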

Inefficiency: sin/cos tables should be power-of-two

Power-of-Two-sized (PoT) arrays for periodic functions make things a lot easier, because you can then simply mask the index to keep from going out of bounds.
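Here is a sketch of what that looks like. It assumes a 512-entry table (a full circle in 512 steps) that was generated on the PC and linked in as const data; the names are hypothetical. The same table also gives the cosine, by offsetting the index by a quarter circle.

```c
#include <stdint.h>

/* Hypothetical power-of-two sine table: 512 steps per full circle. */
#define SIN_LUT_SIZE  512
#define SIN_LUT_MASK  (SIN_LUT_SIZE - 1)

/* Any angle, even negative or out of range, maps into the table with a
 * single AND: no range checks or modulo needed. */
static inline int lut_sin(const int16_t *sin_lut, int theta)
{
    return sin_lut[theta & SIN_LUT_MASK];
}

/* cos(theta) = sin(theta + quarter circle), so one table serves both. */
static inline int lut_cos(const int16_t *sin_lut, int theta)
{
    return sin_lut[(theta + SIN_LUT_SIZE / 4) & SIN_LUT_MASK];
}
```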

WaitVBlank() version 3

    while(!(REG_DISPSTAT & 1));
    while((REG_DISPSTAT & 1));

The necessary components for a correct vsync are here, but not in the right order. As written, the function first waits for the VBlank to start and then for it to end, so it returns when the VBlank is over. As a result, all the updating is done inside the VDraw period instead of inside the VBlank. Switch the two statements and it should work correctly.
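Here is a sketch of the statements in the corrected order. To make the logic testable off-hardware, the status read is passed in as a function. On the GBA you'd just read REG_DISPSTAT directly, or pass gba_read_dispstat(). The simulated sequence at the end exists only for testing and is hypothetical.

```c
#include <stdint.h>

#define REG_DISPSTAT  (*(volatile uint16_t*)0x04000004)
#define DSTAT_IN_VBL  0x0001   /* bit 0: set while the display is in VBlank */

/* The status read is passed in so the loop logic can also run off-hardware. */
typedef uint16_t (*dispstat_reader)(void);

/* On the GBA: vsync(gba_read_dispstat); */
uint16_t gba_read_dispstat(void)
{
    return REG_DISPSTAT;
}

/* Wait for the START of the next VBlank: the two loops in corrected order. */
void vsync(dispstat_reader read_dispstat)
{
    while (read_dispstat() & DSTAT_IN_VBL)      /* still in a VBlank: wait it out */
        ;
    while (!(read_dispstat() & DSTAT_IN_VBL))   /* in VDraw: wait for the VBlank  */
        ;
}

/* Hypothetical simulated status for testing on a PC:
 * VBlank twice, VDraw three times, then the next VBlank. */
static const uint16_t sim_seq[] = { 1, 1, 0, 0, 0, 1, 1 };
static int sim_pos;

uint16_t sim_read_dispstat(void)
{
    return sim_seq[sim_pos++];
}
```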

7 Chapter 7: Rounding up Sprites

Missing info: Converting tiles (pp 231)

gfx2gba can be used for object tiles as well. In fact, it's notably better than pcx2sprite. (Though grit is better than either here, of course :))

7.1 SimpleSprite (pp 236 - 241)

Terminology: SpriteMem, SpriteData and SpritePal

While it's good that they're clearly indicated as having something to do with sprites, there is potential for confusion between SpriteMem (i.e., OAM) and SpriteData (object VRAM), because Mem and Data are pretty much synonyms.

Also, we have now three different nomenclatures for palettes: paletteMem, BGPaletteMem and SpritePal.

Inconsistent/confused terminology: Obj attribute bits

Once again, it is important to indicate where bit defines belong, which isn't done here. For example, COLOR_256 sets the color mode of an object to 256 colors; backgrounds have a similar option, but at a different bit. Yes, it's called BG_COLOR256 there to avoid the naming collision, but unless you already knew that, you might be tempted to use COLOR_256 in both cases, or assume that there was an OBJ_COLOR256. The same goes for the MODE_x names, which might lead one to think that they're similar to video mode bits and hence belong to REG_DISPCNT. Even better, the ones that do have an OBJ prefix actually belong to REG_DISPCNT, not to the object attributes.

Improper types: x, y

Local variables should be ints, unless you have a really good reason to use something else. ARM cores are 32bit, so use 32bit where you can. For a little more on this, see tonc: good/bad practices.

Inefficiency: UpdateSpriteMemory

void UpdateSpriteMemory(void)
{
    int n;
    unsigned short* temp;
    temp = (unsigned short*)sprites;
    for(n = 0; n < 128*4; n++)
        SpriteMem[n] = temp[n];
}

This is a pretty slow implementation of an OAM updater. Using u32 pointers instead of u16* would double the speed. Using dedicated copiers (like DMA) would increase the speed by a factor of 5 to 10. Do not use manual copies unless you're working on small ranges; use memcpy(), DMA or CpuFastSet().

7.2 BounceSprite header (pp 243 - 245)

Magic numbers: SpriteData3

//video modes 3-5, OAMData starts at 0x6010000 + 8192
unsigned short* SpriteData3 = SpriteData + 8192;

Magic numbers in code are generally a bad idea. When it comes to tile addresses, magic numbers are very common for some reason. At the very least, one should make a macro that takes care of the raw math, so that you just have to enter the charblock and tile index to get the right address. A nice trick would be to define TILE and CHARBLOCK types so that you can map VRAM into tiles. (see tonc:tileblocks). This allows both easy addressing and copying.

But even if you do use magic numbers, it's better to use hexadecimal; 8192 just isn't a very nice number. In hex it's 0x2000, which is easier to remember.

Also, you have to be very careful with datatypes and pointer arithmetic, because pointer addition works in units of the pointed-to type, not in bytes. In particular, SpriteData+0x2000 = 0x06014000, not 0x06012000, because the datatype of SpriteData is u16 (0x2000 halfwords is 0x4000 bytes). The code is correct here, but the comment is not.
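Here is a sketch of the typed-tile approach. It is modelled on tonc's tile_mem idea, but the names are illustrative. A 4bpp tile is 32 bytes, a charblock is 16 KiB (512 tiles), and VRAM is viewed as an array of charblocks. In the bitmap modes, object tiles start at charblock 5, so `&tile_mem[5][0]' is exactly the 0x06014000 that SpriteData3 ends up pointing to.

```c
#include <stdint.h>

/* Illustrative VRAM view: one 4bpp tile is 8 words (32 bytes),
 * one charblock is 512 tiles (16 KiB). */
typedef struct { uint32_t data[8]; } Tile;
typedef Tile CharBlock[512];

#define tile_mem  ((CharBlock*)(uintptr_t)0x06000000)

/* Address of tile `tid` in charblock `cbb`; no magic byte offsets. */
#define TILE_ADDR(cbb, tid)  (&tile_mem[cbb][tid])
```

The last test line also shows the pointer-arithmetic point: adding 0x2000 to a u16 pointer moves it 0x4000 bytes.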

7.3 BounceSprite main.c (pp 246 - 251)

void MoveSprite(int num)
{
    //clear the old x value
    sprites[num].attribute1 = sprites[num].attribute1 & 0xFE00;
    sprites[num].attribute1 = sprites[num].attribute1 | mysprites[num].x;
    //clear the old y value
    sprites[num].attribute0 = sprites[num].attribute0 & 0xFF00;
    sprites[num].attribute0 = sprites[num].attribute0 | mysprites[num].y;
}


//draw the background
for(n=0; n < 38400; n++)
    videoBuffer[n] = bg_Bitmap[n];

//set the sprite palette
for(n = 0; n < 256; n++)
    SpritePal[n] = ballPalette[n];

//load ball sprite
for(n = 0; n < 512; n++)
    SpriteData3[n] = ballData[n];

Incorrect/bad code: MoveSprite

C has special shorthand operators for statements like `x = x op y'. They're called compound assignment operators, and look like this: `x op= y'. In the case of OR, that would be `x |= y' rather than `x = x | y'. The shorthand versions are generally preferred, as the statements are easier to parse and less error-prone.

Additionally, the new sprite coordinates should be masked. If not, they can overwrite the rest of the bits when entering negative numbers (see tonc:obj-position).
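Here is a sketch of MoveSprite() with compound operators and masked coordinates. ObjAttr and obj_set_pos() are hypothetical names. The masks follow the OAM layout: y in attr0 bits 0-7, x in attr1 bits 0-8.

```c
#include <stdint.h>

typedef struct { uint16_t attr0, attr1, attr2, fill; } ObjAttr;

#define ATTR0_Y_MASK  0x00FF   /* attr0 bits 0-7: y coordinate */
#define ATTR1_X_MASK  0x01FF   /* attr1 bits 0-8: x coordinate */

/* Set an object's position without disturbing the other attribute bits.
 * Masking x and y keeps negative coordinates from spilling into the
 * neighbouring bits (shape, size, flips, ...). */
void obj_set_pos(ObjAttr *obj, int x, int y)
{
    obj->attr0 &= ~ATTR0_Y_MASK;
    obj->attr0 |= y & ATTR0_Y_MASK;
    obj->attr1 &= ~ATTR1_X_MASK;
    obj->attr1 |= x & ATTR1_X_MASK;
}
```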

7.4 TransSprite (pp 258 - )

Too specialized design methodology: InitSprites()

For general functions, make sure that what they do makes sense, and don't set bits that have no place in them. For example, InitSprites() forces objects to be transparent. This should be controlled by a parameter, not forced. It's probably better for everyone to just have parameters that correspond to the attribute fields. This limits the number of function arguments and shortens the code significantly.

Incorrect functionality: SetTrans() and SetColorMode()

void SetTrans(int num, int trans)
{
    mysprites[num].trans = trans;
    sprites[num].attribute0 = mysprites[num].colormode |
        mysprites[num].trans | mysprites[num].y;
}

void SetColorMode(int num, int colormode)
{
    mysprites[num].colormode = colormode;
    sprites[num].attribute0 = mysprites[num].colormode |
        mysprites[num].trans | mysprites[num].y;
}

Only from the code could you tell that these have something to do with objects. Aside from that, what they do is not general enough. What should be done here is to mask out the relevant bits and then OR the new bits in. As it is now, they rebuild attr0 from only the position, transparency and color-mode settings, erasing the other bits, like the shape information. It just happens to work out for square objects.
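Here is a minimal sketch of the mask-then-OR approach. bf_set() and the attr0 names are hypothetical. The field positions are the attr0 layout from GBATEK: object mode in bits 10-11, color mode in bit 13, and shape in bits 14-15.

```c
#include <stdint.h>

/* attr0 fields; names are illustrative. */
#define ATTR0_MODE_MASK   0x0C00  /* bits 10-11: 0 normal, 1 semi-transparent */
#define ATTR0_MODE_SHIFT  10
#define ATTR0_8BPP_MASK   0x2000  /* bit 13: 256-color object */
#define ATTR0_8BPP_SHIFT  13

/* Replace one bit-field of a halfword: clear the field, then OR in the
 * new value, shifted and masked so it can't touch neighbouring bits. */
static inline uint16_t bf_set(uint16_t reg, unsigned value,
                              uint16_t mask, unsigned shift)
{
    return (uint16_t)((reg & ~mask) | ((value << shift) & mask));
}

/* Hypothetical wrappers: they leave shape, y and everything else alone. */
static inline uint16_t obj_attr0_set_mode(uint16_t attr0, unsigned mode)
{
    return bf_set(attr0, mode, ATTR0_MODE_MASK, ATTR0_MODE_SHIFT);
}

static inline uint16_t obj_attr0_set_8bpp(uint16_t attr0, int on)
{
    return bf_set(attr0, on ? 1u : 0u, ATTR0_8BPP_MASK, ATTR0_8BPP_SHIFT);
}
```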

Magic numbers: REG_BLDMOD and REG_COLEV

It'd be nice to make some register #defines for these things. Not everyone will know that (1<<4) will mean object top-layer transparency.
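Something like the following would do. It assumes the standard layout of REG_BLDCNT (the book's REG_BLDMOD) and REG_BLDALPHA (the book's REG_COLEV). The define names are illustrative, and only a few of the layer bits are shown.

```c
#include <stdint.h>

/* Blend registers: REG_BLDCNT (book: REG_BLDMOD), REG_BLDALPHA (book: REG_COLEV). */
#define REG_BLDCNT     (*(volatile uint16_t*)0x04000050)
#define REG_BLDALPHA   (*(volatile uint16_t*)0x04000052)

#define BLD_TOP_BG0    0x0001   /* bits 0-5: first (top) target layers     */
#define BLD_TOP_OBJ    0x0010
#define BLD_TOP_BD     0x0020
#define BLD_MODE_ALPHA 0x0040   /* bits 6-7: 01 = alpha blending           */
#define BLD_BOT_BG0    0x0100   /* bits 8-13: second (bottom) target layers */
#define BLD_BOT_BG2    0x0400
#define BLD_BOT_BD     0x2000

/* Blend weights in 1/16ths (0-16): EVA for the top layer, EVB for the bottom. */
#define BLDALPHA_BUILD(eva, evb)  (((eva) & 31) | (((evb) & 31) << 8))
```

On hardware, `REG_BLDCNT = BLD_TOP_OBJ | BLD_MODE_ALPHA | BLD_BOT_BG2;' says what it does without anyone having to remember that (1<<4) is the object layer.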

Inefficient struct design: SpriteHandler

While using ints is usually good, one place where the case isn't always so clear-cut is in structs. The thing about structs is that they cost memory space, and in the case of global variables, IWRAM space at that. The SpriteHandler struct takes up 8 words, and the array 128*8*4 bytes = 4 KiB. That's already 1/8th of IWRAM. Now, sometimes this is okay because, well, sometimes you just need large amounts of data. But the members alive, colormode and trans are all single bits, and size only requires 2 bits. So about half of the struct is always empty. The solution is bit-packing, which is how most of GBA memory works anyway, and it should be done here as well.
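Here is a sketch of a bit-packed alternative. The member list is an assumption: a position, a velocity (the book's dirx/diry, renamed vx/vy here), and the four small members named above. The exact layout of a struct with bit-fields is up to the compiler. With GCC this should come to 12 bytes per entry instead of 32, or about 1.5 KiB for the whole array.

```c
#include <stdint.h>

/* Hypothetical bit-packed version of the book's SpriteHandler. */
typedef struct {
    int16_t  x, y;            /* position                    */
    int16_t  vx, vy;          /* velocity (book: dirx/diry)  */
    unsigned alive     : 1;   /* single-bit flags ...        */
    unsigned colormode : 1;
    unsigned trans     : 1;
    unsigned size      : 2;   /* ... and a 2-bit size field  */
} SpriteState;

static SpriteState sprite_states[128];
```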

Bad names: dirx/diry

These aren't directions; they're velocities. Directions don't have magnitudes; vectors like velocities do. Additionally, directions don't necessarily imply movement: positions have a direction too, as does looking.

7.5 Rotation and Scaling (pp 265 - 266)

Incorrect explanations

“Is there really any need to draw pre-rotated sprites anymore when support for rotating a sprite is built in to the GBA hardware?”

Yes, there is. There are only 32 affine matrices for objects, which may not be enough for all your sprites. Also, the affine transformations don't give very smooth results and have a number of artefacts that can look very ugly indeed (tonc:affine objects).

“The process isn't perfect because the GBA doesn't have a floating point processor …”

That's not the reason it's not perfect. The process is one of sampling; sampling can cause artefacts because that's just how sampling works, unless you do some funky anti-aliasing.

“… so all the rotation must be done in fixed-point math.”

Which is a non-obvious subject for most nowadays, and should be explained a little. Preferably when it's first used, which was in chapter 6.
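Since the book never explains it, here's a minimal sketch of .8 fixed point, the format the book's sin/cos tables and the affine registers use. A real value v is stored as the integer v*256. The names are hypothetical.

```c
#include <stdint.h>

/* .8 fixed point: a real value v is stored as the integer v*256. */
#define FIX_SHIFT  8
#define FIX_ONE    (1 << FIX_SHIFT)

typedef int32_t fixed;

static inline fixed int_to_fix(int n)   { return n * FIX_ONE; }

/* Drops the fraction; for negative values an arithmetic right shift
 * (what GCC does on ARM) rounds towards minus infinity. */
static inline int   fix_to_int(fixed f) { return f >> FIX_SHIFT; }

/* A .8 times a .8 gives a .16 result, so shift back down by 8. */
static inline fixed fix_mul(fixed a, fixed b) { return (a * b) >> FIX_SHIFT; }
```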

“This is a refinement over the SIN and COS arrays you saw in the previous chapter, as there is no longer any need for a source file containing these radian values since they're just computed at the start of the program. This does cause a slight delay ”

Apart from the delay, everything in this sentence is wrong. What's described here is calculating the sin/cos arrays in-game, rather than precalculating them on the PC. This does cause a slight delay, because the GBA is very bad at floating-point math, which is necessary to build the arrays. But that's the least of your troubles. Because they're built in-game, the arrays will go into IWRAM instead of ROM. In this case, that's 2*360*4 = 2880 bytes of IWRAM wasted on what's essentially constant data. Furthermore, the memory these arrays take up can be reduced significantly, for the following reasons.

  1. You only need either a sine or cosine table, as they're basically the same.
  2. At .8 fixed-point, you'll only have 9 significant bits. The other 23 are essentially wasted.

To build them outside the game and link them in is the right thing to do. To build them in-game is not a refinement, it's a detriment.

Improper typing/naming: RotData

The types here should be signed (tonc:affine-types)!

7.6 RotateSprite

Wasted memory / missing member: SpriteHandler (pp 270)

There are two new members here: rotate and scale. Contrary to what you might think, rotate isn't the angle; it's the flag marking the sprite as an affine sprite. The real angle is kept in a member called angle, which is, in fact, missing from the SpriteHandler definition, so the code wouldn't compile.

Button #defines (pp 271)

It would be much easier if these were in hex, or even binary, to highlight the bitwise nature of the constants.

#define BUTTON_A        0x0001
#define BUTTON_B        0x0002
#define BUTTON_SELECT   0x0004
#define BUTTON_START    0x0008
#define BUTTON_RIGHT    0x0010
#define BUTTON_LEFT     0x0020
#define BUTTON_UP       0x0040
#define BUTTON_DOWN     0x0080
#define BUTTON_R        0x0100
#define BUTTON_L        0x0200

Fragile code: InitSprites() and ROTDATA() (pp 274)

The function uses ROTDATA(tileIndex) to indicate the affine matrix number. There are only 32 affine matrices (indices 0-31); tile indices can go up to 1023 and will be greater than 31 very often, making it a bad idea to use tileIndex like this. Also, the ROTDATA() macro does not mask its input, so the size bits may be overwritten.
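Here is a sketch of a safer macro. It assumes the affine index lives in attr1 bits 9-13, as in GBATEK's OAM description, and ATTR1_AFF_ID() is a hypothetical name. Masking only protects the size bits: it silently wraps out-of-range values, so the real fix is still to pass a matrix index rather than a tile index.

```c
/* attr1 bits 9-13 select one of the 32 affine matrices (0-31); bits 14-15
 * hold the size. Masking keeps large values out of the size bits. */
#define ATTR1_AFF_ID_MASK   0x3E00
#define ATTR1_AFF_ID_SHIFT  9
#define ATTR1_AFF_ID(n)     ((((unsigned)(n)) << ATTR1_AFF_ID_SHIFT) & ATTR1_AFF_ID_MASK)
```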

Inefficient/incorrect code: CheckButtons()/ Pressed() (pp 276)

It's good practice to read REG_KEYINPUT into a variable and use that for interaction. This way, you can go to active-high status, and do more complicated things like testing for held keys and releases (tonc:keys-advanced). However, the way it's done here makes little sense and only serves to complicate matters. All you need is a simple routine that ANDs a bitmask. This also has the benefit of being able to check for multiple keypresses in one go, something which the switch block does not allow.


Also, for some reason some array dereferences ([1] - [4]) have gone missing. The code would not compile.

Incorrect comments: SetMode() (pp 277)

//set the video mode--mode 3, bg 2, with sprite support
SetMode(2 | OBJ_ENABLE | OBJ_MAP_1D);

This actually sets the video-mode to 2 (which is a good thing because the tile indices would be 512 or higher, making InitSprites() fail) and no background.

7.7 Animated Sprites (pp 279 - 280)

Incorrect information frames and OAM

“The easiest way to animate sprites is to copy a particular frame on an animation sequence into OAM so that it is rendered during the next screen refresh.”

Not quite. The easiest and quickest way would be to have all the frames in VRAM already, and update the tile index. This won't always work because there may be too many frames, in which case you'd have to copy new frames into VRAM. Note: into VRAM, not OAM. OAM stands for Object Attribute Memory, which is at 0700:0000. The animation frames go into 0601:0000, which is part of VRAM.

7.8 AnimSprite main.c (pp 283 - 289)

Incorrect code: WaitVBlank() (pp 286)

void WaitVBlank(void)
{
    while( (*ScanlineCounter));
}

This is the fourth, incorrect version of WaitVBlank(). This actually waits until the start of the first scanline, rather than the VBlank.

Improper types/magic numbers: UpdateBall()

void UpdateBall(index)
{
    u16 n;
    //load ball sprite
    for(n = 0; n < 512; n++)
        SpriteData3[n] = ballData[(512*index)+n];
}

Do not use non-ints for loop variables. In this case, it slows down an already slow loop by another 10-20%. And then there's the matter of the magic 512 … what is it? Also, the index parameter has no datatype.
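Here is a sketch of the same routine with an int loop variable, a typed parameter, and a named constant. The name BALL_FRAME_HWORDS is hypothetical and simply assumes that 512 is the number of halfwords per frame, which is how the book's loop uses it. The frame data and destination are passed in, so the function doesn't depend on globals.

```c
#include <stdint.h>

/* Halfwords per animation frame; hypothetical name for the book's 512. */
#define BALL_FRAME_HWORDS  512

/* Copy animation frame `index` from the frame strip `frames` to `dst`
 * (object VRAM on hardware): int loop counter, typed parameters. */
void update_ball(uint16_t *dst, const uint16_t *frames, int index)
{
    const uint16_t *src = &frames[index * BALL_FRAME_HWORDS];

    for (int n = 0; n < BALL_FRAME_HWORDS; n++)
        dst[n] = src[n];
}
```

A dedicated copier would make this faster still, but even this version drops the u16 loop-counter penalty.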

8 Chapter 8: Using Interrupts

Incorrect info: timer importance

“Then I talk about the all-important subject of timers, how to slow down your program to a consistent framerate. This has been something of a problem in the prior chapters (aside from using the VBlank), but now you will have the means to correct it.”

The GBA timers are generally not used to keep a consistent frame rate. Synchronizing should be done with the VBlank; the problem in the prior chapters was that none of the projects did this correctly.

8.1 Using Interrupts (pp 296)

Incorrect info: software interrupts (pp 296)

“A software interrupt is common in a multitasking operating system like Windows 2000 or XP. Since the GBA is a video game machine, as you may have expected, all interrupts occur on the hardware side.”

This is not true. There are indeed software interrupts, also known as BIOS calls (gbatek:bios-functions).

Incorrect info: REG_DISPSTAT{VCOUNT} (pp 297)

The VCount trigger in REG_DISPSTAT uses bits 8-15, not 6-15.

Incorrect info: bit-ops (pp 298)

“The hexadecimal values in this list of definitions allow you to perform a bitwise AND with the REG_IE register in order to set the specific bit …”

This should be bitwise OR to set.

Not quite correct info: multiple interrupt timings (pp 298)

“One and only one interrupt will occur at a time! So the REG_IF register will only have one bit set, not several. …”

This isn't completely true. Yes, only one interrupt will fire at a time (unless you get really unlucky), but multiple bits in REG_IF may be set. This can occur when dealing with nested interrupt routines, or if you get an interrupt but don't acknowledge it.

Incorrect info: acknowledging irqs (pp 299)


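To acknowledge an interrupt, you need to write that bit to REG_IF, not use an OR-EQ. So that's `=', not `|='. Yes, this is a little odd, but that's how it really works: writing a 1 to a bit of REG_IF clears that bit. Using `|=' reads all the pending bits and writes them back, so it would in fact clear all pending interrupts, not just the one you handled.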
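The write-one-to-clear behaviour is easier to see in a small simulation. Here sim_if is a hypothetical stand-in for REG_IF, for testing off-hardware. sim_if_write() mimics what the hardware does on a write: every bit written as 1 is cleared.

```c
#include <stdint.h>

#define IRQ_VBLANK  0x0001
#define IRQ_HBLANK  0x0002

/* Hypothetical simulated REG_IF: on the GBA, writing a 1 to a bit of
 * REG_IF clears that bit; writing a 0 leaves it alone. */
static uint16_t sim_if;

void sim_if_write(uint16_t value)
{
    sim_if &= (uint16_t)~value;
}

/* Correct acknowledge, the equivalent of: REG_IF = irq; */
void ack_correct(uint16_t irq)
{
    sim_if_write(irq);
}

/* Wrong acknowledge, the equivalent of: REG_IF |= irq;
 * It reads all pending bits, ORs in ours and writes them all back,
 * which clears every pending interrupt. */
void ack_or_equals(uint16_t irq)
{
    sim_if_write((uint16_t)(sim_if | irq));
}
```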

Code redundancy (pp 300)


When testing a single-bit mask, you don't need `==' and the like. After all, the result of the AND can be nothing but that bit or 0, and a plain truth test is enough to tell them apart.

8.2 InterruptTest (pp 301 - 305)

Improper typing of registers

IO registers should always be volatile. Especially interrupt registers. That means you, REG_IE, REG_IF and REG_IME.

Bad code for interrupts: MyHandler plotter (pp 305)

The object here was to draw a random pixel at every HBlank. As the author indicated, “… hblank code must be fast!”. So the last thing you should be doing is call rand() 3 times, call modulo 3 times, use a slow function, and do all these things in ARM code from ROM. The whole routine will probably take about 4-8 scanlines to execute.

8.3 Using Timers (pp 306 - 307)

Incorrect code: timer check (pp 306)

timer = REG_TM0D;
if (timer % 65536)
	// overflow--time to deal with it

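Timers run continuously, so the chance of catching the counter at an exact multiple of 65536 (which for a 16-bit register means 0) is next to zero (well, 1/65536, really). Since `timer % 65536' is nonzero in every other case, this code will execute pretty much all the time.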
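If you really want to detect an overflow by polling, a minimal sketch is to compare with the previous reading: a 16-bit up-counter that has wrapped reads lower than before. This assumes a reload value of 0 and that you poll at least once per overflow period. Otherwise, the timer's overflow interrupt or a cascaded timer is the way to go. timer_wrapped() is a hypothetical name.

```c
#include <stdint.h>

/* Detect a wrap of a 16-bit up-counter between two polls.
 * Assumes reload = 0 and at least one poll per overflow period. */
static inline int timer_wrapped(uint16_t prev, uint16_t now)
{
    return now < prev;
}
```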

8.4 TimerTest (pp 307 - 314)

int timers;

timers[0] = REG_TM0D / (65536 / 1000);
timers = REG_TM1D;
timers = REG_TM2D / (65536 / 1000);
timers = REG_TM3D;

Improper types: timers

Again, array operators have gone missing somehow.

Problematic code: integer division

Because the 65536/1000 division is done first, you get an inaccuracy of almost 1%, since 65.536 rounds down to 65. It'd have been better to use `REG_TM0D*1000/65536'.
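Here is a small sketch comparing the two orders. It assumes, as the book's formula does, that the full 65536-tick range stands for 1000 units. Multiplying first is safe here, because 65535*1000 still fits easily in 32 bits, and the division by 65536 becomes a shift. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiply first, divide last: 65535 * 1000 fits comfortably in 32 bits. */
static inline unsigned ticks_to_ms(uint16_t ticks)
{
    return ((unsigned)ticks * 1000u) >> 16;   /* same as / 65536 */
}

/* The book's order: 65536/1000 truncates to 65 before it is used. */
static inline unsigned ticks_to_ms_book(uint16_t ticks)
{
    return ticks / (65536 / 1000);
}
```

At the top of the range the book's version reports 1008 where 999 is meant.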

8.5 The Framerate program

Incorrect info: VBA accuracy (pp 315)

“I think this program demonstrates that the VisualBoyAdvance emulator is working perfectly, because a consistent framerate of 60 FPS comes through when the VBlank is used.”

VBA is quite inaccurate in its timings. Keeping a consistent frame rate is the easy part for an emulator, because you just update REG_VCOUNT and those kinds of timers with the PC clock.

Incorrect code: framerate counter (pp 327)

timer = REG_TM3D / (65536 / 1000);
if (timer > 999)
{
    //display frame rate
    sprintf(str, "FPS %i", frames);
    Print(1, 1, str, 0xFFFF);
    frames = 0;
}

This code doesn't work as a framerate counter. The timer is set to 256 clocks per tick, in other words about 1097 timer increments per retrace (308*228*4 / 256). The timer variable here has a maximum of 65535/65 = 1008, and on average it increments by 1097/65 ≈ 17 per frame. Because the check is only true for values from 1000 to 1008, a window smaller than that step, it can miss the overflow entirely, counting more frames in a 'second'. If you want a more accurate second timer, use cascading timers, like here: (tonc:timer-demo).
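Here is a sketch of the cascade approach. It assumes the timer register addresses and control bits from GBATEK; the define and function names are illustrative. With a 1024-cycle prescaler, the timer ticks at 16384 Hz. A reload value of 0xC000 (−0x4000) therefore makes timer 2 overflow exactly once per second, and timer 3, cascaded onto it, counts seconds. The frame counter then only has to notice that REG_TM3D changed.

```c
#include <stdint.h>

/* Timer registers: counter/reload at 0x04000100 + 4*n, control at +2. */
#define REG_TM2D    (*(volatile uint16_t*)0x04000108)
#define REG_TM2CNT  (*(volatile uint16_t*)0x0400010A)
#define REG_TM3D    (*(volatile uint16_t*)0x0400010C)
#define REG_TM3CNT  (*(volatile uint16_t*)0x0400010E)

#define TM_FREQ_1024  0x0003   /* prescaler: one tick per 1024 cycles        */
#define TM_CASCADE    0x0004   /* count up when the previous timer overflows */
#define TM_ENABLE     0x0080

#define GBA_CLOCK_HZ  16777216u   /* 2^24 Hz CPU clock */

/* Reload value that makes a timer overflow `hz` times per second. */
static inline uint16_t timer_reload_for(unsigned prescale, unsigned hz)
{
    return (uint16_t)(65536u - GBA_CLOCK_HZ / prescale / hz);
}

/* GBA only: timer 3 (cascaded) counts the once-per-second overflows
 * of timer 2. Set up the cascade first, then start the base timer. */
void seconds_timer_init(void)
{
    REG_TM3D   = 0;
    REG_TM3CNT = TM_CASCADE | TM_ENABLE;
    REG_TM2D   = timer_reload_for(1024, 1);   /* 0xC000 */
    REG_TM2CNT = TM_FREQ_1024 | TM_ENABLE;
}
```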

9 Chapter 9: The Sound System

Don't know too much about sound programming, so I'll leave this chapter for later.

10 Chapter 10: Interfacing with the Buttons

10.1 Detecting Button Input

Incorrect info: REG_KEYINPUT size (pp 361)

“The button status value at 0x04000130 is a 32-bit number.”

No, it's not. REG_KEYINPUT is 16-bit. The bits at 0400:0132 belong to REG_KEYCNT.

“By the way, almost all locations are 32-bit because the GBA is a 32-bit machine.”

Yes, the GBA is a 32-bit machine, but that has little bearing on how memory is being used or the bus sizes. It's true that the addresses themselves are 32-bit numbers, but most of the time they're accessed as halfwords.

Improper variable/inconsistency: BUTTONS (pp 362)

volatile unsigned int *BUTTONS = (volatile unsigned int *)0x04000130;

Again, a variable where a direct #define would have been better. In earlier projects, it was indeed a #define. Another benefit of having it as a #define is that you can also dereference the pointer in the macro, rather than having to do it yourself all the time.


One interesting bit here is that the importance of volatile is finally mentioned.

10.2 use of REG_KEYINPUT (pp 362-368)

This section covers an investigation of how REG_KEYINPUT (aka BUTTONS) works. While technically unnecessary as it's clearly explained in the reference documents (see gbatek:keyinput), it is an interesting idea. However, when investigating things on a low level, it is much handier to start looking at hex or binary values rather than decimals. Bit-patterns are much easier to see in hex/binary than in decimal numbers.

Specifically, it makes a loop like this unnecessary:

    //check for button presses
    for (n=1; n < 1000; n++)
    {
        if (!((*BUTTONS) & n))
        {
            sprintf(str, "BUTTON CODE = %i ", n);
            Print(10, 40, str, BLUE);
        }
    }

If one prints out the ~(*BUTTONS) value in hex, it's immediately obvious which button corresponds to which code.

    // Print out code for pressed button(s)
    sprintf(str, "BUTTON CODE = %04x", ~(*BUTTONS) & 0x03FF);
    Print(10, 40, str, BLUE);

10.3 Creating a button handler

Separating buttons into an array (pp 374)

“Since button input is a very low-level aspect of programming the GBA, it's helpful to move the actual memory reading code into a function and store the results of the buttons in an array. … The main benefit of polling all the buttons at the same time is that you are more likely to lose a button if there is a lot of code between each poll.”

The motivation is good, the implementation isn't. It's already covered in an earlier chapter, but if you're only interested in the status (that is, a one-bit value), it makes sense to keep it in bits and pack it into one variable. This makes subsequent use so much easier than if they're separated into different variables, not to mention faster. As far as I can see, the only time a button-array is useful is if you need to know how long they're pressed or something like that.

10.4 The ButtonHandler Program (pp 375-)

Bad info: key releases

“As you have seen, the button handler need not be complicated unless you want to detect button releases separately from presses.”

Indeed, it doesn't have to be complicated, but the handler in the book actually is more complicated than necessary. All you need is two variables to store successive keystates in, and a few bitmasks (see tonc:keys). That's all that's required to check the state, and transitions like releases.

“… because when it comes to GBA coding, all that matters is responding to a button press.”

No. Key releases are very important for things like charging and combos.
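The two-variable approach from the previous paragraphs can be sketched as follows. It is modelled on tonc's key functions, but written as a standalone example with hypothetical names. key_poll() takes the raw, active-low REG_KEYINPUT value (on hardware: `key_poll(REG_KEYINPUT)' once per frame) and keeps the previous state. Everything else is a bitmask operation, so several keys can be tested at once, and releases come for free.

```c
#include <stdint.h>

#define KEY_A      0x0001
#define KEY_B      0x0002
#define KEY_MASK   0x03FF

/* Two successive key states, both active-high (bit set = key down). */
static uint16_t key_curr, key_prev;

/* Call once per frame with the raw, active-low REG_KEYINPUT value. */
void key_poll(uint16_t raw)
{
    key_prev = key_curr;
    key_curr = (uint16_t)(~raw & KEY_MASK);
}

/* Each test takes a mask, so several keys can be checked in one go. */
uint16_t key_is_down(uint16_t keys)  { return (uint16_t)(key_curr & keys); }
uint16_t key_hit(uint16_t keys)      { return (uint16_t)(key_curr & ~key_prev & keys); }
uint16_t key_released(uint16_t keys) { return (uint16_t)(~key_curr & key_prev & keys); }
uint16_t key_held(uint16_t keys)     { return (uint16_t)(key_curr & key_prev & keys); }
```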

11 Chapter 11: ARM7 Assembly Language Primer

Incorrect name of chapter

The basic idea behind a language primer should be to show how the language works. This chapter doesn't. There is next to no information about the instruction set, what it can and cannot do, or how to properly integrate it with C. What it does do is show how you can assemble a file, and show some very slow assembly routines. For a more useful introduction to the nuts and bolts of ARM assembly, see tonc:asm.

Bad practice: batch files (pp 396)

While batch files are indeed easier to work with than makefiles for small programs, for anything beyond the most simplistic stuff makefiles are far better. Giving them up for batchfiles is a step in the wrong direction. Using batchfiles like it's done here – as the place to collect the compiler flags – is a particularly bad idea because it means you still have to run several of them to complete a build.

Bad info: assembly difficulty (pp 400)

“Indeed, assembly is difficult to master, because each assembly instruction translates directly to a CPU instruction!”

The basics of ARM assembly are actually pretty easy; in many ways easier than C, because the format is simpler. There's no chance of operator-precedence snafus or typing problems, because each operator is an instruction and there is only one type: the word. What is difficult is mastering it, and warping your mind into the right state to make proper use of it.

Bad code: linker batchfile (pp 403)

The linker batch file can only be used on one file, but Real Projects usually have multiple files.

As an example of how things should go, see the devkitPro or third-tier tonc template makefiles. If these are a little too advanced, use the second-tier tonc makefiles, which are still relatively easy to understand for makefile newbies.

11.1 FirstAsm Program (pp 406)

I can't be 100% sure, but I'm quite confident that this code was simply taken directly from the gcc output. The presence of decimal numbers for addresses is usually a dead giveaway. These should have been converted to more reader-friendly hex values. The value 67108864 means nothing, but its hex representation, 0x04000000, is of course recognizable as the IO register base. The same is true for 33554432. According to the comments, it's supposed to be VRAM, but 33554432 is 0x02000000 (EWRAM), not 0x06000000. So I guess something went wrong here.

Also note the complete lack of explanation of the code, and why it's written the way it is. Why, for example, can't you just do `mov r2, #1027', but have to use a mov and an add? Well, it so happens that ARM instructions can only encode an immediate as an 8-bit value rotated right by an even number of bits. 1027 (0x403) spans 11 bits, so it has to be built from two encodable parts, such as 1024 and 3. It's things like these that make assembly tricky to use, and what explanatory texts should focus on.
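The encoding rule for ARM data-processing immediates is that a constant is usable directly if it is an 8-bit value rotated right by an even number of bits (0, 2, ..., 30). It is simple enough to write down in C. This is only an illustration of the rule for ARM (not Thumb) data-processing instructions, and arm_imm_ok() is a hypothetical helper.

```c
#include <stdint.h>

/* Returns 1 if v can be used directly as an ARM data-processing
 * immediate: an 8-bit value rotated right by an even amount. */
int arm_imm_ok(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by `rot` by rotating left by the same amount. */
        uint32_t undone = rot ? (v << rot) | (v >> (32 - rot)) : v;
        if (undone <= 0xFF)
            return 1;
    }
    return 0;
}
```

So 1024 and 3 each pass, while 1027 doesn't, which is why the compiler emits the mov/add pair.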

11.2 DrawPixel32 (pp 409)

    .GLOBAL DrawPixel32
    stmfd   sp!,{r4-r5}     @ Save register r4 and r5 on stack
    mov     r4,#480         @ r4 = 480
    mul     r5,r4,r1        @ r5 = r4 * y
    add     r5,r5,r0,lsl #1 @ r5 = r5 + (x << 1)
    add     r4,r5,r3        @ r4 = r5 + videobuffer
    strh    r2,[r4]         @ *(unsigned short *)r4 = color
    ldmfd   sp!,{r4-r5}     @ Restore registers r4 and r5
    bx      lr

Incorrect name

The function has nothing 32-bit about it. Usually, a 32 suffix means there are 32-bit accesses somewhere, but this function doesn't have any. What's probably meant is that this was done with the ARM instruction set (32-bit instructions) instead of Thumb (16-bit instructions), but that's mostly irrelevant from C's perspective.

Improper typing: videobuffer

The videobuffer parameter is a u32. It should have been a pointer. For the asm it doesn't matter, but it makes the use in C easier.

Inefficient code: DrawPixel32()

But this is possibly the worst implementation for plotting mode 3 pixels you could find. I'm sorry, but that's just how it is.

The code has 8 instructions; only 5 are necessary. You can do everything with 4 registers, so it's not necessary to use the stack. The multiplication can also be replaced with something simpler, because it can be done with a shifted rsb and add. It should have read:

// C declaration
void DrawPixel32(u32 x, u32 y, u32 color, u16 *dst);

    .global DrawPixel32
DrawPixel32:
    add     r0, r3, r0, lsl #1      @ r0= &dst[x]
    rsb     r1, r1, r1, lsl #4      @ r1= y*15
    add     r0, r0, r1, lsl #5      @ r0= &dst[x + y*15*(32/2)]
    strh    r2, [r0]                @ dst[x+y*240]= color
    bx      lr

In fact, that's what I get from devkitARM's compiler now. The whole point of using assembly is to create code that'll run faster than a compiled equivalent. If you're making code that's actually slower, you're doing something very, very wrong. While it can be argued that optimised assembly is beyond the scope of a primer, that argument doesn't really hold for assembly, since optimisation is the point of using it in the first place.
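For comparison, here's the C side of it. draw_pixel() is a hypothetical name. m3_index() spells out the strength reduction the asm relies on, 240·y = (16·y − y)·16, in halfword units; the asm itself works in bytes rather than halfwords.

```c
#include <stdint.h>

/* Mode 3: 240x160 pixels, 16 bits each. Per the text, devkitARM compiles
 * a plain indexed store like this to the short shift-and-add version. */
void draw_pixel(uint16_t *dst, int x, int y, uint16_t color)
{
    dst[y * 240 + x] = color;
}

/* The strength-reduced index: y*240 == ((y*16 - y) * 16). */
static inline int m3_index(int x, int y)
{
    return (((y << 4) - y) << 4) + x;
}
```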

12 Conclusions

Well, that's about it I guess. The points mentioned here are the ones I could find on a first full read. While I admit that some of these points are more valid than others, I do think that each of them is worth thinking about. In the future I may add one or two points I've missed, or remove ones that are really just nitpicking.

22 thoughts on “The Annotated "Programming the Nintendo GBA : The Unofficial Guide"”

  2. Thanks for this. We used this book in a course for school, and I've been trying to explain why the book wasn't a great reference.

  5. It's easy to stick a flag on top of a building, making observations about all of the design flaws in the building while taking the elevator ride up to the top, but in the end, all you have done is affixed a flag on top.

    First of all, this e-book was professionally edited and formatted, and then given away for free. I asked for a donation for the sources but have given away many copies of the sources for free to anyone who asked.

    Secondly, the book was written to be easy to understand, not highly optimized. The material is very challenging, very low level, and maybe didn't succeed at being easy to understand. There are some glaring omissions, like the lack of a decent sprite animation example with multiple sprites. After all these years, I've added a few more example games and demos, including a very nice sprite class and sprite handler, as an addendum.

    That being said... what's the point of this website? Did you pay for the book? Are you helping anyone by pointing out all the flaws in the free book? I disclaimed it in the introduction and first chapter that this book would teach the basics of the GBA. Even though it's not optimized code, I have gotten emails from guys who have used it to get a job working on handheld projects at game studios. I got an email from someone at Firaxis Games about it. There are teachers out there using this book and MY sources to teach courses without even asking, without even crediting MY hard work, let alone sending me a little email like "Hey, using your GBA stuff, it's been helpful, thanks." Nada. Nothing but YOU.

    I have a suggestion: Write your own book. If it's so easy, as you seem to believe with this errata-or-whatever site, I'm sure you would have no problem writing a book yourself--one that will be 100% balanced on both sides of the equation.

  6. Hey, pointing out flaws makes things better. Where would Open Source be without it?

    Also, if you can't handle (a little) criticism, don't write anything on anything. I could see him ripping much deeper.

    (And e-ghasp, he never attacked you personally, just your writing!)

  7. "That being said... what's the point of this website?"
    well, for starters... this WEBPAGE (not the site) was made to point out to absolute newbies that your book has some (serious) flaws in it, and that if they DO happen to learn off your book, to be wary of the flaws.

    "I have gotten emails from guys who have used it to get a job working on handheld projects at game studios."
    I believe that but... did they keep using YOUR coding practices? Doubt it, 'cause, like the page pointed out, the practices are... lame to say the least.

    "Nada. Nothing but YOU."
    Umm... you DO realize that's a personal insult yet he never insulted you, right?

    "I have a suggestion: Write your own book. If [...]"
    He pretty much did. Look at TONC, or are you THAT angry at him for writing up a great GBA tutorial?

  8. It's easy to stick a flag on top of a building, making observations about all of the design flaws in the building while taking the elevator ride up to the top, but in the end, all you have done is affixed a flag on top.

    Except that I'm not planting any flags; I'm trying to point out the flaws in the building. If in a building the floors are uneven, the support structures are rotten and the elevator cable is showing signs of fatigue, it's worth pointing those out to the tenants. In fact, it could be considered irresponsible not to.


    First of all, this e-book was professionally edited and formatted, and then given away for free. I asked for a donation for the sources but have given away many copies of the sources for free to anyone who asked.

    I'm not attacking the book on its formatting, spelling, grammar or layout; I'm talking about its mistakes and bad programming practices. The fact that it's free does not make those things go away.


    Secondly, the book was written to be easy to understand, not highly optimized. The material is very challenging, very low level, and maybe didn't succeed at being easy to understand.

    The book is easy to understand; that's not the problem. The problem is that much of the information is incorrect or leads to poor code. And I'm not asking for ‘highly’ optimized. I understand that optimization often comes at the cost of readability, but when it doesn't you might as well give the better version. This ultimately helps the readers, as they won't have to spend time finding a suitable replacement when the original function proves insufficient.

    For instance, the simple addition of inline to the pixel plotters already speeds things up by roughly a factor of three. This kind of thing comes at no expense to the reader at all. (And yes, I know that this is indeed done in the demos written later. But it's still absent from the book.)
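What an inlined plotter might look like, as a minimal sketch: `m3_plot` and `m3_hline` are hypothetical names, and `dst` would normally point at VRAM (0x06000000 on the GBA) but is left as a parameter so the code also works on an ordinary array. Defining the functions `static inline` in a header lets the compiler expand the store at each call site, so a line-drawing loop no longer pays for a function call per pixel.

```c
#include <stdint.h>

#define M3_WIDTH  240
#define M3_HEIGHT 160

/* Plot one pixel in a 240x160 16bpp framebuffer.
   `static inline` lets the compiler expand this at each call site. */
static inline void m3_plot(int x, int y, uint16_t color, uint16_t *dst)
{
    dst[y * M3_WIDTH + x] = color;
}

/* Draw a horizontal line from x1 to x2 (inclusive) on row y.
   With m3_plot inlined, the loop body is a single halfword store. */
static inline void m3_hline(int x1, int x2, int y, uint16_t color, uint16_t *dst)
{
    for (int x = x1; x <= x2; x++)
        m3_plot(x, y, color, dst);
}
```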

    Another example is the mode 3 text system. The font that's advertised as having a low memory footprint is actually sixteen times too large and takes up a substantial chunk of IWRAM. A 1 bpp bit-packed font would have worked out much better. If bitwork is considered too difficult, the font could at least have been stored as bytes rather than halfwords. This could have been done by changing the declaration from `unsigned short font[]' to `const unsigned char font[]'; the const also lets the data stay in ROM instead of being copied into IWRAM.
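To put numbers to that: at one halfword per pixel an 8x8 glyph takes 64 * 2 = 128 bytes, while a 1 bpp glyph needs only 8 bytes, one per row, hence the factor of sixteen. The sketch below assumes that 1 bpp layout with bit 0 as the leftmost pixel; `font_1bpp`, its single box-shaped glyph and `draw_glyph_1bpp` are hypothetical, not the book's font or API. Unpacking costs one bit test per pixel, which is cheap next to the memory saved.

```c
#include <stdint.h>

/* 1bpp glyphs: 8 bytes per 8x8 character, one byte per row,
   bit 0 = leftmost pixel. `const` keeps the table out of IWRAM. */
static const uint8_t font_1bpp[][8] = {
    /* hypothetical glyph 0: a hollow 8x8 box */
    { 0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF },
};

/* Unpack one glyph into a 16bpp buffer `pitch` pixels wide,
   writing `color` for set bits and leaving clear bits untouched. */
static void draw_glyph_1bpp(const uint8_t glyph[8], int x, int y,
                            uint16_t color, uint16_t *dst, int pitch)
{
    for (int row = 0; row < 8; row++) {
        uint8_t bits = glyph[row];
        for (int col = 0; col < 8; col++) {
            if (bits & (1u << col))
                dst[(y + row) * pitch + (x + col)] = color;
        }
    }
}
```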


    Are you helping anyone by pointing out all the flaws in the free book?

    Yes, I am.

    I care deeply about correctness, efficiency and taking heed of recommended practices. I believe these things to be beneficial to the quality of whatever one is working on. This is especially true for GBA programming, where resources are sparse. It's part of why I like it.

    The point is, sloppiness is bad, and it deserves to be pointed out. This way, people with less experience can both avoid the bad practice, and perhaps recognize it when they encounter it somewhere else. I really do not see how this can be considered a bad thing.


    When it comes to learning something new, most people will simply copy what their sources tell them. In most cases this works out fine, but if the information has flaws, those flaws will be copied as being factual and possibly spread.

    The GBA homebrew scene is a good example of this. In the beginning, there was PERN. This was a very easy guide to the basic elements of programming for the GBA, but it did contain some factual errors and quite a few bad programming practices and stylistic problems. Nearly all GBA programming guides that followed inherited some or all of its problems. This includes your book, which fortunately has done away with some of the problems, but unfortunately has added a few new ones as well.

    People have been basing their code on these guides and also adopted their bad practices. Dozens, perhaps hundreds of threads on the gbadev forum are related to these issues. Many people, me included, have been offering solutions to these problems; problems that would have never come up if people (especially the writers) had simply followed standard programming practices in the first place.

    Part of the reason I wrote Tonc was to warn people about the problems found in other tutorials, how to do things properly and why. This page is simply an extension of that. I could have written one for some of the other guides as well, but since the others don't have the presence that your book has (with good reason: it is superior in layout, presentation and coverage), I find it the only source that's worth doing it for.


    I know I sometimes sound … irritated in the text. This is because some of the mistakes simply confound me. For example, there are several different methods of vertical syncing in the book and they're all wrong! That a well-meaning amateur makes such a mistake is bad enough, but to see a professional writer and computer science teacher do this just boggles my mind.

    The only reason I can think of for this problem is that they were simply copied blindly from other tutorials without checking if they worked. You can't expect a student to catch something like this when the teacher has missed it repeatedly. This just illustrates the need for a document like this one.
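For reference, a common correct approach waits in two steps: first until any VBlank already in progress is over, then until the next one starts. A single loop such as `while (vcount < 160);`, for instance, returns immediately if it is called during VBlank. The sketch assumes the GBA's scanline counter runs from 0 to 227 with lines 160-227 in VBlank; to keep it runnable away from the hardware, the scanline read is passed in as a function pointer, and `vsync_wait`, `vcount_fn` and the `fake_*` stand-in are hypothetical names.

```c
#include <stdint.h>

#define VBLANK_START 160   /* first VBlank scanline (VCOUNT 160..227) */
#define SCANLINES    228   /* total scanlines per frame */

typedef uint16_t (*vcount_fn)(void);

/* Wait for the start of the next VBlank.
   Loop 1: if VBlank is already in progress, let it finish first,
           so consecutive calls can never return in the same frame.
   Loop 2: wait until the counter reaches the first VBlank line. */
static void vsync_wait(vcount_fn read_vcount)
{
    while (read_vcount() >= VBLANK_START)
        ;
    while (read_vcount() < VBLANK_START)
        ;
}

/* Host-side stand-in for the scanline counter: advances one line per read. */
static uint16_t fake_line;

static uint16_t fake_read_vcount(void)
{
    uint16_t v = fake_line;
    fake_line = (uint16_t)((fake_line + 1) % SCANLINES);
    return v;
}
```

On a GBA, `read_vcount` would simply return `*(volatile uint16_t *)0x04000006` (REG_VCOUNT), or the loops could read that register directly.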


    There are teachers out there using this book ...

    Yes, I know. That's part of the reason this page exists: to try to prevent erroneous information from being passed on to the students. For the record, just as there are people for whom it's been helpful, there are others who would like it to go away. See here, for example. I am not alone in this.


    ... Nothing but YOU.

    Please point out anything that I've taken from the book other than for the purpose of this document.


    I have a suggestion: Write your own book. If it's so easy, as you seem to believe with this errata-or-whatever site, I'm sure you would have no problem writing a book yourself--one that will be 100% balanced on both sides of the equation.

    In a manner of speaking, I have.

    I know very well how hard it can be, and how it feels when someone speaks badly about it. But just because it is hard work doesn't mean it can't be criticized. In fact, when it comes to a textbook, I think that flaws should be pointed out and if possible fixed. Students should be able to trust a textbook to give them accurate information and correct procedures. Feel free to disagree, but I believe it is the responsibility of an author to check the book's contents and to fix or at least point out any problems that were missed at its release. If he does not, then somebody else might do it for him.

  9. To Jonathan:

    Instead of being so defensive, why don't you just fix the issues in your book? You are a professional programming author, so surely you care about correctness? In this case, the reason the GBA programming community recommends against your book, and highly recommends Cearn's TONC is because his is correct, and yours is not. We all make mistakes, and graciously accepting feedback and correction shows that you care about the quality of your work, and improves both your own knowledge, and allows the entire community to benefit. Cearn has corrected me multiple times, and I have learned and benefited from it.

    Fighting back in the manner you have done accomplishes nothing for yourself or the community -- other than turning people away from the rest of your books. Telling Cearn that he is "Nothing but a taker" is but a sophomoric jab, as he has spent a lot of time and energy providing the GBA community with the best tutorial that exists.

    So if you'd like us to take your book (or any of your other published books) seriously, put down your ego, learn from others, and everybody wins.

  10. Ok,
    So I'm glad I came across this and will read over it fully someday I imagine, that said I can understand some of John's annoyances, I mean first off I learnt programming the GBA fully off this book as much as I probably ever will and maybe his code was inefficient but it's introduced me to programming at a low level very well and I understand the principles which is all that was really asked for us on my course. I am no English boff but I believe professional writing, even annotations, shouldn't reflect the author's emotions towards the subject, also perhaps you should put more thought into why he's done some of the things he's done.

    i.e.: Real C programmers know how to use compound assignment operators like &= and |=. Not using them makes the code longer, which is generally more difficult to parse, which makes it easier to make errors.

    Ok, what you say is true, no questions there but I remember a time when I wouldn't have understood compound assignment operators easily, perhaps John was aiming to make it GBA Programming for Dummies in this case. Assuming that he just didn't know of the possibility is rather short sighted on your part, like I say I haven't read the document in full and I may be jumping to conclusions that the rest of it is like this.

    What both you AND John need to remember was his book was a first edition, some mistakes are expected, especially in programming books where the code is probably not being edited sufficiently, still John acknowledged it was a First Edition and has openly stated he doesn't intend to work on a full 2nd edition preferring to add bits he never had the opportunity to write, so he can't get upset if someone wants to correct the mistakes he's made.

    I hope I haven't offended anyone and I intend to email this site to my course tutor with the suggestion he includes it for next year.
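The point about compound assignment quoted above is easiest to see on a register-style variable. The bit values assume the GBA's display control register layout (video mode in bits 0-2, background 2 enable at bit 10); the macro names and `set_mode3` are hypothetical, and a plain variable stands in for the register so the sketch runs anywhere.

```c
#include <stdint.h>

/* Display-control bit values (mode in bits 0-2, BG2 enable = bit 10). */
#define DCNT_MODE_MASK 0x0007
#define DCNT_MODE3     0x0003
#define DCNT_BG2       0x0400

/* Stand-in for the display control register (0x04000000 on hardware). */
static uint16_t dispcnt;

/* Switch to mode 3 with BG2 on, leaving all other bits untouched. */
static void set_mode3(void)
{
    dispcnt &= ~DCNT_MODE_MASK;        /* same as dispcnt = dispcnt & ~DCNT_MODE_MASK */
    dispcnt |= DCNT_MODE3 | DCNT_BG2;  /* same as dispcnt = dispcnt | (DCNT_MODE3 | DCNT_BG2) */
}
```

Each compound form names the register once, so there is no chance of reading one variable and writing another.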


  11. I had forgotten about this web site until another college professor mentioned it again. (Sigh). I acknowledge all of the points made here by visitors since my last post a couple years ago. What almost no one has bothered to notice, however, is that my ill-fated GBA book was written in early 2003, while this web site is dated 2007-08. My irritation in the first post was based on this web site author's presumption to correct me on scores of issues FIVE YEARS after the fact. Fine, so he knows a lot more about the GBA than I do, FIVE YEARS LATER. I used the material to teach a course for about three years myself, and came up with a whole ton of new code, a graphics library, scrolling examples, a fully-working heap memory manager, image tools, and C++ sprite classes to manage animation. None of this new code was squeezed back into the old e-book since I didn't have the time or desire to write any more on this topic.

    When I started writing, back in late 2002 and into spring 2003, there was next to nothing available online about the GBA, besides a lot of very difficult hardware guides and a few very confusing examples on the fan sites, filled with extremely low-level hardware code, almost no middle or high level function libraries whatsoever... literally, every example I came upon was filled with hard-coded hardware registers and bit-shifting. Taking those electrical engineering-level examples and trying to make legible examples out of them, transcribing many of the registers into #defines, and offering functions, was not without problems. (All eloquently pointed out here on this web site with much sarcasm). I was working with André Lamothe and Joe Grand, who both seemed to find my code interesting. A quick summary of the GBA can be found in chapter 5 of Grand's book "Game Console Hacking".

    I was offended that this web site author had come along many years later to criticize something when he clearly was already a quite advanced GBA programmer who did not need such a book. But I don't mind nearly as much as my original post conveys. So many years have passed that it's hardly worth discussing now... surely everyone has moved on to the NDS or PSP.

    For the record, my first reaction was a bit strong, but it's no big deal. I'm glad people have found this web site and learned more about the GBA from it. If my very early work with the GBA helped anyone, and led to some excellent tutorials with only good programming practices in them, then I'm glad and it was worth the effort.

  13. "My irritation in the first post was based on this web site author's presumption to correct me on scores of issues FIVE YEARS after the fact." - jsharbour

    in fact it does not matter how old a book is. it does matter whether the information in it is wrong no matter how old it is. it's more important to have a valid resource with trustable information. and since the book was re-released in 2012 the post might still be of interest for some buyers. in fact it's the task of a serious author to provide errata. other authors do that as well.

    "So many years have passed that its hardly worth discussing now... surely everyone has moved on to the NDS or PSP." - jsharbour

    it is still funny to read these comments. in fact the book was re-released as a kindle ebook in 2012. if the advices have not been integrated into the new book that would be a pretty bad sign for a college professor who teaches this subject. if so the post is up to date again and not "FIVE YEARS after the fact" anymore. :)

    coranac: good work btw, well done.

  14. before i forget it:

    "I was working with André Lamothe and Joe Grand, who both seemed to find my code interesting."

    Lamothe is NOT an indicator for good books, more or less the opposite. I own enough Lamothe books to say that. Just read the customer reviews on amazon. Everybody knows that 50 percent of the content in his books is bad c source code. Good authors would put source on a cd and write about algorithms. He uses source to fill books. I still wonder why people think he is a game programming guru just because he is writing the foreword of most of the badly written books on game programming by amateur authors!?

  15. Wow, this has been a very interesting time-machine-experience for me. I'm about to set up a project to finally learn how to make GBA games and I find it very exciting! I would like to thank both Jasper Vijn and Jonathan Harbour for having spent a lot of your time and effort into teaching the rest of the world about GBA development. I want both of you to realise how important you have been for the establishment of the GBA development documentation and you should both be very proud of yourselves. THANK YOU SO MUCH!

    Time to start making games, yeah! I owe it all to you guys! :)
    Would be nice to be able to repay you guys in some way at some point, maybe when my game is done you can have a copy of it. Sharing is caring, happy days! :D

  16. I'm a 15 year old aspiring game Dev and J.S. harbour's book definitely made me have some bad practices in GBA development. Thankfully I found Tonc by the chapter where Harbour had us doing the button tests lol. I'm just happy I found this early on or I would have been subjected to even worse programming practice and false/incorrect information regarding GBA development xD however, I do thank you both (Coranac and Harbour) for releasing these wonderful materials and allowing me to learn from them and their mistakes (emphasis on mistakes for Harbour haha xD)

  17. This page came up again after so many years. I don't remember writing those two earlier posts, as it was so long ago--wish I hadn't, but back then I had a hard time with hate mail--the same sort you see on Youtube. Over a free e-book--can you believe some people? Anyway, this was about 8-12 years ago when it was relevant. Wow! This was a lot of fun back in the day, though.

    I wanted to reply to aeryck and Kaleb since I didn't see your posts. Glad you enjoyed it too! Fun times.

    There were about a dozen stories at the time about guys who were hired after putting together a GBA or Dolphin SDK demo for their portfolios--EA, 2K, Vicarious, when I was teaching at UAT. Just having a head start with the GBA architecture was enough, not necessarily experience with the legit tools. During the peak, 2008-ish, I was working with a colleague from LucasArts on a DS version of Memoir 44 (Days of Wonder). What a crazy idea! The DS was a lot tougher than the GBA, so all we really had available was a video buffer, no hardware. The SDK was similar to the Dolphin, nothing like the GBA.

    p.s. He (DW) is with an indie studio now called Impeller, and they've got a sci-fi game on KS called STARFIGHTER INC. Check out the team, some cool guys behind this game, some good industry experience. (expires 6/06/15).

  18. jsharbour brings shame on his entire family
