mode 7 addendum

Okay. Apparently, I am an idiot who can't do math.

 

One of the longer chapters in Tonc is Mode 7 part 2, which covers pretty much all the hairy details of producing mode 7 effects on the GBA. The money shot for in terms of code is the following functions, which calculates the affine parameters of the background for each scanline in section 21.7.3.

IWRAM_CODE void m7_prep_affines(M7_LEVEL *level)
{
    if(level->horizon >= SCREEN_HEIGHT)
        return;

    int ii, ii0= (level->horizon>=0 ? level->horizon : 0);

    M7_CAM *cam= level->camera;
    FIXED xc= cam->pos.x, yc= cam->pos.y, zc=cam->pos.z;

    BG_AFFINE *bga= &level->bgaff[ii0];

    FIXED yb, zb;           // b' = Rx(theta) *  (L, ys, -D)
    FIXED cf, sf, ct, st;   // sines and cosines
    FIXED lam, lcf, lsf;    // scale and scaled (co)sine(phi)
    cf= cam->u.x;      sf= cam->u.z;
    ct= cam->v.y;      st= cam->w.y;
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;
        lam= DivSafe( yc<<12,  yb);     // .12f    <- OI!!!

        lcf= lam*cf>>8;                 // .12f
        lsf= lam*sf>>8;                 // .12f

        bga->pa= lcf>>4;                // .8f
        bga->pc= lsf>>4;                // .8f

        // lambda·Rx·b
        zb= (ii-M7_TOP)*st - M7_D*ct;   // .8f
        bga->dx= xc + (lcf>>4)*M7_LEFT - (lsf*zb>>12);  // .8f
        bga->dy= zc + (lsf>>4)*M7_LEFT + (lcf*zb>>12);  // .8f

        // hack that I need for fog. pb and pd are unused anyway
        bga->pb= lam;
        bga++;
    }
    level->bgaff[SCREEN_HEIGHT]= level->bgaff[0];
}

For details on what all the terms mean, go the page in question. For now, just note that call to DivSafe() to calculate the scaling factor λ and recall that division on the GBA is pretty slow. In Mode 7 part 1, I used a LUT, but here I figured that since the yb term can be anything thanks to the pitch you can't do that. After helping Ruben with his mode 7 demo, it turns out that you can.

 

Fig 1. Sideview of the camera and floor. The camera is tilted slightly down by angle θ.

Fig 1 shows the situation. There is a camera (the black triangle) that is tilted down by pitch angle θ. I've put the origin at the back of the camera because it makes things easier to read. The front of the camera is the projection plane, which is essentially the screen. A ray is cast from the back of the camera on to the floor and this ray intersects the projection plane. The coordinates of this point are xp = (yp, D) in projection plane space, which corresponds to point (yb, zb) in world space. This is simply rotating point xp by θ. The scaling factor is the ratio between the y or z coordinates of the points on the floor and on the projection plane, so that's:

\lambda = y_c / y_b,

and for yb the rotation gives us:

y_b = y_p cos \theta + D sin \theta,

where yc is the camera height, yp is a scanline offset (measured from the center of the screen) and D is the focus length.

Now, the point is that while yb is variable and non-integral when θ ≠ 0, it is still bounded! What's more, you can easily calculate its maximum value, since it's simply the maximum length of xp. Calling this factor R, we get:

R = \sqrt{max(y_p)^2 + D^2}

This factor R, rounded up, is the size of the required LUT. In my particular case, I've used yp= scanline−80 and D = 256, which gives R = sqrt((160−80)² + 256²) = 268.2. In other words, I need a division LUT with 269 entries. Using .16 fixed point numbers for this LUT, the replacement code is essentially:

// The new division LUT. For 1/0 and 1/1, 0xFFFF is used.
u16 m7_div_lut[270]=
{
    0xFFFF, 0xFFFF, 0x8000, 0x5556, ...
};


// Inside the function
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;           // .8
        lam= (yc*m7_div_lut[yb>>8])>>12;        // .8*.16/.12 = .12
       
        ... // business as usual
    }

At this point, several questions may arise.

  • What about negative yb? The beauty here is that while yb may be negative in principle, such values would correspond to lines above the horizon and we don't calculate those anyway.
  • Won't non-integral yb cause inaccurate look-ups? True, yb will have a fractional part that is simply cut off during a simple look-up and some sort of interpolation would be better. However, in testing there were no noticeable differences between direct look-up, lerped look-up or using Div(), so the simplest method suffices.
  • Are .16 fixed point numbers enough?. Yes, apparently so.
  • ZOMG OVERFLOW! Are .16 fixed point numbers too high? Technically, yes, there is a risk of overflow when the camera height gets too high. However, at high altitudes the map is going to look like crap anyway due to the low resolution of the screen. Furthermore, the hardware only uses 8.8 fixeds, so scales above 256.0 wouldn't work anyway.

And finally:

  • What do I win? With Div() m7_prep_affines() takes about 51k cycles. With the direct look-up this reduces to about 13k: a speed increase by a factor of 4.
 

So yeah, this is what I should have figured out years ago, but somehow kept overlooking it. I'm not sure if I'll add this whole thing to Tonc's text and code, but I'll at least put up a link to here. Thanks Ruben, for showing me how to do this properly.

tonc 1.4 official release

The files have need downloadable for a while now as a preview, but I finally put the text up on the main site as well so I guess that makes it official. Tonc is now at version 1.4. As mentioned before, the main new thing is TTE, a system for text for all occasions. I've also used grit in some of the advanced demos, so if you want to see how you can do advanced work with it, check out the mode 7 demos and the tte demo.

This will be the last version of Tonc. It's really gone on long enough now.


Files and linkies :


Right! Now what …

tonc 1.4 preview

I'm close to releasing the latest (and probably last; this really has gone on long enough) version of Tonc. As a preview, I'm releasing the PDF a little early in the hope that someone may take a look and offer some feedback before the official release (aw, c'mon, it's only 400 pages).

The changes mostly relate to the new Tonc Text Engine, a text system for all occasions. There's a new chapter describing how TTE works, how to write general character printers for (almost) for arbitrary sized fonts and every type of graphics, and a few other things. It's fairly long and could use sanity checking from someone else.

Also, many of the older demos now use TTE for their text as well. As a result they look cleaner and prettier, but it's possible there are some left-overs from older versions. So have at it.

Surface drawing routines.

I've been building a basic interface for dealing with graphic surfaces lately. I already had most of the routines for 16bpp and 8bpp bitmaps in older Toncs, but but their use was still somewhat awkward because you had to provide some details of the destination manually; most notably a base pointer and the pitch. This got more than a little annoying, especially when trying to make blitters as well. So I made some changes.


typedef struct TSurface
{
    u8  *data;      //!< Surface data pointer.
    u32 pitch;      //!< Scanline pitch in bytes (PONDER: alignment?).
    u16 width;      //!< Image width in pixels.
    u16 height;     //!< Image width in pixels.
    u8  bpp;        //!< Bits per pixel.
    u8  type;       //!< Surface type (not used that much).
    u16 palSize;    //!< Number of colors.
    u16 *palData;   //!< Pointer to palette.
} TSurface;

I've rebuilt the routines around a surface description struct called TSurface (see above). This way, I can just initialize the surface somewhere and just pass the pointer to that surface around. There are a number of different kinds of surfaces. The most important ones are these three:

  • bmp16. 16bpp bitmap surfaces.
  • bmp8. 8bpp bitmap surfaces.
  • chr4c. 4bpp tiled surfaces, in column-major order (i.e., tile 1 is under tile 0 instead of to the right). Column-major order may seem strange, but it actually simplifies the code considerably. There is also a chr4r mode for normal, row-major tiling, but that's unfinished and will probably remain so.
surface.gba movie
Demonstrating surface routines for 4bpp tiles.

For each of these three, I have the most important rendering functions: plotting pixels, lines, rectangles and blits. Yes, blits too. Even for chr4c-mode. There are routines for frames (empty rectangles) and floodfill as well. The functions have a uniform interface with respect to surface-type, so switching between them should be easy were it necessary. There are also tables with function pointers to these routines, so by using those you need not really care about the details of the surface after its creation. I'll probably add a pointer to such a table in TSurface in the future.


Linkies


The image on the right is the result of the following routine. Turret pic semi-knowingly provided by Kawa.

void test_surface_procs(const TSurface *src, TSurface *dst,
    const TSurfaceProcTab *procs, u16 colors[])
{
    // Init object text
    tte_init_obj(&oam_mem[127], ATTR0_TALL, ATTR1_SIZE_8, 512,
        CLR_YELLOW, 0, &vwf_default, NULL);
    tte_init_con();
    tte_set_margins(8, 140, 160, 152);

    // And go!
    tte_printf("#{es;P}%s surface primitives#{w:60}", procs->name);

    tte_printf("#{es;P}Rect#{w:20}");
    procs->rect(dst, 20, 20, 100, 100, colors[0]);

    tte_printf("#{w:30;es;P}Frame#{w:20}");
    procs->frame(dst, 21, 21, 99, 99, colors[1]);

    tte_printf("#{w:30;es;P}Hlines#{w:20}");

    procs->hline(dst, 23, 23, 96, colors[2]);
    procs->hline(dst, 23, 96, 96, colors[2]);

    tte_printf("#{w:30;es;P}Vlines#{w:20}");
    procs->vline(dst, 23, 25, 94, colors[3]);
    procs->vline(dst, 96, 25, 94, colors[3]);

    tte_printf("#{w:30;es;P}Lines#{w:20}");
    procs->line(dst, 25, 25, 94, 40, colors[4]);
    procs->line(dst, 94, 25, 79, 94, colors[4]);
    procs->line(dst, 94, 94, 25, 79, colors[4]);
    procs->line(dst, 25, 94, 40, 25, colors[4]);

    tte_printf("#{w:30;es;P}Full blit#{w:20}");
    procs->blit(dst, 120, 16, src->width, src->height, src, 0, 0);

    tte_printf("#{w:30;es;P}Partial blit#{w:20}");
    procs->blit(dst, 40, 40, 40, 40, src, 12, 8);

    tte_printf("#{w:30;es;P}Floodfill#{w:20}");
    procs->flood(dst, 40, 32, colors[5]);
    tte_printf("#{w:30;es;P}Again !#{w:20}");
    procs->flood(dst, 40, 32, colors[6]);

    tte_printf("#{w:30;es;P;w:30}Ta-dah!!!#{w:20}");

    key_wait_till_hit(KEY_ANY);
}

// Test 4bpp tiled, column-major surfaces
void test_chr4c_procs()
{
    TSurface turret, dst;

    // Init turret for blitting.
    srf_init(&turret, SRF_CHR4C, turretChr4cTiles, 128, 128, 4, NULL);

    // Init destination surface
    srf_init(&dst, SRF_CHR4C, tile_mem[0], 240, 160, 4, pal_bg_mem);
    schr4c_prep_map(&dst, se_mem[31], 0);
    GRIT_CPY(pal_bg_mem, turretChr4cPal);

    // Set video stuff
    REG_DISPCNT= DCNT_MODE0 | DCNT_BG2 | DCNT_OBJ | DCNT_OBJ_1D;
    REG_BG2CNT= BG_CBB(0)|BG_SBB(31);

    u16 colors[8]= { 6, 13, 1, 14, 15, 0, 14, 0 };

    // Run internal tester
    test_surface_procs(&turret, &dst, &chr4c_tab, colors);
}

Usenti 1.7.8 and TTE demo

One major and some smaller changes to usenti. The major one is that there is now a font exporter that can convert bitmaps to TTE-usable fonts. I'm not sure if it's final yet, but any later changes should be small. The text-tool has been altered to facilitate creation of fonts by adding an opaque mode, an align-to-grid option and proper clipboard support (so you should be able to just copy an ASCII table into it, at which point the font is practically made already).

Also, there are separate pasting modes: one that matches the colors to the current palette (potentially mixing up the colors) and a direct-pixel paste, regardless of colors. Thanks, gauauu, for finally making me do this. Secondly, Kawa's been badgering me (politely) about editing colors in raw hex rather than via RGB triplets. This can be found, for various bad reasons, under the Palette menu under ‘Advanced color edit’.

Both of these items have … interesting side effects. Through the former, you can replace colors of a given palette-entry by copy-all, swap and paste. The color-edit accepts multiple colors separated by white-space and, later on, by commas as well, meaning that accidentally it's now possible to take previously exported palettes and add them again. Yes, it's feep, but it's interesting, somewhat hidden and cheap feep, so that's alright. I'm thinking about adding something similar for the image itself as well (plus raw image imports), but only when I'm bored enough.


To show off the font exporter and the TTE system itself, there is a little demo of what it can do here. I'm still pondering over what it should and should not be able to do, but most of the things shown in the demo would be in the final version as well.


Oh, RE: that wordpress bug. A v2.3.1 has been released now with the fix. Interesting factoid: the error (#5088) was classified as “highestomgbbq”.