Coranac

mode 7 addendum

2009-04-19 – 18:32

Okay. Apparently, I am an idiot who can't do math.

 

One of the longer chapters in Tonc is Mode 7 part 2, which covers pretty much all the hairy details of producing mode 7 effects on the GBA. The money shot in terms of code is the following function, which calculates the affine parameters of the background for each scanline (section 21.7.3):

IWRAM_CODE void m7_prep_affines(M7_LEVEL *level)
{
    if(level->horizon >= SCREEN_HEIGHT)
        return;

    int ii, ii0= (level->horizon>=0 ? level->horizon : 0);

    M7_CAM *cam= level->camera;
    FIXED xc= cam->pos.x, yc= cam->pos.y, zc=cam->pos.z;

    BG_AFFINE *bga= &level->bgaff[ii0];

    FIXED yb, zb;           // b' = Rx(theta) *  (L, ys, -D)
    FIXED cf, sf, ct, st;   // sines and cosines
    FIXED lam, lcf, lsf;    // scale and scaled (co)sine(phi)
    cf= cam->u.x;      sf= cam->u.z;
    ct= cam->v.y;      st= cam->w.y;
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;
        lam= DivSafe( yc<<12,  yb);     // .12f    <- OI!!!

        lcf= lam*cf>>8;                 // .12f
        lsf= lam*sf>>8;                 // .12f

        bga->pa= lcf>>4;                // .8f
        bga->pc= lsf>>4;                // .8f

        // lambda·Rx·b
        zb= (ii-M7_TOP)*st - M7_D*ct;   // .8f
        bga->dx= xc + (lcf>>4)*M7_LEFT - (lsf*zb>>12);  // .8f
        bga->dy= zc + (lsf>>4)*M7_LEFT + (lcf*zb>>12);  // .8f

        // hack that I need for fog. pb and pd are unused anyway
        bga->pb= lam;
        bga++;
    }
    level->bgaff[SCREEN_HEIGHT]= level->bgaff[0];
}

For details on what all the terms mean, go to the page in question. For now, just note the call to DivSafe() to calculate the scaling factor λ, and recall that division on the GBA is pretty slow. In Mode 7 part 1, I used a LUT, but here I figured that since the pitch lets the yb term take any value, you couldn't do that. After helping Ruben with his mode 7 demo, it turns out that you can.

 

Fig 1. Side view of the camera and floor. The camera is tilted slightly down by angle θ.

Fig 1 shows the situation. There is a camera (the black triangle) that is tilted down by pitch angle θ. I've put the origin at the back of the camera because it makes things easier to read. The front of the camera is the projection plane, which is essentially the screen. A ray is cast from the back of the camera onto the floor, and this ray intersects the projection plane. The coordinates of this intersection are xp = (yp, D) in projection-plane space, which corresponds to the point (yb, zb) in world space; getting from one to the other is simply a rotation of xp by θ. The scaling factor is the ratio between the y (or z) coordinates of the point on the floor and the point on the projection plane, so that's:

\lambda = y_c / y_b,

and for yb the rotation gives us:

y_b = y_p \cos\theta + D \sin\theta,

where yc is the camera height, yp is the scanline's offset from the center of the screen and D is the focal length.

Now, the point is that while yb is variable and non-integral when θ ≠ 0, it is still bounded! What's more, you can easily calculate its maximum value: yb is one coordinate of the rotated vector xp, and since a rotation doesn't change length, yb can never exceed the length of xp. Calling this maximum length R, we get:

R = \sqrt{\max|y_p|^2 + D^2}
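The bound is easy to check. Take yb from above together with the matching zb term that m7_prep_affines() computes (zb = (ii-M7_TOP)*st - M7_D*ct); the cross terms cancel, so the rotated point has the same length as the original one:

```latex
\begin{aligned}
y_b &= y_p \cos\theta + D \sin\theta, \qquad z_b = y_p \sin\theta - D \cos\theta \\
y_b^2 + z_b^2 &= y_p^2(\cos^2\theta + \sin^2\theta) + D^2(\sin^2\theta + \cos^2\theta)
                 + 2 y_p D(\cos\theta \sin\theta - \sin\theta \cos\theta) = y_p^2 + D^2 \\
|y_b| &\le \sqrt{y_p^2 + D^2} \le \sqrt{\max|y_p|^2 + D^2} = R
\end{aligned}
```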

This factor R, rounded up, is the size of the required LUT. In my particular case, I've used yp= scanline−80 and D = 256, which gives R = sqrt((160−80)² + 256²) = 268.2. In other words, I need a division LUT with 269 entries. Using .16 fixed point numbers for this LUT, the replacement code is essentially:

// The new division LUT. For 1/0 and 1/1, 0xFFFF is used.
u16 m7_div_lut[270]=
{
    0xFFFF, 0xFFFF, 0x8000, 0x5556, ...
};


// Inside the function
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;           // .8
        lam= (yc*m7_div_lut[yb>>8])>>12;        // .8 * .16 >> 12 = .12
       
        ... // business as usual
    }
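The table itself isn't listed in full. A minimal sketch of how it could be generated, assuming the entries are 65536/n rounded up (which matches the values shown: 0x8000 for n=2 and 0x5556 for n=3) and that M7_TOP = 80 and M7_D = 256 as in the text. m7_div_lut_build(), m7_div_lut_min_len() and m7_lambda_lut() are hypothetical names, not tonclib functions.

```c
#include <stdint.h>

typedef uint16_t u16;

#define M7_TOP      80      // assumed: y_p = scanline - 80
#define M7_D        256     // assumed focal length D
#define DIV_LUT_LEN 270     // same size as the m7_div_lut[270] declaration

// Entries needed so that yb>>8 (at most floor(R)) is always a valid index:
// floor(R) + 1, with R = sqrt(M7_TOP^2 + M7_D^2).
int m7_div_lut_min_len(void)
{
    int r2 = M7_TOP * M7_TOP + M7_D * M7_D;
    int n = 0;
    while ((n + 1) * (n + 1) <= r2)     // n ends up as floor(sqrt(r2))
        n++;
    return n + 1;
}

// Fill the table: entry n is 65536/n (.16 fixed point), rounded up.
// 65536/1 doesn't fit in a u16 and 1/0 has no value, so both get 0xFFFF.
void m7_div_lut_build(u16 lut[DIV_LUT_LEN])
{
    lut[0] = 0xFFFF;
    lut[1] = 0xFFFF;
    for (unsigned n = 2; n < DIV_LUT_LEN; n++)
        lut[n] = (u16)((65536u + n - 1) / n);
}

// The look-up from the loop above: yc (.8) * 1/yb (.16) >> 12 = lambda (.12).
// Only valid for 0 <= yb>>8 < DIV_LUT_LEN, i.e. lines below the horizon.
int32_t m7_lambda_lut(const u16 lut[DIV_LUT_LEN], int32_t yc, int32_t yb)
{
    return (yc * lut[yb >> 8]) >> 12;
}
```

With yc = 64.0 and yb = 100.0, m7_lambda_lut() gives 2624, i.e. 2624/4096 = 0.640625, against an exact λ of 0.64. The size check returns 269, in line with the R = 268.2 computed above.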

At this point, several questions may arise.

  • What about negative yb? The beauty here is that while yb may be negative in principle, such values would correspond to lines above the horizon and we don't calculate those anyway.
  • Won't non-integral yb cause inaccurate look-ups? True: yb has a fractional part that a direct look-up simply cuts off, and some sort of interpolation would be more accurate. However, in testing there were no noticeable differences between a direct look-up, a lerped look-up or using Div(), so the simplest method suffices.
  • Are .16 fixed-point numbers enough? Yes, apparently so.
  • ZOMG OVERFLOW! Is .16 too much precision? Technically, yes: there is a risk of overflow when the camera height gets too high. However, at high altitudes the map is going to look like crap anyway due to the low resolution of the screen. Furthermore, the hardware only uses signed 8.8 fixed-point numbers for the affine parameters, so scales of 128.0 and up couldn't be used anyway.
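For the curious, here is what the two look-up variants from the list above could look like. This is a sketch assuming a .16 table of 65536/n rounded up, with 1/0 and 1/1 clamped to 0xFFFF; div_lut_build(), recip_direct() and recip_lerp() are hypothetical names. The lerp is written as a weighted sum so that no negative intermediate values show up.

```c
#include <stdint.h>

typedef uint16_t u16;

// Fill a .16 reciprocal table: 65536/n rounded up, 1/0 and 1/1 clamped.
void div_lut_build(u16 *lut, int len)
{
    for (int n = 0; n < len; n++)
        lut[n] = n < 2 ? 0xFFFF : (u16)((65536u + n - 1) / n);
}

// Direct look-up: the fractional byte of yb (.8) is simply dropped.
uint32_t recip_direct(const u16 *lut, int32_t yb)
{
    return lut[yb >> 8];
}

// Lerped look-up: blend entries n and n+1 by the fractional byte of yb.
// Requires (yb>>8) + 1 to be inside the table.
uint32_t recip_lerp(const u16 *lut, int32_t yb)
{
    int32_t n = yb >> 8;
    uint32_t f = yb & 0xFF;
    return (lut[n] * (256 - f) + lut[n + 1] * f) >> 8;
}
```

For yb = 100.5, the direct look-up returns 656 and the lerped one 652, against an exact 65536/100.5 ≈ 652.1. That is a difference of well under 1%, which fits with the lack of visible differences in testing.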

And finally:

  • What do I win? With Div(), m7_prep_affines() takes about 51k cycles. With the direct look-up, this drops to about 13k: a speed increase by a factor of 4.
 

So yeah, this is something I should have figured out years ago, but somehow kept overlooking. I'm not sure if I'll add this whole thing to Tonc's text and code, but I'll at least put up a link to here. Thanks, Ruben, for showing me how to do this properly.

tonc 1.4 official release

2008-08-19 – 15:05

The files have been downloadable for a while now as a preview, but I've finally put the text up on the main site as well, so I guess that makes it official. Tonc is now at version 1.4. As mentioned before, the main new thing is TTE, a system for text for all occasions. I've also used grit in some of the advanced demos, so if you want to see how you can do advanced work with it, check out the mode 7 demos and the tte demo.

This will be the last version of Tonc. It's really gone on long enough now.


Files and linkies:


Right! Now what …

tonc 1.4 preview

2008-05-26 – 22:54

I'm close to releasing the latest (and probably last; this really has gone on long enough) version of Tonc. As a preview, I'm releasing the PDF a little early in the hope that someone may take a look and offer some feedback before the official release (aw, c'mon, it's only 400 pages).

The changes mostly relate to the new Tonc Text Engine, a text system for all occasions. There's a new chapter describing how TTE works, how to write general character printers for (almost) arbitrarily sized fonts and every type of graphics, and a few other things. It's fairly long and could use sanity checking from someone else.

Also, many of the older demos now use TTE for their text as well. As a result they look cleaner and prettier, but it's possible there are some left-overs from older versions. So have at it.

Surface drawing routines.

2008-05-14 – 18:19

I've been building a basic interface for dealing with graphic surfaces lately. I already had most of the routines for 16bpp and 8bpp bitmaps in older Toncs, but their use was still somewhat awkward because you had to provide some details of the destination manually, most notably a base pointer and the pitch. This got more than a little annoying, especially when trying to make blitters as well. So I made some changes.


typedef struct TSurface
{
    u8  *data;      //!< Surface data pointer.
    u32 pitch;      //!< Scanline pitch in bytes (PONDER: alignment?).
    u16 width;      //!< Image width in pixels.
    u16 height;     //!< Image height in pixels.
    u8  bpp;        //!< Bits per pixel.
    u8  type;       //!< Surface type (not used that much).
    u16 palSize;    //!< Number of colors.
    u16 *palData;   //!< Pointer to palette.
} TSurface;

I've rebuilt the routines around a surface description struct called TSurface (see above). This way, I can initialize the surface once and just pass a pointer to it around. There are a number of different kinds of surfaces. The most important ones are these three:

  • bmp16. 16bpp bitmap surfaces.
  • bmp8. 8bpp bitmap surfaces.
  • chr4c. 4bpp tiled surfaces, in column-major order (i.e., tile 1 is under tile 0 instead of to the right). Column-major order may seem strange, but it actually simplifies the code considerably. There is also a chr4r mode for normal, row-major tiling, but that's unfinished and will probably remain so.
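To see why column-major order helps, compare where a pixel ends up in both layouts. A 4bpp tile is 8 rows of one 32-bit word each. In column-major order, a column of tiles is one unbroken run of words, one per scanline, so moving down a pixel is always +1 word, even across tile boundaries. In row-major order you have to jump to another tile every 8 lines. The helpers below are a hypothetical sketch, not tonclib's API.

```c
#include <stdint.h>

typedef uint32_t u32;

// Column-major (chr4c): tile 1 is under tile 0, so tile column x>>3 is a
// run of `height` words, one word per scanline.
u32 chr4c_word_index(int height, int x, int y)
{
    return (x >> 3) * height + y;
}

// Row-major (chr4r): tile 1 is to the right of tile 0.
u32 chr4r_word_index(int width, int x, int y)
{
    u32 tile = (y >> 3) * (width >> 3) + (x >> 3);
    return tile * 8 + (y & 7);
}

// Plot a pixel in a column-major surface. Pixel x lives in nibble (x & 7)
// of its row word, so only that nibble is replaced.
void chr4c_plot(u32 *base, int height, int x, int y, u32 clr)
{
    u32 *word = &base[chr4c_word_index(height, x, y)];
    u32 shift = (x & 7) * 4;
    *word = (*word & ~(0xFu << shift)) | ((clr & 0xF) << shift);
}
```

On a 240x160 surface, pixel (0, 8) is word 8 in column-major order but word 240 in row-major order, which is exactly the kind of jump that complicates vertical lines, rectangles and blits.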
surface.gba movie
Demonstrating surface routines for 4bpp tiles.

For each of these three, I have the most important rendering functions: plotting pixels, lines, rectangles and blits. Yes, blits too. Even for chr4c-mode. There are routines for frames (empty rectangles) and floodfill as well. The functions have a uniform interface with respect to surface-type, so switching between them should be easy were it necessary. There are also tables with function pointers to these routines, so by using those you need not really care about the details of the surface after its creation. I'll probably add a pointer to such a table in TSurface in the future.
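A stripped-down sketch of the function-pointer table idea, assuming only the data/pitch/width/height fields of TSurface. Surface, SurfaceProcs, bmp16_plot() and bmp8_plot() are hypothetical names, not tonclib's. Note that the 8bpp plotter writes a whole halfword, since VRAM can't be written to in bytes.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

// Cut-down surface: only what a plotter needs (TSurface has more fields).
typedef struct Surface {
    u8  *data;      // Base pointer (assumed halfword-aligned)
    u32  pitch;     // Bytes per scanline (assumed even)
    u16  width, height;
} Surface;

// 16bpp: one halfword per pixel.
static void bmp16_plot(const Surface *s, int x, int y, u32 clr)
{
    u16 *dst = (u16 *)(s->data + y * s->pitch) + x;
    *dst = (u16)clr;
}

// 8bpp: read-modify-write the halfword holding the pixel (little-endian:
// even x is the low byte, odd x the high byte).
static void bmp8_plot(const Surface *s, int x, int y, u32 clr)
{
    u16 *dst = (u16 *)(s->data + y * s->pitch + (x & ~1));
    if (x & 1)
        *dst = (u16)((*dst & 0x00FF) | ((clr & 0xFF) << 8));
    else
        *dst = (u16)((*dst & 0xFF00) | (clr & 0xFF));
}

// One table per surface type; callers only go through the pointers.
typedef struct SurfaceProcs {
    const char *name;
    void (*plot)(const Surface *s, int x, int y, u32 clr);
} SurfaceProcs;

const SurfaceProcs bmp16_procs = { "bmp16", bmp16_plot };
const SurfaceProcs bmp8_procs  = { "bmp8",  bmp8_plot  };
```

Code that calls procs->plot(&srf, x, y, clr) doesn't change when the surface type does; only the table passed in differs, which is the same pattern test_surface_procs() uses below.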


Linkies


The image on the right is the result of the following routine. Turret pic semi-knowingly provided by Kawa.

void test_surface_procs(const TSurface *src, TSurface *dst,
    const TSurfaceProcTab *procs, u16 colors[])
{
    // Init object text
    tte_init_obj(&oam_mem[127], ATTR0_TALL, ATTR1_SIZE_8, 512,
        CLR_YELLOW, 0, &vwf_default, NULL);
    tte_init_con();
    tte_set_margins(8, 140, 160, 152);

    // And go!
    tte_printf("#{es;P}%s surface primitives#{w:60}", procs->name);

    tte_printf("#{es;P}Rect#{w:20}");
    procs->rect(dst, 20, 20, 100, 100, colors[0]);

    tte_printf("#{w:30;es;P}Frame#{w:20}");
    procs->frame(dst, 21, 21, 99, 99, colors[1]);

    tte_printf("#{w:30;es;P}Hlines#{w:20}");

    procs->hline(dst, 23, 23, 96, colors[2]);
    procs->hline(dst, 23, 96, 96, colors[2]);

    tte_printf("#{w:30;es;P}Vlines#{w:20}");
    procs->vline(dst, 23, 25, 94, colors[3]);
    procs->vline(dst, 96, 25, 94, colors[3]);

    tte_printf("#{w:30;es;P}Lines#{w:20}");
    procs->line(dst, 25, 25, 94, 40, colors[4]);
    procs->line(dst, 94, 25, 79, 94, colors[4]);
    procs->line(dst, 94, 94, 25, 79, colors[4]);
    procs->line(dst, 25, 94, 40, 25, colors[4]);

    tte_printf("#{w:30;es;P}Full blit#{w:20}");
    procs->blit(dst, 120, 16, src->width, src->height, src, 0, 0);

    tte_printf("#{w:30;es;P}Partial blit#{w:20}");
    procs->blit(dst, 40, 40, 40, 40, src, 12, 8);

    tte_printf("#{w:30;es;P}Floodfill#{w:20}");
    procs->flood(dst, 40, 32, colors[5]);
    tte_printf("#{w:30;es;P}Again !#{w:20}");
    procs->flood(dst, 40, 32, colors[6]);

    tte_printf("#{w:30;es;P;w:30}Ta-dah!!!#{w:20}");

    key_wait_till_hit(KEY_ANY);
}

// Test 4bpp tiled, column-major surfaces
void test_chr4c_procs()
{
    TSurface turret, dst;

    // Init turret for blitting.
    srf_init(&turret, SRF_CHR4C, turretChr4cTiles, 128, 128, 4, NULL);

    // Init destination surface
    srf_init(&dst, SRF_CHR4C, tile_mem[0], 240, 160, 4, pal_bg_mem);
    schr4c_prep_map(&dst, se_mem[31], 0);
    GRIT_CPY(pal_bg_mem, turretChr4cPal);

    // Set video stuff
    REG_DISPCNT= DCNT_MODE0 | DCNT_BG2 | DCNT_OBJ | DCNT_OBJ_1D;
    REG_BG2CNT= BG_CBB(0)|BG_SBB(31);

    u16 colors[8]= { 6, 13, 1, 14, 15, 0, 14, 0 };

    // Run internal tester
    test_surface_procs(&turret, &dst, &chr4c_tab, colors);
}

Artsy fartsy

2008-04-10 – 23:20

I've been working on a few functions for rendering onto tiles recently. Yesterday it was the turn of a rectangle filler. The traditional routine of double-looping over a pixel plotter would be slow in every case, but for tiled surfaces it's positively evil, so I made something that divides the rectangle into 5 areas and fills them by words or better. Yes, this is a little tricky, but I figured the speed increase of up to 300 would be worth it.

For testing purposes, I filled each region with a different color so that if (when) something went wrong, I could easily identify the problem. When playing around with the test app, I more or less accidentally came up with this:

accidental mondriaan

Hmmm ... Mondriaany.

Anyway, it seems that this thing went alright. So now tonclib also has plot, hline, vline, line, rect and frame functions for 4bpp tiled modes. No, there's no blitting yet. If anyone wants that, I'm going to insist on some mental hazard pay.
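The 5-area routine itself isn't shown here. As a rough illustration of why filling by words pays off, here is a simplified sketch for a column-major 4bpp layout (assumed: each column of tiles is a run of one 32-bit word per scanline). It is not the post's routine: it just treats each tile column as either fully covered, filled with plain word writes, or partially covered at the left or right edge, filled with a masked read-modify-write. chr4c_plot() and chr4c_rect() are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t u32;

// Reference plotter: pixel x is nibble (x & 7) of word (x>>3)*height + y.
void chr4c_plot(u32 *base, int height, int x, int y, u32 clr)
{
    u32 *word = &base[(x >> 3) * height + y];
    u32 shift = (x & 7) * 4;
    *word = (*word & ~(0xFu << shift)) | ((clr & 0xF) << shift);
}

// Fill [left, right) x [top, bottom) with colour clr.
void chr4c_rect(u32 *base, int height, int left, int top,
                int right, int bottom, u32 clr)
{
    if (left >= right || top >= bottom)
        return;

    u32 fill = (clr & 0xF) * 0x11111111u;   // colour in all 8 nibbles

    for (int tx = left >> 3; tx <= (right - 1) >> 3; tx++) {
        int x0 = left > tx * 8 ? left : tx * 8;             // first pixel
        int x1 = right < tx * 8 + 8 ? right : tx * 8 + 8;   // one past last
        int n = x1 - x0;
        u32 *col = &base[tx * height];      // one word per scanline

        if (n == 8) {
            // Fully covered tile column: 8 pixels per word write.
            for (int y = top; y < bottom; y++)
                col[y] = fill;
        } else {
            // Partially covered edge column: masked read-modify-write.
            u32 mask = ((1u << (n * 4)) - 1) << ((x0 & 7) * 4);
            for (int y = top; y < bottom; y++)
                col[y] = (col[y] & ~mask) | (fill & mask);
        }
    }
}
```

Compared with calling chr4c_plot() per pixel, the inner loops touch each word once per scanline instead of up to 8 times, and full columns need no reads at all.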

Tonc:setup update

2008-02-17 – 12:04

Finally got round to updating Tonc's dev setup page. It now mentions devkitPro's template makefiles and the basics of how to use them. I've also added a list of potential problems you may encounter when installing or upgrading devkitARM, or just building projects. I have not updated the downloadables yet because there are still a few unfinished edits there. I just wanted to get this one out of the way because it's so very, very overdue.

The standard C functions for copying and filling are memcpy() and memset(). They're part of the standard library, are easy to use and are often implemented with some optimizations, so they're usually faster than manual looping. The DKA (devkitARM) version, for example, will fill as words if the alignments and sizes allow for it. This can be much faster than doing the loops yourself.

There is, however, one small annoying fact about these two: they're not VRAM-safe. If the alignment and size aren't right for the word transfers, they will transfer bytes. Not only will this be slow, of course, but because you can't write to VRAM in bytes, the data will be corrupted.

The solutions for this have mostly come down to “so don't do that then”. Often, this can be sufficient: tiles in VRAM are word-aligned by definition, and source graphics data can and should be word-aligned anyway. However, now that I'm finally working on a bitmap blitter for 8bpp and 16bpp, I find that it's simply not enough. So I wrote the following set of functions to serve as replacements.

The code

My main goal here was to create smallish and portable replacements, not to have the greatest and fastestest code around because that's rather platform dependent. Yes, even the difference between GBA and NDS should matter, because of the differences in ldr/str times and caching.

There are 5 functions here. The main ones are tonccpy and __toncset, for copying and filling words, respectively. The other 3 are interfaces to __toncset for filling 8-bit, 16-bit and 32-bit data; you need these for, say, filling with a color instead of 8-bit data. For the rest of the discussion, I will use the name “toncset” for the internal routine for convenience.
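A minimal sketch of the idea behind a VRAM-safe fill, assuming little-endian byte order as on the GBA. vram_safe_set8() is a hypothetical name and not the actual toncset code: it only ever issues 16-bit stores, using a read-modify-write on the enclosing halfword for an unaligned first or last byte. A real replacement would also switch to 32-bit stores for the aligned middle part.

```c
#include <stddef.h>
#include <stdint.h>

// Fill `size` bytes at `dst` with `val` using 16-bit stores only.
void vram_safe_set8(void *dst, uint8_t val, size_t size)
{
    if (size == 0)
        return;

    uintptr_t addr = (uintptr_t)dst;
    uint16_t fill = (uint16_t)(val * 0x0101u);

    // Leading odd byte: it is the high byte of the halfword below it.
    if (addr & 1) {
        uint16_t *first = (uint16_t *)(addr - 1);
        *first = (uint16_t)((*first & 0x00FFu) | (val << 8));
        addr++;
        size--;
    }

    // Aligned middle part: whole halfwords.
    uint16_t *hw = (uint16_t *)addr;
    for (; size >= 2; size -= 2)
        *hw++ = fill;

    // Trailing byte: the low byte of the next halfword.
    if (size)
        *hw = (uint16_t)((*hw & 0xFF00u) | val);
}
```

The bytes next to the filled range keep their old values, which is exactly what a plain byte-wise memset() can't guarantee in VRAM.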


Tonc cleanup and fixes

2007-12-06 – 1:21

I've uploaded some newer tonc files today, mostly for devkitArm r21 compatibility. Because the linkscript for multiboot got b0rken somehow, I had to change all the demos to default to cart-builds. I intended to change to cart-boot later anyway, but not being able to build the demos properly with the latest devkit kinda forces me to do it now. I've also had to change the tier-3 makefiles because they used `-fno-expections' in the CXXFLAGS. This should of course have been `-fno-exceptions' (thanks muff).

There have also been a few changes in the text parts: build-specs are set to cart-boot there too now, and I fixed some broken links. I've also fixed a slew of spelling and grammar issues that Patater sent in. These parts of the text shouldn't be gibberish anymore, just unintelligible :P.


Usenti 1.7.8 and TTE demo

2007-10-30 – 0:15

One major and some smaller changes to usenti. The major one is that there is now a font exporter that can convert bitmaps to TTE-usable fonts. I'm not sure if it's final yet, but any later changes should be small. The text-tool has been altered to facilitate creation of fonts by adding an opaque mode, an align-to-grid option and proper clipboard support (so you should be able to just copy an ASCII table into it, at which point the font is practically made already).

Also, there are separate pasting modes: one that matches the colors to the current palette (potentially mixing up the colors) and a direct-pixel paste, regardless of colors. Thanks, gauauu, for finally making me do this. Secondly, Kawa's been badgering me (politely) about editing colors in raw hex rather than via RGB triplets. This can be found, for various bad reasons, under the Palette menu under ‘Advanced color edit’.

Both of these items have … interesting side effects. Through the former, you can replace colors of a given palette-entry by copy-all, swap and paste. The color-edit accepts multiple colors separated by white-space and, later on, by commas as well, meaning that accidentally it's now possible to take previously exported palettes and add them again. Yes, it's feep, but it's interesting, somewhat hidden and cheap feep, so that's alright. I'm thinking about adding something similar for the image itself as well (plus raw image imports), but only when I'm bored enough.
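Reading such a list back in is straightforward. A small sketch (Usenti's own code isn't shown; parse_hex_colors() and bgr555_split() are hypothetical helpers) that splits on white-space and commas with strtoul() and unpacks GBA BGR555 colors: red in bits 0-4, green in 5-9, blue in 10-14. Masking values to 15 bits is my assumption about how out-of-range input should be treated.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t u16;

// Parse white-space and/or comma separated hex colors ("7FFF" or "0x7FFF").
// Returns the number of colors read; stops at `max` or at a non-hex token.
int parse_hex_colors(const char *str, u16 *out, int max)
{
    int count = 0;
    while (count < max) {
        while (*str == ',' || isspace((unsigned char)*str))
            str++;
        if (*str == '\0')
            break;

        char *end;
        unsigned long val = strtoul(str, &end, 16);
        if (end == str)
            break;                          // not a hex number
        out[count++] = (u16)(val & 0x7FFF); // keep the 15 color bits
        str = end;
    }
    return count;
}

// Split a BGR555 color into its 5-bit red, green and blue components.
void bgr555_split(u16 clr, int *r, int *g, int *b)
{
    *r = clr & 31;
    *g = (clr >> 5) & 31;
    *b = (clr >> 10) & 31;
}
```

Since a previously exported palette is just such a list of hex values, feeding it back through a parser like this is what makes the re-import trick possible.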


To show off the font exporter and the TTE system itself, there is a little demo of what it can do here. I'm still pondering over what it should and should not be able to do, but most of the things shown in the demo would be in the final version as well.


Oh, RE: that wordpress bug. A v2.3.1 has been released now with the fix. Interesting factoid: the error (#5088) was classified as “highestomgbbq”.

new tonclib

2007-10-05 – 22:55

I've been making a lot of changes to tonclib – mostly adding, but also some removals. The most important changes are:

  • A more unified interface for the base drawing routines. Whereas I used to have something like bm8_foo(...), I now have bmp8_foo(..., void *dstBase, u32 dstPitch) for everything. Although the extra parameters make the routines a little slower, they make it easier to switch video modes.
  • A few color routines like blending/fading, convert to rgbscale (like grayscale, but for any color vector) and a few color adjustments.
  • I'm trying to include (well, annex, really) some of libgba's functionality. In terms of shared functionality, the libgba names can be used by including tonc_libgba.h. This is definitely not a finished item yet.
  • Tonc's Text Engine. I already had some basic routines for text on different video types, but this is a good deal better. Instead of having separate foo_puts() routines, TTE uses function pointers for placing glyphs on screen. This means there can be a single interface for all modes, and customizable writers. Already provided are glyph renderers for 8/16-bit bitmaps and 4-bit tiles, using a 1bpp bitpacked font. In principle, the renderers can handle fixed- and variable-width fonts of any size (within reason: 128x128 fonts would be impractical, for example). There are also hooks for the stdio functions (printf, yay!) and some simple commands for positioning, color and font changes. Example of use:
    // Set-up 4bpp tile rendering bg 0 using cbb=0 and sbb=31.
    // The default options set implicitly here are: verdana 9 font, yellow for 
    // text color
    tte_init_chr4_dflt(0, BG_CBB(0)|BG_SBB(31));
    // Init stdio hooks
    tte_init_con();
    // Print something at position (10,10)
    iprintf("\\{P:10,10}'Ello world!, %d", 1337);
    
    Aside from the initializer, using TTE is basically independent of what you're writing with or on. Of course, all this stuff does have a fair amount of per-character overhead (about 150 cycles, I believe). It shouldn't be too hard to port TTE to NDS; I am planning to do this at some point.
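To illustrate the function-pointer design, here is a heavily cut-down sketch: the front end (text_putc/text_puts) knows nothing about the target, and only the drawg pointer is mode-specific. TextContext, Font and all functions here are hypothetical, not TTE's real structs or API, and only fixed-width 8x8 1bpp glyphs are handled.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

// 1bpp font: 8x8 glyphs, one byte per row, bit 0 = leftmost pixel.
typedef struct Font {
    const u8 *data;     // 8 bytes per glyph
    int first;          // character code of glyph 0
} Font;

typedef struct TextContext TextContext;
typedef void (*GlyphDrawFn)(TextContext *tc, int gid);

struct TextContext {
    int cursorX, cursorY;   // pen position in pixels
    int charW, lineH;       // fixed advance and line height
    const Font *font;
    u16 ink;                // color for set font pixels
    void *target;           // whatever the renderer draws on
    int pitch;              // target pitch in pixels
    GlyphDrawFn drawg;      // the only mode-specific part
};

// Mode-independent front end: identical for every kind of target.
void text_putc(TextContext *tc, int ch)
{
    if (ch == '\n') {
        tc->cursorX = 0;
        tc->cursorY += tc->lineH;
        return;
    }
    tc->drawg(tc, ch - tc->font->first);
    tc->cursorX += tc->charW;
}

void text_puts(TextContext *tc, const char *str)
{
    while (*str)
        text_putc(tc, (u8)*str++);
}

// One renderer: 16bpp bitmap. An 8bpp or 4bpp-tile renderer would have the
// same signature and only change how a set bit is written.
void bmp16_drawg(TextContext *tc, int gid)
{
    u16 *dst = (u16 *)tc->target + tc->cursorY * tc->pitch + tc->cursorX;
    const u8 *glyph = &tc->font->data[gid * 8];

    for (int iy = 0; iy < 8; iy++)
        for (int ix = 0; ix < 8; ix++)
            if ((glyph[iy] >> ix) & 1)
                dst[iy * tc->pitch + ix] = tc->ink;
}
```

Switching to another target means filling in a different drawg and target; text_puts() itself stays untouched, which is the point of the design. The indirect call per glyph is also part of why there is some per-character overhead.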

There are more smaller changes here and there, but those are of lesser consequence.


tonclib 1.3 linky.
