Okay. Apparently, I am an idiot who can't do math.
One of the longer chapters in Tonc is
Mode 7 part 2, which covers
pretty much all the hairy details of producing mode 7 effects on the
GBA. The money shot in terms of code is the following function from
section 21.7.3, which calculates the affine parameters of the
background for each scanline.

    IWRAM_CODE void m7_prep_affines(M7_LEVEL *level)
    {
        // Horizon below the screen: no floor lines to prepare
        if(level->horizon >= SCREEN_HEIGHT)
            return;

        int ii, ii0= (level->horizon>=0 ? level->horizon : 0);

        M7_CAM *cam= level->camera;
        FIXED xc= cam->pos.x, yc= cam->pos.y, zc=cam->pos.z;

        BG_AFFINE *bga= &level->bgaff[ii0];

        FIXED yb, zb;           // b' = Rx(theta) * (L, ys, -D)
        FIXED cf, sf, ct, st;   // sines and cosines
        FIXED lam, lcf, lsf;    // scale and scaled (co)sine(phi)

        cf= cam->u.x;   sf= cam->u.z;
        ct= cam->v.y;   st= cam->w.y;

        for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
        {
            yb= (ii-M7_TOP)*ct + M7_D*st;
            lam= DivSafe( yc<<12, yb);      // .12f <- OI!!!

            lcf= lam*cf>>8;                 // .12f
            lsf= lam*sf>>8;                 // .12f

            bga->pa= lcf>>4;                // .8f
            bga->pc= lsf>>4;                // .8f

            zb= (ii-M7_TOP)*st - M7_D*ct;   // .8f
            bga->dx= xc + (lcf>>4)*M7_LEFT - (lsf*zb>>12);  // .8f
            bga->dy= zc + (lsf>>4)*M7_LEFT + (lcf*zb>>12);  // .8f

            // hack that I need for fog. pb and pd are unused anyway

            bga++;                          // on to the next scanline's entry
        }
    }

For details on what all the terms mean, go to the page in question.
For now, just note the call to DivSafe() to calculate the scaling
factor λ and recall that division on the GBA is pretty slow. In
Mode 7 part 1, I used a LUT, but here I figured that since the
divisor yb can be anything thanks to the pitch, you can't do that.
After helping Ruben with his mode 7 demo, it turns out that you can.
Fig 1. Sideview of the camera and floor. The camera is tilted slightly
down by angle θ.
Fig 1 shows the situation. There is a camera
(the black triangle) that is tilted down by pitch angle θ. I've
put the origin at the back of the camera because it makes things
easier to read. The
front of the camera is the projection plane, which is essentially
the screen. A ray is cast from the back of the camera on to the floor
and this ray intersects the projection plane. The coordinates
of this point are xp =
(yp, D) in projection plane space, which
corresponds to point (yb, zb) in
world space. This is simply rotating point xp by
θ. The scaling factor is the ratio between the y (or z) coordinates
of the point on the floor and the point on the projection plane,
so that's:

    λ = yc / yb

and for yb the rotation gives us:

    yb = yp·cos θ + D·sin θ

where yc is the camera height, yp is a scanline offset (measured
from the center of the screen) and D is the focal length.
Now, the point is that while yb is variable
and non-integral when θ ≠ 0, it is still bounded! What's more,
you can easily calculate its maximum value, since it's simply the
maximum length of xp. Calling this factor R,

    R = max|xp| = sqrt(max(yp)² + D²)

This factor R, rounded up, is the size of the required LUT.
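
For the skeptical, here is a rough sketch of why the bound holds, written out in LaTeX. It starts from the same expression the code uses for yb (`(ii-M7_TOP)*ct + M7_D*st`). The angle α is a helper introduced for this derivation only: it is the angle that xp makes with the D-axis, and it does not appear in Tonc.

```latex
\begin{align*}
y_b &= y_p\cos\theta + D\sin\theta \\
    &= |\mathbf{x}_p|\,\sin(\theta+\alpha),
       \qquad \cos\alpha = \frac{D}{|\mathbf{x}_p|},\quad
              \sin\alpha = \frac{y_p}{|\mathbf{x}_p|} \\
|y_b| &\le |\mathbf{x}_p| = \sqrt{y_p^2 + D^2}
       \le \sqrt{\max(y_p^2) + D^2} = R
\end{align*}
```

So whatever the pitch is, the integer part of yb never goes past R rounded up, and that is all a look-up table needs.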
In my particular case, I've used yp= scanline−80
and D = 256, which gives
R = sqrt((160−80)² + 256²)
= 268.2. In other words, I need a division LUT with 269 entries. Using .16
fixed point numbers for this LUT, the replacement code is essentially:

    // The new division LUT. For 1/0 and 1/1, 0xFFFF is used.
    u16 m7_div_lut[]=
    {
        0xFFFF, 0xFFFF, 0x8000, 0x5556, ...
    };

    // Inside the function
    for(ii= ii0; ii<SCREEN_HEIGHT; ii++)
    {
        yb= (ii-M7_TOP)*ct + M7_D*st;       // .8
        lam= (yc*m7_div_lut[yb>>8])>>12;    // .8*.16/.12 = .12

        ... // business as usual
    }

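
In case you want to build such a table yourself, here is a minimal sketch of a generator in plain C. The function names `m7_div_lut_size` and `m7_fill_div_lut` are hypothetical and not part of Tonc. Rounding up is an assumption, inferred from the 0x5556 entry for 1/3; clamping to 0xFFFF reproduces the values used for 1/0 and 1/1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LUT size: R rounded up, where R*R = yp_max*yp_max + D*D.
   Integer-only, so it needs no math library. */
unsigned m7_div_lut_size(unsigned yp_max, unsigned D)
{
    unsigned r2 = yp_max * yp_max + D * D;
    unsigned n = 0;
    while (n * n < r2)
        n++;
    return n;
}

/* Fill lut[0..count-1] with 1/i in .16 fixed point, rounded up and
   clamped to 0xFFFF so that every entry fits in 16 bits. */
void m7_fill_div_lut(uint16_t *lut, size_t count)
{
    assert(lut != NULL);

    for (size_t i = 0; i < count; i++) {
        uint32_t v;
        if (i == 0)
            v = 0xFFFF;                                 /* 1/0: use the maximum */
        else
            v = (0x10000u + (uint32_t)i - 1) / (uint32_t)i;  /* ceil(65536/i) */
        lut[i] = (uint16_t)(v > 0xFFFF ? 0xFFFF : v);   /* 1/1 clamps to 0xFFFF */
    }
}
```

With yp_max = 80 and D = 256, `m7_div_lut_size()` gives 269, and the first four entries written by `m7_fill_div_lut()` are 0xFFFF, 0xFFFF, 0x8000, 0x5556. On the GBA itself you would more likely run this on the PC and paste the result into a const array.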
At this point, several questions may arise.
What about negative yb? The beauty here
is that while yb may be negative in principle,
such values would correspond to lines above the horizon and we don't
calculate those anyway.
Won't non-integral yb cause inaccurate look-ups?
True, yb will have a fractional part that
is simply cut off during a simple look-up and some sort of
interpolation would be better. However, in testing there were no
noticeable differences between direct look-up, lerped look-up or
Div(), so the simplest method suffices.
Are .16 fixed point numbers enough? Yes, apparently so.
ZOMG OVERFLOW! Are .16 fixed point numbers too high?
Technically, yes, there is a risk of overflow when the camera height
gets too high. However, at high altitudes the map is going to look
like crap anyway due to the low resolution of the screen.
Furthermore, the hardware only uses 8.8 fixeds, so scales above
256.0 wouldn't work anyway.
What do I win? With the division, m7_prep_affines() takes
about 51k cycles. With the direct look-up this reduces to about 13k:
a speed increase by a factor of 4.
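
For reference, the two look-up flavours from the question about non-integral yb could be sketched like this. This is not the code I tested with, just an illustration; `recip_direct` and `recip_lerp` are hypothetical names. The lerped version assumes the table never increases with the index (true for a reciprocal table) and that the entry after the one being looked up exists.

```c
#include <stdint.h>

/* Direct look-up: simply drop the 8 fractional bits of yb (.8 fixed). */
uint32_t recip_direct(const uint16_t *lut, uint32_t yb)
{
    return lut[yb >> 8];
}

/* Lerped look-up: blend lut[i] and lut[i+1] by the fractional bits.
   Assumes lut[i] >= lut[i+1], so (a - b) cannot go negative, and that
   index i+1 is still inside the table. */
uint32_t recip_lerp(const uint16_t *lut, uint32_t yb)
{
    uint32_t i    = yb >> 8;        /* integer part: table index  */
    uint32_t frac = yb & 0xFF;      /* fractional part, 0..255    */
    uint32_t a    = lut[i];
    uint32_t b    = lut[i + 1];

    return a - (((a - b) * frac) >> 8);
}
```

In the loop this would replace `m7_div_lut[yb>>8]` with `recip_lerp(m7_div_lut, yb)`, which means the table needs one entry more than the direct version. Given that I couldn't see the difference on screen, the direct look-up is the one to use.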
So yeah, this is what I should have figured out years ago, but
somehow kept overlooking it. I'm not sure if I'll add this whole thing to
Tonc's text and code, but I'll at least put up a link to here. Thanks
Ruben, for showing me how to do this properly.